<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Alexander Rubin&#039;s Blog on MySQL &#187; Uncategorized</title>
	<atom:link href="http://www.arubin.org/blog/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.arubin.org/blog</link>
	<description>MySQL, FullText Search, Performance, High Availability</description>
	<lastBuildDate>Sat, 30 Jul 2011 00:18:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Fixing data warehousing queries with group-by</title>
		<link>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/</link>
		<comments>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/#comments</comments>
		<pubDate>Mon, 29 Nov 2010 20:50:59 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=78</guid>
		<description><![CDATA[<p>With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous Loose index scan vs. covered indexes in MySQL post:</p>

    CREATE TABLE `ontime_2010` (
      `YearD` int(11) [...]]]></description>
			<content:encoded><![CDATA[<p>With the standard data warehousing queries we have a fact table and dimension tables and we join them.<br />
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous <a href="http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/">Loose index scan vs. covered indexes in MySQL</a> post:</p>
<blockquote><pre>
    CREATE TABLE `ontime_2010` (
      `YearD` int(11) DEFAULT NULL,
      `MonthD` tinyint(4) DEFAULT NULL,
      `DayofMonth` tinyint(4) DEFAULT NULL,
      `DayOfWeek` tinyint(4) DEFAULT NULL,
      `Carrier` char(2) DEFAULT NULL,
      `Origin` char(5) DEFAULT NULL,
      `DepDelayMinutes` int(11) DEFAULT NULL,
      `AirlineID` int(11) DEFAULT NULL,
      `Cancelled` tinyint(4) DEFAULT NULL,
    ... more fields here ...
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
</pre>
</blockquote>
<p>(This is not the best possible fact table, as the data is not aggregated, but I&#8217;ll use it for now.)</p>
<p>And we have these dimension tables:</p>
<blockquote>
<pre>
 CREATE TABLE `airlines` (
  `AirlineID` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`AirlineID`),
  KEY `AirlineName` (`AirlineName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `date_dayofweek` (
  `code` int(11) NOT NULL DEFAULT '0',
  `description` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`code`),
  KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> select * from date_dayofweek order by code;
+------+-------------+
| code | description |
+------+-------------+
|    1 | Monday      |
|    2 | Tuesday     |
|    3 | Wednesday   |
|    4 | Thursday    |
|    5 | Friday      |
|    6 | Saturday    |
|    7 | Sunday      |
|    9 | Unknown     |
+------+-------------+
8 rows in set (0.00 sec)
</pre>
</blockquote>
<p>Here is the example query: find the sum of cancelled flights on Sundays for a given airline, grouped by flight date:</p>
<blockquote>
<pre>select sum(Cancelled), FlightDate, AirlineName
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
</pre>
</blockquote>
<p>To fix the query we can add a covering index on ontime_2010, so that all ontime_2010 fields used by the query are covered by the index:</p>
<blockquote><p>alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled); </p></blockquote>
<p>However, we will still get &#8220;Using temporary; Using filesort&#8221;:</p>
<blockquote>
<pre>

mysql> explain select sum(Cancelled), FlightDate
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dow
         type: ref
possible_keys: PRIMARY,description
          key: description
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: PRIMARY,AirlineName
          key: AirlineName
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: ontime.a.AirlineID,ontime.dow.code
         rows: 24417
        Extra: Using where; Using index
3 rows in set (0.00 sec)
</pre></blockquote>

<p>To avoid the filesort we can rewrite this query with subqueries:</p>
<blockquote>
<pre>
mysql> explain select sum(Cancelled), FlightDate  from ontime_2010 o
where o.dayofweek= (select code from date_dayofweek where description = 'Sunday')
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL')
group by FlightDate limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: const,const
         rows: 152510
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: SUBQUERY
        table: airlines
         type: ref
possible_keys: AirlineName
          key: AirlineName
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: date_dayofweek
         type: ref
possible_keys: description
          key: description
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
3 rows in set (0.00 sec)
</pre>
</blockquote>
<p>The EXPLAIN output shows ref: const,const for the fact table: MySQL resolves each "field = (select ...)" to a constant, so all the fields in the cov2 index now belong to a single table, and MySQL can use the index to avoid the temporary table and filesort. Please note: this will not work with "field in (select ...)"; also make sure that each subselect returns only 1 row.</p>
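<p>The one-row requirement can be checked up front. A minimal sketch, assuming the dimension tables shown earlier: each query lists lookup values that map to more than one row, so both should return an empty set. If a subselect used with "=" returns more than one row, MySQL rejects the statement with ERROR 1242 (Subquery returns more than 1 row).</p>

```sql
-- Day names that appear more than once in the dimension table
-- (expected: empty set)
select description, count(*) as cnt
from date_dayofweek
group by description
having cnt > 1;

-- Airline names that appear more than once in the dimension table
-- (expected: empty set)
select AirlineName, count(*) as cnt
from airlines
group by AirlineName
having cnt > 1;
```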
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting queries with OR to Union to utilize indexes</title>
		<link>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/</link>
		<comments>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 21:25:04 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=69</guid>
		<description><![CDATA[<p>Let&#8217;s say we have a table storing mail messages and we need to show a user&#8217;s mailbox: messages sent &#8220;from&#8221; and &#8220;to&#8221; the specified user.</p>
<p>Here is our table:</p>


CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s say we have a table storing mail messages and we need to show a user&#8217;s mailbox: messages sent &#8220;from&#8221; and &#8220;to&#8221; the specified user.</p>
<p>Here is our table:</p>
<blockquote>
<pre>
CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc ...
  PRIMARY KEY (`mail_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
</pre>
</blockquote>
<p>And our query:</p>
<blockquote><pre>select * from internalmail
 where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc </pre>
</blockquote>
<p>In this query we show all messages from and to user_id = 247, plus all messages to the system user (user_id = 0). We need to show only messages from the last 3 months, with the most recent messages first.</p>
<p>To speed up the query we can try creating indexes:</p>
<blockquote><p>
  KEY `recipientaddress_id` (`recipientaddress_id`),<br />
  KEY `senderaddress_id` (`senderaddress_id`),<br />
  KEY `mail_timestamp` (`mail_timestamp`),
</p></blockquote>
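<p>If the table already exists, the same three single-column indexes can be added with one statement. This is only a sketch; the index names simply mirror the column names, as in the KEY definitions above:</p>

```sql
-- Add the three single-column indexes to the existing table
alter table internalmail
  add key recipientaddress_id (recipientaddress_id),
  add key senderaddress_id (senderaddress_id),
  add key mail_timestamp (mail_timestamp);
```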
<p>However, as the query uses &#8220;OR&#8221;, MySQL does not use any of these indexes: it scans the whole table (type: ALL) and then does a filesort.</p>
<blockquote>
<pre>
mysql> explain select * from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: internalmail
         type: ALL
possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4843257
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
</pre>
</blockquote>
<p><strong>UPDATE: </strong> even if we create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp), those indexes will not be used, as the query contains &#8220;OR&#8221; in the where clause.</p>
<p>The original query runs for 3 seconds. To fix this query we can do 2 things:</p>
<ol>
<li>Rewrite query with UNION instead of OR</li>
<li>Create combined indexes</li>
</ol>
<p>First, we rewrite the query with UNION:</p>
<blockquote><p>
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')<br />
union<br />
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')<br />
union<br />
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')<br />
order by mail_timestamp desc;
</p></blockquote>
<p>Second, we create 2 indexes:</p>
<blockquote><p>
mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);<br />
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);
</p></blockquote>
<p>After that, MySQL can use an index for each of the 3 queries in the union:</p>
<blockquote>
<pre>
mysql> explain
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: internalmail
         type: range
possible_keys: senderaddress_id,mail_timestamp,send_dt
          key: send_dt
      key_len: 9
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 11
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,2,3>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using filesort
4 rows in set (0.00 sec)
</pre>
</blockquote>
<p>Although this query still has to perform a final filesort on the union result, it is much faster: it now runs in 0 sec, compared to 3 seconds originally.</p>
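<p>A quick sanity check for the rewrite, as a sketch: the OR query and the UNION query should return the same number of rows. UNION (rather than UNION ALL) matters here, because a message can match more than one branch, for example one that user 247 sent to the same address 247, and UNION returns it only once. The derived-table alias u is just a placeholder name.</p>

```sql
-- Row count of the OR form
select count(*) from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-19 12:30:47';

-- Row count of the UNION form; duplicates across branches are removed,
-- so this count should match the one above
select count(*) from (
  select * from internalmail where senderaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47'
  union
  select * from internalmail where recipientaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47'
  union
  select * from internalmail where recipientaddress_id = 0 and mail_timestamp > '2010-08-19 12:30:47'
) as u;
```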
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Speaking at MySQL Users Conference and Expo 2010</title>
		<link>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/</link>
		<comments>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 20:58:36 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=46</guid>
		<description><![CDATA[<p>I&#8217;ll be speaking at the MySQL Users Conference 2010. Talk: MySQL Architecture Design Patterns for Performance, Scalability, and Availability, 11:55am Thursday, 04/15/2010.  Details.</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking at the MySQL Users Conference 2010. Talk: MySQL Architecture Design Patterns for Performance, Scalability, and Availability, 11:55am Thursday, 04/15/2010.  <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13384">Details.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New blog on MySQL</title>
		<link>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/</link>
		<comments>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 19:22:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=14</guid>
		<description><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA).</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

