<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Alexander Rubin&#039;s Blog on MySQL</title>
	<atom:link href="http://www.arubin.org/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.arubin.org/blog</link>
	<description>MySQL, FullText Search, Performance, High Availability</description>
	<lastBuildDate>Sat, 30 Jul 2011 00:18:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using MySQL 5.6 to find queries creating disk temporary tables</title>
		<link>http://www.arubin.org/blog/2011/05/01/using-mysql-5-6-to-find-queries-creating-disk-temporary-tables/</link>
		<comments>http://www.arubin.org/blog/2011/05/01/using-mysql-5-6-to-find-queries-creating-disk-temporary-tables/#comments</comments>
		<pubDate>Sun, 01 May 2011 23:58:10 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[temporary tables]]></category>
		<category><![CDATA[disk temporary tables]]></category>
		<category><![CDATA[mysql 5.6]]></category>
		<category><![CDATA[performance_schema]]></category>
		<category><![CDATA[slow query]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=98</guid>
		<description><![CDATA[<p>In my previous post, I showed how to use DTrace to find queries creating disk temporary tables (only available on operating systems with DTrace: Solaris, FreeBSD, etc.).</p>
<p>In MySQL 5.6 (which is not released yet; use the &#8220;labs&#8221; version for now) we can use the new performance_schema tables events_statements_history or events_statements_history_long to find all performance metrics for all [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I showed how to use DTrace to <a href="http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/">find queries creating disk temporary tables</a> (only available on operating systems with DTrace: Solaris, FreeBSD, etc.).</p>
<p>In MySQL 5.6 (which is not released yet; use the &#8220;labs&#8221; version for now) we can use the new performance_schema tables events_statements_history or events_statements_history_long to find performance metrics for every query, including the temporary tables it created on disk or in memory, its use of indexes, etc. WOW! I have been waiting for this for a long time!</p>
<p>To illustrate, I grabbed mysql-5.6.3-labs-performance-schema-linux2.6-x86_64.tar.gz from <a href="http://labs.mysql.com/">labs.mysql.com</a> (this feature is only in the labs version) and ran the sysbench read-only test (you need to disable prepared statements in sysbench; the feature does not seem to work with prepared statements, I will check it later).</p>
<p>Here are the results:</p>
<blockquote>
<pre>mysql> select * from events_statements_history_long where  CREATED_TMP_DISK_TABLES > 0 limit 10\G
*************************** 10. row ***************************
              THREAD_ID: 74
               EVENT_ID: 3295633
             EVENT_NAME: statement/sql/select
                 SOURCE: sql_parse.cc:935
            TIMER_START: 633828149000000
              TIMER_END: 633843868000000
             TIMER_WAIT: 15719000000
              LOCK_TIME: 53000000
               SQL_TEXT: SELECT DISTINCT c from sbtest where id between 847399 and 847499 order by c
         CURRENT_SCHEMA: sbtest
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 1
          ROWS_EXAMINED: 103
CREATED_TMP_DISK_TABLES: 1
     CREATED_TMP_TABLES: 1
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 1
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 0
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 1
              SORT_SCAN: 1
          NO_INDEX_USED: 0
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: NULL
     NESTING_EVENT_TYPE: NULL
10 rows in set (0.00 sec)
</pre>
</blockquote>
<p>Or, if you need only the list of queries:</p>
<blockquote><pre>
mysql> select sql_text, count(*) as cnt  from events_statements_history_long
where  CREATED_TMP_DISK_TABLES > 0
group by sql_text order by cnt desc  limit 10;
+-----------------------------------------------------------------------------+-----+
| sql_text                                                                    | cnt |
+-----------------------------------------------------------------------------+-----+
| SELECT DISTINCT c from sbtest where id between 242012 and 242112 order by c |   2 |
| SELECT DISTINCT c from sbtest where id between 797388 and 797488 order by c |   2 |
| SELECT DISTINCT c from sbtest where id between 973150 and 973250 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 478783 and 478883 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 967035 and 967135 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 602102 and 602202 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 123827 and 123927 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 980527 and 980627 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 450354 and 450454 order by c |   1 |
| SELECT DISTINCT c from sbtest where id between 674804 and 674904 order by c |   1 |
+-----------------------------------------------------------------------------+-----+
10 rows in set (0.04 sec)
</pre>
</blockquote>
<p>We can also filter and order by the other columns: ROWS_EXAMINED, SORT_MERGE_PASSES, NO_INDEX_USED, NO_GOOD_INDEX_USED, etc.</p>
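<p>As a rough sketch of such filtering, the Python helper below builds the query text for any one of the metric columns shown in the output above. The function name build_metric_query and the column whitelist are assumptions made for illustration only; the query also assumes that performance_schema is the current schema, as in the examples above.</p>

```python
# Metric columns copied from the events_statements_history_long output above.
METRIC_COLUMNS = {
    "ROWS_EXAMINED", "ROWS_SENT", "CREATED_TMP_TABLES",
    "CREATED_TMP_DISK_TABLES", "SORT_MERGE_PASSES", "SORT_ROWS",
    "NO_INDEX_USED", "NO_GOOD_INDEX_USED", "SELECT_SCAN", "SELECT_FULL_JOIN",
}


def build_metric_query(metric, limit=10):
    """Return SQL listing statements with metric > 0, largest values first.

    build_metric_query is a hypothetical helper, not part of MySQL.
    """
    column = metric.upper()
    # Only whitelisted column names are ever placed into the SQL text.
    if column not in METRIC_COLUMNS:
        raise ValueError("unknown metric column: %s" % metric)
    return (
        "select sql_text, {col} from events_statements_history_long "
        "where {col} > 0 order by {col} desc limit {n}"
    ).format(col=column, n=int(limit))


print(build_metric_query("sort_merge_passes"))
```

<p>The printed statement can be pasted into the mysql client; the whitelist is there so that only known column names end up in the query text.</p>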
<p>Links:</p>
<ul>
<li><a href="http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html">What is new in MySQL 5.6: all the long-awaited great features of MySQL 5.6</a> (btw: Multi-Threaded Slaves are coming up, now in labs only)</li>
<li><a href="http://www.markleith.co.uk/?p=471">A Big Bag of Epic Awesomeness</a>, by Mark Leith</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2011/05/01/using-mysql-5-6-to-find-queries-creating-disk-temporary-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fixing data warehousing queries with group-by</title>
		<link>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/</link>
		<comments>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/#comments</comments>
		<pubDate>Mon, 29 Nov 2010 20:50:59 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=78</guid>
		<description><![CDATA[<p>With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous Loose index scan vs. covered indexes in MySQL post:</p>

    CREATE TABLE `ontime_2010` (
      `YearD` int(11) [...]]]></description>
			<content:encoded><![CDATA[<p>With the standard data warehousing queries we have a fact table and dimension tables and we join them.<br />
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous <a href="http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/">Loose index scan vs. covered indexes in MySQL</a> post:</p>
<blockquote><pre>
    CREATE TABLE `ontime_2010` (
      `YearD` int(11) DEFAULT NULL,
      `MonthD` tinyint(4) DEFAULT NULL,
      `DayofMonth` tinyint(4) DEFAULT NULL,
      `DayOfWeek` tinyint(4) DEFAULT NULL,
      `Carrier` char(2) DEFAULT NULL,
      `Origin` char(5) DEFAULT NULL,
      `DepDelayMinutes` int(11) DEFAULT NULL,
      `AirlineID` int(11) DEFAULT NULL,
      `Cancelled` tinyint(4) DEFAULT NULL,
    ... more fields here ...
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
</pre>
</blockquote>
<p>(this is not the best possible fact table as the data is not aggregated, but I&#8217;ll use it for now).</p>
<p>And we have these dimension tables:</p>
<blockquote>
<pre>
 CREATE TABLE `airlines` (
  `AirlineID` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`AirlineID`),
  KEY `AirlineName` (`AirlineName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `date_dayofweek` (
  `code` int(11) NOT NULL DEFAULT '0',
  `description` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`code`),
  KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from date_dayofweek order by code;
+------+-------------+
| code | description |
+------+-------------+
|    1 | Monday      |
|    2 | Tuesday     |
|    3 | Wednesday   |
|    4 | Thursday    |
|    5 | Friday      |
|    6 | Saturday    |
|    7 | Sunday      |
|    9 | Unknown     |
+------+-------------+
8 rows in set (0.00 sec)
</pre>
</blockquote>
<p>So here is the example query (find the number of cancelled flights on Sundays for the given airline, grouped by day):</p>
<blockquote>
<pre>select sum(Cancelled), FlightDate, AirlineName
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
</pre>
</blockquote>
<p>To speed up the query we can add a covered index on ontime_2010, so that all the ontime_2010 fields used by the query are covered:</p>
<blockquote><p>alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled); </p></blockquote>
<p>However, we will still have &#8220;Using temporary; Using filesort&#8221;:</p>
<blockquote>
<pre>

mysql> explain select sum(Cancelled), FlightDate
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dow
         type: ref
possible_keys: PRIMARY,description
          key: description
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: PRIMARY,AirlineName
          key: AirlineName
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: ontime.a.AirlineID,ontime.dow.code
         rows: 24417
        Extra: Using where; Using index
3 rows in set (0.00 sec)
</pre>
</blockquote>
<p>To avoid the filesort we can rewrite this query with subqueries:</p>
<blockquote>
<pre>
mysql> explain select sum(Cancelled), FlightDate  from ontime_2010 o
where o.dayofweek= (select code from date_dayofweek where description = 'Sunday')
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL')
group by FlightDate limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: const,const
         rows: 152510
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: SUBQUERY
        table: airlines
         type: ref
possible_keys: AirlineName
          key: AirlineName
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: date_dayofweek
         type: ref
possible_keys: description
          key: description
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
3 rows in set (0.00 sec)
</pre>
</blockquote>
<p>MySQL can use an index when we have "field = (select .. )": the plan above shows ref: const,const for the cov2 index. As all fields in the index now belong to a single table, MySQL can use the index for the group by as well and avoid the filesort. Please note: this will not work with "field in (select ...)", and you must make sure that each subselect returns only 1 row.</p>
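<p>To double-check that such a rewrite returns the same rows as the join, here is a small sketch that runs both forms against a toy copy of the tables. It uses Python with SQLite purely as a stand-in, so it says nothing about the MySQL execution plan; the sample rows are made up, and the order by is kept in both queries only to make the two result lists comparable.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the tables above; the sample rows are invented.
conn.executescript("""
create table airlines (AirlineID integer primary key, AirlineName text);
create table date_dayofweek (code integer primary key, description text);
create table ontime_2010 (AirlineID int, DayOfWeek int, FlightDate text, Cancelled int);
insert into airlines values (1, 'Delta Air Lines Inc.: DL'), (2, 'Other Airline');
insert into date_dayofweek values (6, 'Saturday'), (7, 'Sunday');
insert into ontime_2010 values
  (1, 7, '2010-01-03', 1), (1, 7, '2010-01-03', 0), (1, 7, '2010-01-10', 1),
  (1, 6, '2010-01-02', 1), (2, 7, '2010-01-03', 1);
""")

JOIN_QUERY = """
select sum(Cancelled), FlightDate
from ontime_2010 o, date_dayofweek dow, airlines a
where o.DayOfWeek = dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10
"""

SUBQUERY_QUERY = """
select sum(Cancelled), FlightDate from ontime_2010 o
where o.DayOfWeek = (select code from date_dayofweek where description = 'Sunday')
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL')
group by FlightDate order by FlightDate desc limit 10
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
subquery_rows = conn.execute(SUBQUERY_QUERY).fetchall()
print(join_rows)
print(join_rows == subquery_rows)
```

<p>The comparison only holds while each subselect returns exactly one row, which is the condition mentioned above.</p>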
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/29/fixing-data-warehousing-queries-with-group-by/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting queries with OR to Union to utilize indexes</title>
		<link>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/</link>
		<comments>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 21:25:04 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=69</guid>
		<description><![CDATA[<p>Let&#8217;s say we have a table storing mail messages and we need to show a user&#8217;s mailbox: messages sent &#8220;from&#8221; and &#8220;to&#8221; the specified user.</p>
<p>Here is our table:</p>


CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s say we have a table storing mail messages and we need to show a user&#8217;s mailbox: messages sent &#8220;from&#8221; and &#8220;to&#8221; the specified user.</p>
<p>Here is our table:</p>
<blockquote>
<pre>
CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc ...
  PRIMARY KEY (`mail_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
</pre>
</blockquote>
<p>And our query:</p>
<blockquote><pre>select * from internalmail
 where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc </pre>
</blockquote>
<p>In this query we show all messages from and to user_id = 247 plus all messages to the system user (user_id=0). We need to show only the messages for the last 3 months, with the most recent messages first.</p>
<p>To speed up the query we can try creating indexes:</p>
<blockquote><p>
  KEY `recipientaddress_id` (`recipientaddress_id`),<br />
  KEY `senderaddress_id` (`senderaddress_id`),<br />
  KEY `mail_timestamp` (`mail_timestamp`),
</p></blockquote>
<p>However, as the query uses &#8220;OR&#8221;, MySQL does not use any of these indexes: it scans the whole table (type: ALL) and then does a filesort.</p>
<blockquote>
<pre>
mysql> explain select * from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: internalmail
         type: ALL
possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4843257
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
</pre>
</blockquote>
<p><strong>UPDATE: </strong> even if we create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp), those indexes will not be used, as the query contains &#8220;OR&#8221; in the where clause.</p>
<p>The original query runs for 3 seconds. To fix this query we can do 2 things:</p>
<ol>
<li>Rewrite query with UNION instead of OR</li>
<li>Create combined indexes</li>
</ol>
<p>First, we rewrite query with UNION:</p>
<blockquote><p>
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')<br />
union<br />
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')<br />
union<br />
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')<br />
order by mail_timestamp desc;
</p></blockquote>
<p>Second, we create 2 indexes:</p>
<blockquote><p>
mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);<br />
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);
</p></blockquote>
<p>After that, MySQL will be able to use a combined index for each of the 3 queries in the union:</p>
<blockquote>
<pre>
mysql> explain
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: internalmail
         type: range
possible_keys: senderaddress_id,mail_timestamp,send_dt
          key: send_dt
      key_len: 9
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 11
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,2,3>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using filesort
4 rows in set (0.00 sec)
</pre>
</blockquote>
<p>Although this query still has to perform a final filesort on the union result, it is much faster: it now runs in 0 sec compared to 3 seconds originally.</p>
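<p>Before switching a query like this in production it is worth confirming that the UNION form returns exactly the same rows as the OR form. A minimal sketch of such a check, assuming a toy table with made-up rows and using Python with SQLite as a stand-in (SQLite does not accept parentheses around the union branches, so they are omitted here; timings and plans are not comparable to MySQL):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy copy of internalmail; the rows are invented for the check.
conn.executescript("""
create table internalmail (
  mail_id integer primary key,
  senderaddress_id int not null,
  recipientaddress_id int not null,
  mail_timestamp text
);
insert into internalmail values
  (1, 247, 5,   '2010-09-01 10:00:00'),
  (2, 9,   247, '2010-09-02 10:00:00'),
  (3, 9,   0,   '2010-09-03 10:00:00'),
  (4, 247, 0,   '2010-09-04 10:00:00'),
  (5, 9,   5,   '2010-09-05 10:00:00'),
  (6, 247, 5,   '2010-07-01 10:00:00');
""")

SINCE = "2010-08-19 12:30:47"

or_rows = conn.execute("""
select * from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > ?
order by mail_timestamp desc""", (SINCE,)).fetchall()

# Row 4 matches two branches; UNION (not UNION ALL) removes the duplicate.
union_rows = conn.execute("""
select * from internalmail where senderaddress_id = 247 and mail_timestamp > ?
union
select * from internalmail where recipientaddress_id = 247 and mail_timestamp > ?
union
select * from internalmail where recipientaddress_id = 0 and mail_timestamp > ?
order by mail_timestamp desc""", (SINCE, SINCE, SINCE)).fetchall()

print([row[0] for row in union_rows])
print(or_rows == union_rows)
```

<p>Note that a message sent by user 247 to the system user matches two branches of the union; plain UNION removes the duplicate, whereas UNION ALL would return it twice.</p>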
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/22/converting-queries-with-or-to-union-to-ulitize-indexes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Loose index scan vs. covered indexes in MySQL</title>
		<link>http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/</link>
		<comments>http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 17:51:24 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[covered index]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[loose index scan]]></category>
		<category><![CDATA[mysql performance]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=58</guid>
		<description><![CDATA[<p>Loose index scan in MySQL can really help optimize &#8220;group by&#8221; queries in some cases (for example, when min() and/or max() are your only aggregate functions). Suppose you have this query (to find the maximum delay for all US flights with departure on Sundays in 2010):</p>

select max(DepDelayMinutes), 	carrier, dayofweek
from ontime_2010
where dayofweek = [...]]]></description>
			<content:encoded><![CDATA[<p>Loose index scan in MySQL can really help optimize &#8220;group by&#8221; queries in some cases (for example, when min() and/or max() are your only aggregate functions). Suppose you have this query (to find the maximum delay for all US flights with departure on Sundays in 2010):</p>
<blockquote>
<pre>select max(DepDelayMinutes), 	carrier, dayofweek
from ontime_2010
where dayofweek = 7
group by Carrier,  dayofweek
</pre>
</blockquote>
<p>The usual approach is to add a covered index on (dayofweek, Carrier, DepDelayMinutes). MySQL will use this index just fine (&#8220;Using index&#8221; means that it uses the covered index):</p>
<blockquote>
<pre>
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek =7 group by Carrier, dayofweek\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: ref
possible_keys: covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>However, as the dayofweek part has a low number of unique values, MySQL will have to scan a lot of index entries (estimated rows: 905138).<br />
<span id="more-58"></span></p>
<p>MySQL can use <a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-optimization.html#loose-index-scan" target="_new">loose index scan</a>. Unfortunately, a lot of limitations apply:</p>
<ul>
<li>The query is over a single table.</li>
<li>The GROUP BY names only columns that form a leftmost prefix of the index and no other columns.</li>
<li>The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column.</li>
<li>etc&#8230; (see <a href="http://dev.mysql.com/doc/refman/5.1/en/group-by-optimization.html#loose-index-scan" target="_new">docs</a> for details)</li>
</ul>
<p>As our example query is suitable for loose index scan, we can create another index:</p>
<blockquote>
<pre>
mysql> alter table ontime_2010 add key lis1(Carrier, dayofweek, DepDelayMinutes);

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek =7 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.01 sec)
</pre>
</blockquote>
<p>Here, &#8220;Using index for group-by&#8221; means that MySQL uses loose index scan.<br />
Also, and this is really great, it works with a range on dayofweek too:</p>
<blockquote>
<pre>
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>The original covered index, however, does not work well with a range in the where clause:</p>
<blockquote>
<pre>
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 use index (covered)
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: covered
          key: covered
      key_len: 2
          ref: NULL
         rows: 2416543
        Extra: Using where; Using index; Using temporary; Using filesort
</pre>
</blockquote>
<p>In the above example, MySQL uses the index but still has to create a temporary table and do a filesort.</p>
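<p>To see why the loose index scan touches so few index entries, here is a toy model of the idea in Python. It is only a sketch under my own assumptions, not how MySQL implements it: a sorted list of tuples stands in for the lis1 key on (Carrier, dayofweek, DepDelayMinutes), binary search stands in for an index seek, and only the equality case (dayofweek = 7) is handled. The function name and sample data are made up.</p>

```python
from bisect import bisect_right

INF = float("inf")


def loose_index_scan_max(index, dayofweek):
    """Return ({carrier: max delay}, number of index seeks).

    index is a sorted list of (carrier, dayofweek, delay) tuples.
    """
    result, seeks, pos = {}, 0, 0
    while pos < len(index):
        carrier = index[pos][0]
        # Seek straight to the last entry of the (carrier, dayofweek) group;
        # because the index is sorted, that entry holds the maximum delay.
        last = bisect_right(index, (carrier, dayofweek, INF)) - 1
        seeks += 1
        if last >= 0 and index[last][:2] == (carrier, dayofweek):
            result[carrier] = index[last][2]
        # Jump past the remaining entries of this carrier without reading them.
        pos = bisect_right(index, (carrier, INF, INF))
        seeks += 1
    return result, seeks


sample_index = sorted([
    ("AA", 1, 5), ("AA", 7, 10), ("AA", 7, 30),
    ("DL", 3, 2), ("DL", 7, 45), ("UA", 2, 8),
])
print(loose_index_scan_max(sample_index, 7))
```

<p>The number of seeks grows with the number of distinct carriers rather than with the number of rows, which matches the small row estimate in the loose index scan plans above.</p>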
<p>Now the speed comparison:<br />
I&#8217;m using the &#8220;ontime&#8221; flight performance statistics data from <a href="http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&#038;DB_Short_Name=On-Time" target="_new">transtats.bts.gov</a>.<br />
The table only contains data for 2010.<br />
Table size: 5M rows, ~2G in size. Table structure:</p>
<blockquote><pre>
CREATE TABLE `ontime_2010` (
  `YearD` int(11) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `Origin` char(5) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
... more fields here ...
) ENGINE=InnoDB DEFAULT CHARSET=latin1
</pre>
</blockquote>
<p>Results (cached index and data):<br />
&#8220;where dayofweek = 7&#8221; (ref)</p>
<ul>
<li>Loose index scan: 0 sec</li>
<li>Covered index: 0.6 sec</li>
</ul>
<p>&#8220;where dayofweek > 3&#8221; (range)</p>
<ul>
<li>Loose index scan: 0 sec</li>
<li>Covered index (+ temporary table and filesort): 5.53 sec</li>
</ul>
<p>I will also present the findings from this article (among other things) during the upcoming webinar, <a href="http://mysql.com/news-and-events/web-seminars/display-582.html" target="_new">Getting the Best MySQL Performance in Your Products: Part 3, Query Tuning</a>, which will take place on Tuesday, Nov 23 (webinar is free, <a href="http://mysql.com/news-and-events/web-seminars/display-582.html">webinar registration</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why mysqldump is converting my tables from InnoDB to MyISAM?</title>
		<link>http://www.arubin.org/blog/2010/11/12/why-mysqldump-is-converting-my-tables-from-innodb-to-myisam/</link>
		<comments>http://www.arubin.org/blog/2010/11/12/why-mysqldump-is-converting-my-tables-from-innodb-to-myisam/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 21:23:31 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[innodb]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[myisam]]></category>
		<category><![CDATA[mysqldump]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=50</guid>
		<description><![CDATA[<p>First of all: mysqldump is not converting tables. It is something else. Here is the story:</p>
<p>One of my clients had a case when they were migrating to a new mysql server: they used mysqldump to export data from the old server (all InnoDB) and imported it to the new server. When finished, all the tables [...]]]></description>
			<content:encoded><![CDATA[<p>First of all: mysqldump is not converting tables. It is something else. Here is the story:</p>
<p>One of my clients had a case when they were migrating to a new mysql server: they used mysqldump to export data from the old server (all InnoDB) and imported it to the new server. When finished, all the tables became MyISAM on the new server. So they asked me this question:<br />
&#8220;Why mysqldump is converting my tables from InnoDB to MyISAM?&#8221;<br />
<span id="more-50"></span></p>
<p>First, we made sure that the tables were InnoDB on the old server. They were.<br />
Second, we ran &#8220;show engines&#8221; on the new server:</p>
<blockquote>
<pre>
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                        | Transactions | XA   | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance         | NO           | NO   | NO         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| BLACKHOLE  | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| FEDERATED  | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
| ARCHIVE    | YES     | Archive storage engine                                         | NO           | NO   | NO         |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
</pre>
</blockquote>
<p>As we can see, there is no InnoDB in the list, so InnoDB was not started.<br />
Next, we looked into the error log to find out why InnoDB was not started, and we saw this:</p>
<blockquote><p><code>InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes<br />
InnoDB: than specified in the .cnf file 0 134217728 bytes!<br />
[ERROR] Plugin 'InnoDB' init function returned error.<br />
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.<br />
[Note] Event Scheduler: Loaded 0 events<br />
 [Note] /usr/sbin/mysqld: ready for connections.<br />
Version: '5.1.51-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)</code></p></blockquote>
<p>So, InnoDB was not started because the size of the log files was changed in my.cnf and the old log files were not moved away. The mysql server did start, but without InnoDB. In this case mysql restored the tables, but substituted MyISAM for InnoDB as the storage engine. For example, if we create a table with a non-existing storage engine, MySQL will use MyISAM instead:</p>
<blockquote>
<pre>
mysql> create table aaa(i int) engine=non_existing_engine;
Query OK, 0 rows affected, 2 warnings (0.16 sec)

mysql> show warnings;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1286 | Unknown table engine 'non_existing_engine'  |
| Warning | 1266 | Using storage engine MyISAM for table 'aaa' |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)
</pre>
</blockquote>
<p>That is what happened: mysql used MyISAM instead of InnoDB and produced warnings, but such warnings are usually ignored.</p>
<p>The fix was easy: restart mysql using these instructions (http://dev.mysql.com/doc/refman/5.0/en/adding-and-removing.html) and load the dump again (or convert the tables from MyISAM to InnoDB manually).</p>
<p><strong>UPDATE: </strong> To prevent this in the future you can do 2 things (thanks Shantanu for pointing this out):</p>
<ol>
<li> If using innodb plugin and mysql 5.1 add this to my.cnf: innodb=FORCE. In this case MySQL will not start if InnoDB failed to start:<br />
<blockquote><p>I<code>nnoDB: Error: log file ./ib_logfile0 is of different size 0 536870912 bytes<br />
InnoDB: than specified in the .cnf file 0 53477376 bytes!<br />
[ERROR] Plugin 'InnoDB' init function returned error.<br />
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.<br />
[ERROR] Failed to initialize plugins.<br />
[ERROR] Aborting</code>
</p></blockquote>
<li> Use sql_mode=NO_ENGINE_SUBSTITUTION:
<blockquote><p><code>mysql> set sql_mode=NO_ENGINE_SUBSTITUTION;<br />
Query OK, 0 rows affected (0.00 sec)</p>
<p>mysql> create table aaa(i int) engine=non_existing_engine;<br />
ERROR 1286 (42000): Unknown table engine 'non_existing_engine'</code>
</p></blockquote>
</ol>
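<p>Both settings can be checked before a restore. The sketch below is an assumption-laden illustration, not a MySQL tool: it parses the text of a my.cnf with Python's configparser and reports which of the two protections are missing from the [mysqld] section. The function name is hypothetical, and the parser does not understand !include directives or inline comments.</p>

```python
import configparser


def check_engine_safety(my_cnf_text):
    """Return a list of problems found in the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options such as skip-name-resolve
        strict=False,          # tolerate repeated options
        interpolation=None,    # treat % literally
    )
    parser.read_string(my_cnf_text)

    options = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            # my.cnf accepts both sql-mode and sql_mode; normalize to underscores
            options[key.replace("-", "_")] = (value or "").strip().strip("'\"")

    problems = []
    if options.get("innodb", "").upper() != "FORCE":
        problems.append("innodb=FORCE is not set")
    modes = [m.strip().upper() for m in options.get("sql_mode", "").split(",")]
    if "NO_ENGINE_SUBSTITUTION" not in modes:
        problems.append("sql_mode lacks NO_ENGINE_SUBSTITUTION")
    return problems


if __name__ == "__main__":
    # Sample configuration text, made up for illustration.
    print(check_engine_safety("[mysqld]\ninnodb_log_file_size=512M\n"))
```

<p>An empty list means both protections are present in the file; the running server can still differ if sql_mode was changed at runtime.</p>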
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/11/12/why-mysqldump-is-converting-my-tables-from-innodb-to-myisam/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Speaking at MySQL Users Conference and Expo 2010</title>
		<link>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/</link>
		<comments>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 20:58:36 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=46</guid>
		<description><![CDATA[<p>I&#8217;ll be speaking at the MySQL Users Conference 2010. Talk: MySQL Architecture Design Patterns for Performance, Scalability, and Availability, 11:55am Thursday, 04/15/2010. Details.</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking at the MySQL Users Conference 2010. Talk: MySQL Architecture Design Patterns for Performance, Scalability, and Availability, 11:55am Thursday, 04/15/2010. <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13384">Details.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/04/10/speaking-at-mysql-users-conference-and-expo-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speaking at Linux Conference in Wellington, New Zealand</title>
		<link>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/</link>
		<comments>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 21:36:07 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[sphinxsearch]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=41</guid>
		<description><![CDATA[<p>I&#8217;ll be speaking at the data retrieval miniconf at Linux Conference in Wellington, New Zealand (Full Text Search with MySQL, Program)
I&#8217;ll cover some new sphinx search features (online updates)</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking at the data retrieval miniconf at Linux Conference in Wellington, New Zealand (Full Text Search with MySQL, <a href="http://miniconf.osda.asn.au/program">Program</a>)<br />
I&#8217;ll cover some new sphinx search features (online updates)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Dtrace to find queries creating disk temporary tables</title>
		<link>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/</link>
		<comments>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/#comments</comments>
		<pubDate>Fri, 02 Oct 2009 18:44:52 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[dtrace]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[temporary tables]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=36</guid>
		<description><![CDATA[A DTrace script to find queries creating disk temporary tables [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes we have lots of small and rather fast queries which use group by/order by and thus create temporary tables. Some of those queries retrieve text fields, so MySQL has to use on-disk (MyISAM) temporary tables. Those queries usually run for less than 1-2 seconds, so they do not get into the slow query log; however, together they can add serious load on the system.</p>
<p>Here is an example of the status counters:</p>
<pre>
bash-3.00$  /usr/local/mysql/bin/mysqladmin -uroot -p -i 2 -r extended-status|grep tmp_disk
...
| Created_tmp_disk_tables           | 109           |
| Created_tmp_disk_tables           | 101           |
| Created_tmp_disk_tables           | 122           |
...
</pre>
<p>Each value is the delta over a 2-second interval (-i 2 -r), so this is roughly 50-60 disk temporary tables created per second.</p>
<p>So, how can we catch those queries? Usually we have to temporarily enable the general log, pick out the queries with &#8220;group by/order by&#8221; and profile them all. On Solaris/Mac OS X we can use DTrace instead.</p>
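<p>The general-log approach can be sketched in Python as follows. This is a rough filter under stated assumptions: each statement sits on one line in the form "thread_id Query statement" (multi-line statements are not reassembled), and the function name is hypothetical. It only narrows the log down to candidates; each one still has to be profiled.</p>

```python
import re

# Assumed line layout of the general log: [timestamp] thread_id Query statement
QUERY_LINE = re.compile(r"(?:^|\s)\d+\s+Query\s+(.*)$")
GROUP_OR_ORDER = re.compile(r"\b(?:group|order)\s+by\b", re.IGNORECASE)


def candidate_queries(log_lines):
    """Yield statements from general-log lines that contain GROUP BY or ORDER BY."""
    for line in log_lines:
        match = QUERY_LINE.search(line.rstrip("\n"))
        if match and GROUP_OR_ORDER.search(match.group(1)):
            yield match.group(1)


if __name__ == "__main__":
    # Sample log lines, made up for illustration.
    sample = [
        "091002 11:44:52\t    3 Connect\troot@localhost on test\n",
        "\t\t    3 Query\tselect a, count(*) from t group by a\n",
        "\t\t    3 Query\tselect 1\n",
    ]
    for statement in candidate_queries(sample):
        print(statement)
```

<p>Any iterable of lines works as input, so an open general log file can be passed in directly.</p>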
<p>Here is a simple DTrace script that lists the queries creating disk temporary tables:</p>
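<pre>
#!/usr/sbin/dtrace -s
#pragma D option quiet

dtrace:::BEGIN
{
	printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Remember the text of the query this thread has started parsing. */
pid$target::*mysql_parse*:entry
{
	self->query = copyinstr(arg1);
}

/* An on-disk (MyISAM) temporary table was created: count it against that query. */
pid$target::*create_myisam_tmp_table*:return
{
	@query[self->query] = count();
}
</pre>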
<p>Put it into tmpdisktable.d, make it executable (chmod +x tmpdisktable.d) and run it with:<br />
./tmpdisktable.d -p `pgrep -x mysqld`</p>
<p>Press Ctrl+C after 5 seconds or so and you will see the queries:</p>
<pre>
# ./tmpdisktable.d -p `pgrep -x mysqld`
Tracing... Hit Ctrl-C to end.
^C
</pre>
<p>Queries are truncated to the &#8220;strsize&#8221; length, which can be tweaked:</p>
<pre>#pragma D option strsize=N</pre>
<p>We can now increase &#8220;strsize&#8221; and run the script again to see the full query text.</p>
<p>Please note: running DTrace for a while can decrease performance, so do not run it for more than a couple of minutes on production systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reporting Queries with Sphinx</title>
		<link>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/</link>
		<comments>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 11:47:51 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[reporting]]></category>
		<category><![CDATA[sphinxsearch]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=16</guid>
		<description><![CDATA[<p>Reporting queries (I will use this term here) are queries that summarize and group data over a certain period of time. For example, on a social network site we want to know how many messages were sent in a given period, grouped by region and status (sent, received, etc), ordered by number [...]]]></description>
			<content:encoded><![CDATA[<p>Reporting queries (I will use this term here) are queries that summarize and group data over a certain period of time. For example, on a social network site we want to know how many messages were sent in a given period, grouped by region and status (sent, received, etc) and ordered by the number of messages sent.</p>
<p>As an example I will take a table which is used to send SMS (text messages).</p>
<blockquote><p><code><strong>SQL: select concat('+', substring(region_code,1 ,2), 'xxx') as reg, status, count(*) as cnt<br />
from messages<br />
where submition_date between '2009-01-01' and '2009-04-01' group by reg, status<br />
having cnt&gt;100 order by cnt desc, status limit 100;</strong></code>
</p></blockquote>
<p>This query will do a range scan over submition_date and perform a filesort. There are well-known approaches for optimizing such queries (“covered index”, “summary tables”, using an external data warehouse, etc). Sometimes those approaches do not work or are too complex.</p>
<p>Yet another approach is to use an external search/index solution, for example Sphinx Search (<a href="http://www.sphinxsearch.com/">http://www.sphinxsearch.com</a>). In this case the data stays in MySQL and Sphinx is used as an external indexer/searcher with SQL protocol support.</p>
<h3>Using Sphinx</h3>
<p>Starting with version 0.9.9-rc2, the Sphinx searchd daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API. For instance, the &#8216;mysql&#8217; CLI client works well. Here&#8217;s an example of querying Sphinx using the MySQL client:</p>
<blockquote><p>
<strong><code>$ mysql -P 3307<br />
Welcome to the MySQL monitor.  Commands end with ; or \g.<br />
Your MySQL connection id is 1<br />
Server version: 0.9.9-dev (r1734)</code><br />
</strong></p></blockquote>
<p>As Sphinx can store attributes (“fields”) and group/sort by them, it can be used for our report. An application can also simply connect to the Sphinx server over the MySQL protocol and will behave as if it were talking to MySQL (there are minor differences in Sphinx SQL, like “@count” and support for timestamps only instead of datetime).</p>
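<p>Because Sphinx only accepts timestamps here, date bounds such as '2009-01-01' have to be converted to Unix timestamps first. Here is a minimal Python sketch of that conversion. The fixed UTC-7 offset is an assumption inferred from the bounds 1230793200 and 1238569200 used in the Sphinx queries in this post; use your own server's time zone instead. The function name is hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: dates are interpreted at a fixed UTC-7 offset. This is inferred
# from the timestamp bounds used in the post's Sphinx queries, not stated there.
ASSUMED_OFFSET = timezone(timedelta(hours=-7))


def to_unix_timestamp(date_string, tz=ASSUMED_OFFSET):
    """Convert a 'YYYY-MM-DD' date (midnight in tz) to a Unix timestamp."""
    local_midnight = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())


print(to_unix_timestamp("2009-01-01"))  # 1230793200
print(to_unix_timestamp("2009-04-01"))  # 1238569200
```

<p>On the MySQL side the same conversion is done by UNIX_TIMESTAMP(), which the Sphinx source query in the config file at the end of this post uses to build date_added.</p>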
<p>Here is an example of such a reporting query in Sphinx:<br />
<strong><br />
<blockquote><code>mysql&gt; select *<br />
from messages_dw<br />
where<br />
submition_date &gt; 1230793200<br />
and submition_date &lt; 1238569200<br />
group by region_code<br />
order by @count desc<br />
limit 0,10;</p>
<p>10 rows in set (0.19 sec)</code></p></blockquote>
<p></strong><br />
The same query in MySQL 5.1 runs much slower:<br />
<strong><br />
<blockquote><code>select region_code, count(*) as cnt<br />
from messages_dw<br />
where<br />
submition_date &gt; '2009-01-01'<br />
and submition_date &lt; '2009-04-01'<br />
group by region_code<br />
order by cnt desc<br />
limit 0,10;<br />
10 rows in set (14.47 sec)</code></p></blockquote>
<p></strong><br />
2 important notes:</p>
<ol>
<li>For now, Sphinx can’t group by more than one field. However, we can combine 2 fields into 1 and then group by this new field; the query after this list shows how to do it.</li>
<li>In the configuration file (in the searchd section) we need to set max_matches to a very large number (max_matches = 10000000, for example). By default, Sphinx does not generate exact counts (or other aggregate functions); this was done for the sake of speed. Setting max_matches to a large number fixes this.</li>
</ol>
<p><strong><br />
<blockquote><code>mysql&gt; select BIGINT(region_code)*4*1024*1024*1024+status_code<br />
as reg_status, *<br />
from messages_dw<br />
where date_added &gt; 1230793200<br />
and date_added &lt; 1238569200<br />
group by reg_status<br />
order by @count desc, region_code<br />
limit 0,10;</code></p></blockquote>
<p></strong>Another speed comparison, this time grouping by 2 fields:</p>
<p>Sphinx:<br />
<strong><br />
<blockquote><code>mysql&gt; select BIGINT(region_code)*4*1024*1024*1024+status_code as reg_status, *  from messages_dw where date_added &gt; 1230793200 and date_added &lt; 1238569200  group by reg_status order by @count desc, region_code limit 0,10;</p>
<p>10 rows in set (0.98 sec)</code></p></blockquote>
<p></strong><br />
MySQL:<br />
<strong><br />
<blockquote><code>mysql&gt; select region_code, status+0, count(*) as cnt from messages_dw where  submition_date between '2009-01-01' and '2009-04-01'  group by region_code, status order by cnt desc, region_code limit 0,10;</p>
<p>10 rows in set (14.47 sec)</code></p></blockquote>
<p></strong></p>
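<p>The combined key used in the queries above, BIGINT(region_code)*4*1024*1024*1024+status_code, puts region_code in the upper 32 bits and status_code in the lower 32 bits of one 64-bit number. The Python sketch below illustrates that arithmetic and how an application could split reg_status back into its two parts. The function names and sample values are hypothetical, and it assumes both attributes are 32-bit unsigned integers.</p>

```python
SHIFT = 4 * 1024 * 1024 * 1024  # 2**32, the multiplier used in the Sphinx query


def pack_key(region_code, status_code):
    """Combine two 32-bit unsigned attributes into one 64-bit group-by key."""
    if region_code not in range(SHIFT) or status_code not in range(SHIFT):
        raise ValueError("both values must fit in 32 unsigned bits")
    return region_code * SHIFT + status_code


def unpack_key(key):
    """Recover (region_code, status_code) from a combined key."""
    return divmod(key, SHIFT)


if __name__ == "__main__":
    # Sample values, made up for illustration.
    key = pack_key(1650, 3)
    print(key, unpack_key(key))
```

<p>The multiplier has to be larger than any possible status_code, otherwise two different (region_code, status_code) pairs could collapse into the same group.</p>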
<h3>Conclusion</h3>
<p>If you need fast ad hoc reporting queries, Sphinx Search can be a good option.<br />
Advantages:</p>
<ul>
<li>Faster sorting and grouping (which is very important for reporting queries)</li>
<li>No need to use an external API for queries: Sphinx now supports the MySQL protocol</li>
</ul>
<p>Disadvantages:</p>
<ul>
<li>Need to run an additional Sphinx daemon</li>
<li>Need to re-index the data when it changes</li>
</ul>
<h3>Sphinx config file</h3>
<blockquote><p><code>source src1<br />
{<br />
type                                    = mysql<br />
sql_host                                = 127.0.0.1<br />
sql_user                                = root<br />
sql_pass                                =<br />
sql_db                                  = dw<br />
sql_port                                = 3309  # optional, default is 3306<br />
sql_query                               = \<br />
SELECT msg_id, region_code, status+0 as status_code,  UNIX_TIMESTAMP(submition_date) AS date_added, 't' as  content \<br />
FROM messages_dw<br />
sql_attr_uint                   = region_code<br />
sql_attr_uint                   = status_code<br />
sql_attr_timestamp              = date_added<br />
sql_query_info                  = SELECT * FROM messages_dw WHERE msg_id=$id<br />
}<br />
index messages_dw<br />
{<br />
source                                  = src1<br />
path                                    = /data1/arubin/sphinx_new//var/data/test1<br />
docinfo                                 = extern<br />
charset_type                    = sbcs<br />
}<br />
indexer<br />
{<br />
mem_limit                               = 32M<br />
}<br />
searchd<br />
{<br />
listen = localhost:3312:mysql41<br />
log                                             = /data1/arubin/sphinx_new//var/log/searchd.log<br />
query_log                               = /data1/arubin/sphinx_new//var/log/query.log<br />
read_timeout                    = 30<br />
max_children                    = 30<br />
pid_file                                = /data1/arubin/sphinx_new//var/log/searchd.pid<br />
max_matches                             = 10000000<br />
seamless_rotate                 = 1<br />
preopen_indexes                 = 0<br />
unlink_old                              = 1<br />
}</code></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>New blog on MySQL</title>
		<link>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/</link>
		<comments>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 19:22:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=14</guid>
		<description><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA)</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

