Uncategorized – Alexander Rubin's Blog on MySQL

MySQL Visual Explain

arubin — Wed, 26 Sep 2012 17:33:03 +0000

If you are tired of reading the old “text-only” output of MySQL EXPLAIN, you will enjoy the new Visual Explain feature of MySQL Workbench (works with MySQL 5.6+).

Before:

mysql> explain select max(DepDelayMinutes), carrier, dayofweek from ontime.ontime_2010 where dayofweek = 7 group by Carrier, dayofweek\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: ref
possible_keys: DayOfWeek,dw_carr,covered
          key: covered
      key_len: 2
          ref: const
         rows: 1337314
        Extra: Using where; Using index
1 row in set (0.00 sec)

After:

How to test:
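
As a minimal sketch, assuming a MySQL 5.6 or later server: the plan is also available in a structured JSON form through `EXPLAIN FORMAT=JSON`, which is the kind of data a graphical plan view is built from. Running it from any client shows the same information as the text output in the "Before" example, with extra detail:

```sql
-- Assumes MySQL 5.6 or later, where EXPLAIN accepts FORMAT=JSON.
-- Same query as in the "Before" example.
explain format=json
select max(DepDelayMinutes), carrier, dayofweek
from ontime.ontime_2010
where dayofweek = 7
group by Carrier, dayofweek\G
```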

Fixing data warehousing queries with group-by

arubin — Mon, 29 Nov 2010 20:50:59 +0000

With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, here is the fact table (5M rows, ~2G in size) from my previous post, “Loose index scan vs. covered indexes in MySQL”:

    CREATE TABLE `ontime_2010` (
      `YearD` int(11) DEFAULT NULL,
      `MonthD` tinyint(4) DEFAULT NULL,
      `DayofMonth` tinyint(4) DEFAULT NULL,
      `DayOfWeek` tinyint(4) DEFAULT NULL,
      `Carrier` char(2) DEFAULT NULL,
      `Origin` char(5) DEFAULT NULL,
      `DepDelayMinutes` int(11) DEFAULT NULL,
      `AirlineID` int(11) DEFAULT NULL,
      `Cancelled` tinyint(4) DEFAULT NULL,
    ... more fields here ...
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

(This is not the best possible fact table, as the data is not aggregated, but I’ll use it for now.)

And we have these dimension tables:

 CREATE TABLE `airlines` (
  `AirlineID` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`AirlineID`),
  KEY `AirlineName` (`AirlineName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `date_dayofweek` (
  `code` int(11) NOT NULL DEFAULT '0',
  `description` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`code`),
  KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> select * from date_dayofweek order by code;
+------+-------------+
| code | description |
+------+-------------+
|    1 | Monday      |
|    2 | Tuesday     |
|    3 | Wednesday   |
|    4 | Thursday    |
|    5 | Friday      |
|    6 | Saturday    |
|    7 | Sunday      |
|    9 | Unknown     |
+------+-------------+
8 rows in set (0.00 sec)

So here is the example query (find the sum of cancelled flights on Sundays for the given airline, grouped by flight date):

select sum(Cancelled), FlightDate, AirlineName  
from ontime_2010 o, date_dayofweek dow, airlines a 
where o.dayofweek=dow.code and dow.description = 'Sunday' 
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL' 
group by FlightDate order by FlightDate desc limit 10\G

To fix the query we can add a covered index for ontime_2010, so that all fields for ontime_2010 table will be covered:

alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled);

However, we will still have a temporary table and a filesort:


mysql> explain select sum(Cancelled), FlightDate  
from ontime_2010 o, date_dayofweek dow, airlines a 
where o.dayofweek=dow.code and dow.description = 'Sunday' 
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL' 
group by FlightDate order by FlightDate desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dow
         type: ref
possible_keys: PRIMARY,description
          key: description
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: PRIMARY,AirlineName
          key: AirlineName
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: ontime.a.AirlineID,ontime.dow.code
         rows: 24417
        Extra: Using where; Using index
3 rows in set (0.00 sec)

To avoid the filesort, we can rewrite this query with subqueries:

mysql> explain select sum(Cancelled), FlightDate  from ontime_2010 o 
where o.dayofweek= (select code from date_dayofweek where description = 'Sunday') 
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL') 
group by FlightDate limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: const,const
         rows: 152510
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: SUBQUERY
        table: airlines
         type: ref
possible_keys: AirlineName
          key: AirlineName
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: date_dayofweek
         type: ref
possible_keys: description
          key: description
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
3 rows in set (0.00 sec)

MySQL can use an index for "field = (select ...)": the two subselects are evaluated first and treated as constants (ref: const,const in the plan above). Now all fields in the cov2 index belong to a single table, so MySQL reads rows from the index already ordered by FlightDate and avoids the filesort. Please note: this will not work with "field in (select ...)", and the subselect must return only 1 row.
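
Because "field = (select ...)" needs each subselect to return a single value, it can help to check that before relying on the rewrite. A minimal sketch, using the dimension tables above: each count should be exactly 1. If a subselect returns more than one row, MySQL rejects the outer query with a "Subquery returns more than 1 row" error.

```sql
-- Each of these should return 1 for the "= (select ...)" rewrite to be valid.
select count(*) from date_dayofweek where description = 'Sunday';
select count(*) from airlines where AirlineName = 'Delta Air Lines Inc.: DL';
```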

Converting queries with OR to UNION to utilize indexes

arubin — Mon, 22 Nov 2010 21:25:04 +0000

Let’s say we have a table storing mail messages and we need to show a user’s mailbox: messages sent “from” and “to” the specified user.

Here is our table:

CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc ...
  PRIMARY KEY (`mail_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And our query:

select * from internalmail
 where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) 
and mail_timestamp > '2010-08-01 12:30:47' 
order by mail_timestamp desc

In this query we show all messages from and to user_id = 247 plus all messages to system user (user_id=0). We need to show only messages for the last 3 months and show the most recent messages first.

To speed up the query we can try creating indexes:

KEY `recipientaddress_id` (`recipientaddress_id`),
KEY `senderaddress_id` (`senderaddress_id`),
KEY `mail_timestamp` (`mail_timestamp`),

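However, as the query uses “OR”, MySQL does not use any of these indexes: it scans the whole table (type: ALL, key: NULL) and then sorts the result with a filesort.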

mysql> explain select * from internalmail 
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) 
and mail_timestamp > '2010-08-01 12:30:47' 
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: internalmail
         type: ALL
possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4843257
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

UPDATE: even if we create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp), those indexes will not be used, as the query contains “OR” in the where clause.

The original query runs for 3 seconds. To fix this query we can do 2 things:

Rewrite query with UNION instead of OR
Create combined indexes

First, we rewrite query with UNION:

(select * from internalmail where senderaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where recipientaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where recipientaddress_id = 0 and mail_timestamp > '2010-08-19 12:30:47')
order by mail_timestamp desc;

Second, we create 2 indexes:

mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);

After that, MySQL will be able to fully utilize an index for each of the 3 queries in the union:

mysql> explain 
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')  
union 
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')  
union 
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')  
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: internalmail
         type: range
possible_keys: senderaddress_id,mail_timestamp,send_dt
          key: send_dt
      key_len: 9
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 11
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using filesort
4 rows in set (0.00 sec)

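Although this query still has to perform a final filesort, it is much faster: the sort now only covers the few rows returned by the three index range scans (an estimated 5, 11 and 1 rows in the plan above) instead of the whole table. It now runs in 0 sec, compared to 3 seconds originally.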
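
Since UNION (without ALL) removes duplicate rows, a message that matches more than one branch, for example one that user 247 sent to themselves, is still returned only once, as with the OR version. A quick sanity check, sketched under the assumption that mail_id alone identifies a row (it is the primary key above) and that both forms use the same timestamp, is to compare row counts; the two numbers should match:

```sql
-- Original OR form; note that this one may still scan the whole table.
select count(*) from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-19 12:30:47';

-- UNION form, wrapped in a derived table so the rows can be counted.
-- Only mail_id is selected, since it is enough to identify each row.
select count(*) from (
  select mail_id from internalmail
  where senderaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47'
  union
  select mail_id from internalmail
  where recipientaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47'
  union
  select mail_id from internalmail
  where recipientaddress_id = 0 and mail_timestamp > '2010-08-19 12:30:47'
) as u;
```

Using UNION ALL instead would skip the duplicate-removal step but could return such a message twice, so it is not a drop-in replacement here.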

Speaking at MySQL Users Conference and Expo 2010

arubin — Sat, 10 Apr 2010 20:58:36 +0000

I’ll be speaking at the MySQL Users Conference 2010. Talk: MySQL Architecture Design Patterns for Performance, Scalability, and Availability, 11:55am Thursday, 04/15/2010.

New blog on MySQL

admin — Wed, 23 Sep 2009 19:22:42 +0000

I’ve started my new blog on MySQL. I’ll focus on MySQL full text search, performance tuning and High Availability (HA).