With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous Loose index scan vs. covered indexes in MySQL post:
CREATE TABLE `ontime_2010` (
`YearD` int(11) DEFAULT NULL,
`MonthD` tinyint(4) DEFAULT NULL,
`DayofMonth` tinyint(4) DEFAULT NULL,
`DayOfWeek` tinyint(4) DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`Origin` char(5) DEFAULT NULL,
`DepDelayMinutes` int(11) DEFAULT NULL,
`AirlineID` int(11) DEFAULT NULL,
`Cancelled` tinyint(4) DEFAULT NULL,
... more fields here ...
) ENGINE=InnoDB DEFAULT CHARSET=latin1
(this is not the best possible fact table as the data is not aggregated by I’ll use it for now).
And we have those dimensions tables:
CREATE TABLE `airlines` ( `AirlineID` int(11) NOT NULL DEFAULT '0', `AirlineName` varchar(255) DEFAULT NULL, PRIMARY KEY (`AirlineID`), KEY `AirlineName` (`AirlineName`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 CREATE TABLE `date_dayofweek` ( `code` int(11) NOT NULL DEFAULT '0', `description` varchar(255) DEFAULT NULL, PRIMARY KEY (`code`), KEY `description` (`description`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> select * from date_dayofweek order by code; +------+-------------+ | code | description | +------+-------------+ | 1 | Monday | | 2 | Tuesday | | 3 | Wednesday | | 4 | Thursday | | 5 | Friday | | 6 | Saturday | | 7 | Sunday | | 9 | Unknown | +------+-------------+ 8 rows in set (0.00 sec)
So here is the example query (find sum of cancelled flights on Sundays for the given airline group by day):
select sum(Cancelled), FlightDate, AirlineName from ontime_2010 o, date_dayofweek dow, airlines a where o.dayofweek=dow.code and dow.description = 'Sunday' and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL' group by FlightDate order by FlightDate desc limit 10\G
To fix the query we can add a covered index for ontime_2010, so that all fields for ontime_2010 table will be covered:
alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled);
However we will still have “temporary table and filesort”:
mysql> explain select sum(Cancelled), FlightDate
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: dow
type: ref
possible_keys: PRIMARY,description
key: description
key_len: 258
ref: const
rows: 1
Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: a
type: ref
possible_keys: PRIMARY,AirlineName
key: AirlineName
key_len: 258
ref: const
rows: 1
Extra: Using where; Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: o
type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
key: cov2
key_len: 7
ref: ontime.a.AirlineID,ontime.dow.code
rows: 24417
Extra: Using where; Using index
3 rows in set (0.00 sec)
To avoid filesort we can re-write this query with "subqueries":
mysql> explain select sum(Cancelled), FlightDate from ontime_2010 o
where o.dayofweek= (select code from date_dayofweek where description = 'Sunday')
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL')
group by FlightDate limit 10\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: o
type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
key: cov2
key_len: 7
ref: const,const
rows: 152510
Extra: Using where; Using index
*************************** 2. row ***************************
id: 3
select_type: SUBQUERY
table: airlines
type: ref
possible_keys: AirlineName
key: AirlineName
key_len: 258
ref:
rows: 1
Extra: Using where; Using index
*************************** 3. row ***************************
id: 2
select_type: SUBQUERY
table: date_dayofweek
type: ref
possible_keys: description
key: description
key_len: 258
ref:
rows: 1
Extra: Using where; Using index
3 rows in set (0.00 sec)
As MySQL will use indexes when we have "field = (select .. )" and now all fields in the index belong to the single table, MySQL will use index and avoid filesort. Please note: this will not work with "field in (select ...)" and also make sure that the subselect part will return only 1 row.
