<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Alexander Rubin&#039;s Blog on MySQL</title>
	<atom:link href="http://www.arubin.org/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.arubin.org/blog</link>
	<description>MySQL, FullText Search, Performance, High Availability</description>
	<lastBuildDate>Sun, 17 Jan 2010 21:36:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Speaking at Linux Conference in Wellington, New Zealand</title>
		<link>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/</link>
		<comments>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 21:36:07 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[sphinxsearch]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=41</guid>
		<description><![CDATA[<p>I&#8217;ll be speaking at the data retrieval miniconf at Linux Conference in Wellington, New Zealand (Full Text Search with MySQL, Program).
I&#8217;ll cover some new Sphinx search features (online updates).</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking at the data retrieval miniconf at Linux Conference in Wellington, New Zealand (Full Text Search with MySQL, <a href="http://miniconf.osda.asn.au/program">Program</a>).<br />
I&#8217;ll cover some new Sphinx search features (online updates).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2010/01/17/speaking-at-linux-conference-in-wellington-new-zealand/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Using Dtrace to find queries creating disk temporary tables</title>
		<link>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/</link>
		<comments>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/#comments</comments>
		<pubDate>Fri, 02 Oct 2009 18:44:52 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[dtrace]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[temporary tables]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=36</guid>
		<description><![CDATA[A DTrace script that finds queries creating on-disk temporary tables [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes we have lots of small, fairly fast queries that use GROUP BY/ORDER BY and therefore create temporary tables. Some of those queries retrieve text fields, so MySQL has to use on-disk (MyISAM) temporary tables. These queries usually run for less than 1-2 seconds, so they do not show up in the slow query log, yet together they can add serious load to the system.</p>
<p>Here is an example of the statistics:</p>
<pre>
bash-3.00$  /usr/local/mysql/bin/mysqladmin -uroot -p -i 2 -r extended-status|grep tmp_disk
...
| Created_tmp_disk_tables           | 109           |
| Created_tmp_disk_tables           | 101           |
| Created_tmp_disk_tables           | 122           |
...
</pre>
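<p>With -i 2 -r, each line is the delta over a 2-second interval, so the samples shown correspond to roughly 50-60 tmp_disk_tables created per second.</p>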
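<p>The arithmetic behind that rate can be sketched in a few lines of Python. The three deltas are copied from the sample output above; the function name is only illustrative, and the 2-second interval comes from the -i 2 option.</p>

```python
# Deltas printed by `mysqladmin -i 2 -r extended-status` (values from the sample above).
INTERVAL_SECONDS = 2  # matches the -i 2 option
deltas = [109, 101, 122]


def per_second(delta, interval=INTERVAL_SECONDS):
    """Convert a counter delta measured over `interval` seconds into a per-second rate."""
    return delta / interval


rates = [per_second(d) for d in deltas]
print(rates)
```

<p>For these three samples the rates come out between about 50 and 61 disk temporary tables per second.</p>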
<p>So, how can we catch those queries? Usually we have to temporarily enable the general log, filter out the queries with &#8220;group by/order by&#8221; and profile them all. On Solaris/Mac we can use DTrace instead.</p>
<p>Here is a simple script that lists the queries creating tmp_disk_tables:</p>
<pre>
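#!/usr/sbin/dtrace -s
#pragma D option quiet

dtrace:::BEGIN
{
	printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Remember the text of the query this thread is currently parsing. */
pid$target::*mysql_parse*:entry
{
	self->query = copyinstr(arg1);
}

/* Count one on-disk temporary table against the current query. */
pid$target::*create_myisam_tmp_table*:return
{
	@query[self->query] = count();
}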
</pre>
<p>Put it into tmpdisktable.d, run chmod +x tmpdisktable.d and start it with<br />
./tmpdisktable.d -p `pgrep -x mysqld`</p>
<p>Press Ctrl+C after about 5 seconds and you will see the queries:</p>
<pre>
# ./tmpdisktable.d -p `pgrep -x mysqld`
Tracing... Hit Ctrl-C to end.
^C
</pre>
<p>Queries are truncated to the &#8220;strsize&#8221; limit, which can be tuned:</p>
<pre>#pragma D option strsize=N</pre>
<p>We can now increase &#8220;strsize&#8221; and run the script again to see the full query text.</p>
<p>Please note: running DTrace for a while can decrease performance, so do not run it for more than a couple of minutes on production systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/10/02/using-dtrace-to-find-queries-creating-disk-temporary-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reporting Queries with Sphinx</title>
		<link>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/</link>
		<comments>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 11:47:51 +0000</pubDate>
		<dc:creator>arubin</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[reporting]]></category>
		<category><![CDATA[sphinxsearch]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=16</guid>
		<description><![CDATA[<p>Reporting queries (I will use this term here) are queries that summarize and group data over a certain period of time. For example, on a social network site we want to know how many messages have been sent in a given period of time, grouped by region and status (sent, received, etc.) and ordered by number [...]]]></description>
			<content:encoded><![CDATA[<p>Reporting queries (I will use this term here) are queries that summarize and group data over a certain period of time. For example, on a social network site we want to know how many messages have been sent in a given period of time, grouped by region and status (sent, received, etc.) and ordered by the number of messages sent.</p>
<p>As an example I will take a table which is used to send SMS (text messages).</p>
<blockquote><p><code><strong>SQL: select concat('+', substring(region_code,1 ,2), 'xxx') as reg, status, count(*) as cnt<br />
from messages<br />
where submition_date between '2009-01-01' and '2009-04-01' group by reg, status<br />
having cnt&gt;100 order by cnt desc, status limit 100;</strong></code>
</p></blockquote>
<p>This query will do a range scan over submition_date and perform a filesort. There are well-known approaches for optimizing such queries (“covering index”, “summary tables”, an external data warehouse, etc.). Sometimes those approaches do not work or are too complex.</p>
<p>Yet another approach is to use an external search/index solution, for example Sphinx Search (<a href="http://www.sphinxsearch.com/">http://www.sphinxsearch.com</a>). In this case, the data stays in MySQL and Sphinx is used as an external indexer/searcher with SQL protocol support.</p>
<h3>Using Sphinx</h3>
<p>Starting with version 0.9.9-rc2, the Sphinx searchd daemon supports the MySQL binary network protocol and can be accessed through the regular MySQL API. For instance, the &#8216;mysql&#8217; CLI client works well. Here&#8217;s an example of connecting to Sphinx with the MySQL client:</p>
<blockquote><p>
<strong><code>$ mysql -P 3307<br />
Welcome to the MySQL monitor.  Commands end with ; or \g.<br />
Your MySQL connection id is 1<br />
Server version: 0.9.9-dev (r1734)</code><br />
</strong></p></blockquote>
<p>As Sphinx can store attributes (“fields”) and group/sort by them, it can be used for our report. Also, an application can simply connect to the Sphinx server over the MySQL protocol and work as if it were talking to MySQL (there are minor differences in Sphinx SQL, such as “@count” and support for timestamps only instead of datetime).</p>
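<p>Because Sphinx works with timestamps only, the date bounds from the MySQL query have to be converted to Unix timestamps before they can be used in a Sphinx query. Here is a minimal Python sketch of that conversion. The fixed UTC-7 offset is an assumption, chosen only because it reproduces the values 1230793200 and 1238569200 used in the Sphinx queries in this post; the function name is illustrative.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: dates are interpreted at a fixed UTC-7 offset. This offset is
# not stated in the post; it is picked because it reproduces the two bounds
# that appear in the Sphinx queries.
SERVER_TZ = timezone(timedelta(hours=-7))


def to_unix_timestamp(date_string, tz=SERVER_TZ):
    """Convert 'YYYY-MM-DD' (midnight in `tz`) to an integer Unix timestamp."""
    local_midnight = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())


start = to_unix_timestamp("2009-01-01")
end = to_unix_timestamp("2009-04-01")
print(start, end)  # 1230793200 1238569200
```

<p>On the MySQL side the same conversion is done by UNIX_TIMESTAMP(), which is how the date_added attribute is built in the Sphinx config file at the end of this post.</p>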
<p>Here is the example of the above query in Sphinx:</p>
<blockquote><p><strong><code>mysql&gt; select *<br />
from messages_dw<br />
where<br />
submition_date &gt; 1230793200<br />
and submition_date &lt; 1238569200<br />
group by region_code<br />
order by @count desc<br />
limit 0,10;<br />
<br />
10 rows in set (0.19 sec)</code></strong></p></blockquote>
<p>The same query in MySQL 5.1 runs much slower:</p>
<blockquote><p><strong><code>select region_code, count(*) as cnt<br />
from messages_dw<br />
where<br />
submition_date &gt; '2009-01-01'<br />
and submition_date &lt; '2009-04-01'<br />
group by region_code<br />
order by cnt desc<br />
limit 0,10;<br />
<br />
10 rows in set (14.47 sec)</code></strong></p></blockquote>
<p>Two important notes:</p>
<ol>
<li>For now, Sphinx can’t group by more than one field. However, we can combine two fields into one and then group by the combined field; the query after this list shows how.</li>
<li>In the configuration file (in the searchd section) we need to set max_matches to a very large number (max_matches = 10000000, for example). By default, Sphinx does not generate exact counts (or other aggregate values); this was done for the sake of speed. Setting max_matches to a large number fixes this issue.</li>
</ol>
<blockquote><p><strong><code>mysql&gt; select BIGINT(region_code)*4*1024*1024*1024+status_code<br />
as reg_status, *<br />
from messages_dw<br />
where date_added &gt; 1230793200<br />
and date_added &lt; 1238569200<br />
group by reg_status<br />
order by @count desc, region_code<br />
limit 0,10;</code></strong></p></blockquote>
<p>More speed comparison, grouping by 2 fields:</p>
<p>Sphinx:</p>
<blockquote><p><strong><code>mysql&gt; select BIGINT(region_code)*4*1024*1024*1024+status_code as reg_status, *  from messages_dw where date_added &gt; 1230793200 and date_added &lt; 1238569200  group by reg_status order by @count desc, region_code limit 0,10;<br />
<br />
10 rows in set (0.98 sec)</code></strong></p></blockquote>
<p>MySQL:</p>
<blockquote><p><strong><code>mysql&gt; select region_code, status+0, count(*) as cnt from messages_dw where  submition_date between '2009-01-01' and '2009-04-01'  group by region_code, status order by cnt desc, region_code limit 0,10;<br />
<br />
10 rows in set (14.47 sec)</code></strong></p></blockquote>
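<p>The combined key used for the two-field grouping is just region_code shifted into the upper 32 bits with status_code in the lower 32 bits. The following Python sketch shows why each (region_code, status_code) pair maps to a distinct key and how the pair can be recovered from it. It assumes both values fit in 32 bits; the function names and the sample values are hypothetical.</p>

```python
# 4*1024*1024*1024 is 2**32, the multiplier used in the Sphinx query.
SHIFT = 4 * 1024 * 1024 * 1024


def pack(region_code, status_code):
    """Combine two unsigned 32-bit values into one 64-bit grouping key."""
    if not (0 <= region_code < SHIFT and 0 <= status_code < SHIFT):
        raise ValueError("both values must fit in 32 bits")
    return region_code * SHIFT + status_code


def unpack(key):
    """Recover (region_code, status_code) from a combined key."""
    return divmod(key, SHIFT)


key = pack(12, 3)    # hypothetical region_code=12, status_code=3
print(key)           # 51539607555
print(unpack(key))   # (12, 3)
```

<p>Since the query also selects *, region_code and status_code are returned next to reg_status, so unpacking is normally not needed; the sketch only illustrates that the key is unambiguous.</p>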
<h3>Conclusion</h3>
<p>If you need fast ad hoc reporting queries, SphinxSearch can be a good option.<br />
Advantages:</p>
<ul>
<li>Faster sorting and grouping (which is very important for reporting queries)</li>
<li>No need to use an external API for queries; Sphinx now supports the MySQL protocol</li>
</ul>
<p>Disadvantages:</p>
<ul>
<li>Need to run an additional Sphinx daemon</li>
<li>Need to re-index the data when it changes</li>
</ul>
<h3>Sphinx config file</h3>
<blockquote><p><code>source src1<br />
{<br />
type                                    = mysql<br />
sql_host                                = 127.0.0.1<br />
sql_user                                = root<br />
sql_pass                                =<br />
sql_db                                  = dw<br />
sql_port                                = 3309  # optional, default is 3306<br />
sql_query                               = \<br />
SELECT msg_id, region_code, status+0 as status_code,  UNIX_TIMESTAMP(submition_date) AS date_added, 't' as  content \<br />
FROM messages_dw<br />
sql_attr_uint                   = region_code<br />
sql_attr_uint                   = status_code<br />
sql_attr_timestamp              = date_added<br />
sql_query_info                  = SELECT * FROM messages_dw WHERE msg_id=$id<br />
}<br />
index messages_dw<br />
{<br />
source                                  = src1<br />
path                                    = /data1/arubin/sphinx_new//var/data/test1<br />
docinfo                                 = extern<br />
charset_type                    = sbcs<br />
}<br />
indexer<br />
{<br />
mem_limit                               = 32M<br />
}<br />
searchd<br />
{<br />
listen = localhost:3312:mysql41<br />
log                                             = /data1/arubin/sphinx_new//var/log/searchd.log<br />
query_log                               = /data1/arubin/sphinx_new//var/log/query.log<br />
read_timeout                    = 30<br />
max_children                    = 30<br />
pid_file                                = /data1/arubin/sphinx_new//var/log/searchd.pid<br />
max_matches                             = 10000000<br />
seamless_rotate                 = 1<br />
preopen_indexes                 = 0<br />
unlink_old                              = 1<br />
}</code></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/10/01/reporting-queries-with-sphinx/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>New blog on MySQL</title>
		<link>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/</link>
		<comments>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 19:22:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.arubin.org/blog/?p=14</guid>
		<description><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA)</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started my new blog on MySQL. I&#8217;ll focus on MySQL full text search, performance tuning and High Availability (HA)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.arubin.org/blog/2009/09/23/new-blog-on-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
