Continuing MySQL saga…
This is actually a follow up on my previous post. Developers tried to rewrite all statements and even overdid it. As we say in Russia – “teach fool how to pray and he will break his forehead”. Note, I had to rename columns/tables as I write so sorry for possible typos
Query converted by developers from IN subqueries to joins:
SELECT distinct t.col_1, t.col_2, t.col_3 FROM t LEFT JOIN t1 ON t.col_3 = t1.col_3 LEFT JOIN t2 ON t.col_3 = t2.col_3 WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00' AND t1.col_3 is not null AND t2.col_3 is not null AND t.col_4 = 'string' ORDER BY t.col_5 desc LIMIT 0,6;
My first take was “smart” use of TimeDiff
function in the WHERE
clause. I was slightly surprised by the way it was compared – “>= '00:05:00'
” but more about this later. I checked selectivity of that condition and found that it was very poor – returning large percent of total rows in the table. Column col_4 contained only 2 values either.
Next, I looked at ORDER BY
– created index on col_5 but MySQL didn’t pick it up. Luckily, couple days ago Paul Moen posted about ORDER BY
and LIMIT
optimization so I knew it should work – MySQL is not that bad in the end. ;-) One more careful look and you notice DISTINCT
keyword. Bingo! Obviously, MySQL couldn’t and shouldn’t use index in that case. Interesting that t.col_3 is the primary key so distinct was introduced during conversion from IN subqueries to joins. Well, there is no silver bullet so join is no good here.
In the end query was rewritten as
SELECT 1.col_1, t.col_2, t.col_3 FROM t WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00' AND t.col_4 = 'string' AND EXISTS (select 1 from t1 where t.col_3=t1.col_3) AND EXISTS (select 1 from t2 where t.col_3=t2.col_3) ORDER BY t.col_5 DESC LIMIT 6;
Note that DISTINCT
disappeared and joins were converted to EXISTS
conditions. Very handy! Index created on two columns to cover also t.col_4 = 'string'
:
create index col_4_col_5 on t (col_4,col5);
Result was very positive and response was down to less than 10 ms.
I noticed later that condition with TimeDiff
is very tricky and string comparison in the end is not always appropriate. I would rather use AddTime
and date comparison.
Just now I figured what kept me uneasy about this statement – DISTINCT
causes grouping and ORDER BY
column is not part of the select list! In this particular case there is no problem as primary key is in the select list. However, I can’t figure out how MySQL would group and sort if there are several col_5 values per distinct row. I can’t see the deterministic result. In another example today, I witnessed that I can use a column in ORDER BY
that is not part of GROUP BY
list. Seems like the same issue. If I lost you with this – stay tuned. I must be blogging about it soon.
4 Comments. Leave new
Go to bed!
(Said the man who is still playing around with parallel stuff at 02:26 ;-))
Well, still recovering “lost” standby. By the way, I recorded one soccer game yesterday (Arsenal with …. don’t remember who) so I expect you to come over tomorrow for beer and vobla.
Watch out for an absolutely fantastic piece of clinical finishing in the Arsenal game, even though as a Liverpool fan i’m not talking about Arsenal right now :(.
I actually caught the second half on Saturday :-(
It’s even better than I thought over here – three premiership games on normal TV on a Saturday, including one at 3 o’clock. Not very conducive to working on presentations, though ;-)