Over the last few weeks I’ve been looking at several customers’ slow query logs, and in many of them I found an odd type of query: SELECT statements that contain an IN clause with dozens, sometimes hundreds, of values. These statements often end up in the slow query log. I’m not sure whether these queries are written this way by design or whether they are generated by a specific database development tool.
I did some tests in one of my own databases, one with only around 10K rows in its largest table. The database belongs to the Amarok media player. For example, I queried for songs by B. B. King, whose name appears under several spellings (“BB King”, “B.B. King”, etc.) and in combination with other artists (“B. B. King & Eric Clapton”).
The first query used a JOIN and an IN clause listing all the spellings in my db; the second used WHERE ... name LIKE "BB%" OR name LIKE "B.%" instead. Both had the same execution plan, and both retrieved the same number of results. MySQL 4.1 brought some optimizer enhancements for handling these massive IN clauses, so for smaller databases this result is expected.
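As a rough sketch of the two variants I compared (the table and column names here are hypothetical, not Amarok’s actual schema):

```sql
-- Variant 1: JOIN plus an IN clause listing every spelling
SELECT t.title
FROM tags AS t
JOIN artist AS a ON a.id = t.artist_id
WHERE a.name IN ('BB King', 'B.B. King', 'B. B. King',
                 'B. B. King & Eric Clapton');

-- Variant 2: same JOIN, but pattern matching instead of an explicit list
SELECT t.title
FROM tags AS t
JOIN artist AS a ON a.id = t.artist_id
WHERE a.name LIKE 'BB%' OR a.name LIKE 'B.%';
```

With only a handful of values either way, the optimizer produced the same plan for both on my small database.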
With bigger databases and more complex queries, things are different. For example, on one customer’s server one of these IN clauses holds more than 100 values. In this case MySQL creates a temporary table and uses filesort. It should be easy to transform the IN list into an INSERT into a TEMPORARY MEMORY table, and then use a JOIN against that table in place of the IN clause in the original query. This way we avoid the filesort altogether, which should result in a better query execution time. Unfortunately it’s not practical to post a query with so many values in it here.
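A minimal sketch of that rewrite, with hypothetical table and column names and placeholder values standing in for the hundreds from the real query:

```sql
-- Load the values from the original IN list into a MEMORY table
CREATE TEMPORARY TABLE in_values (
    val INT NOT NULL,
    PRIMARY KEY (val)
) ENGINE=MEMORY;

INSERT INTO in_values (val) VALUES (17), (23), (42) /* ...hundreds more... */;

-- Replace "WHERE b.col IN (17, 23, 42, ...)" with a join
SELECT b.*
FROM big_table AS b
JOIN in_values AS v ON v.val = b.col;
```

The PRIMARY KEY on the MEMORY table gives the join an indexed lookup, which is what lets the server skip the temporary table plus filesort it was doing for the giant IN list.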
What’s your experience with these kinds of expressions? I’d love to learn where these gigantic IN clauses come from and hear some use cases.