MySQL Can’t Use Index With Uncorrelated IN Subquery

Posted in: MySQL, Technical Track

Today is the first time I had to look at MySQL performance. Tiny database as web application back-end was having significant performance issues with spikes of CPU workload. After identifying problematic queries, I found a pile of statements using IN subqueries. Typical example is:

SELECT *
   FROM t1
  WHERE c1 IN (SELECT c1
                 FROM t2
                WHERE c2=100);

Checked explain plan – t1 is accessed via full table scan.
Note that table t1 had an index on column c1 and it was selective and statistics collected. Rewrote statement as

SELECT t1.*
   FROM t1, t2
  WHERE t1.c1=t2.c1
    AND t2.c2=100);

Query flies. Checked execution plan – index on t1 column is used.
Quick research located the bug #9021which might get fixed in 5.1.

I asked around and our MySQL experts assured me that this is one of the minor and not so disturbing issues, in fact. I can’t imagine what those disturbing issue are. I guess Oracle XE does have some advantages over MySQL for small installations.

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

What does it take to be chief technology officer at a company of technology experts? Experience. Imagination. Passion. Alex Gorbachev has all three. He’s played a key role in taking the company global, having set up Pythian’s Asia Pacific operations. Today, the CTO office is an incubator of new services and technologies – a mini-startup inside Pythian. Most recently, Alex built a Big Data Engineering services team and established a Data Science practice. Highly sought after for his deep expertise and interest in emerging trends, Alex routinely speaks at industry events as a member of the OakTable.

8 Comments. Leave new

I exchanged an IM conversation with Alex about this and he suggested I post my thoughts on the blog. Here they are lightly edited:

gorbyx: https://www.pythian.com/blogs/355/mysql-cant-use-index-with-uncorrelated-in-subquery

Paul: interesting observation for you
the faster the underlying hardware, the less valuable oracle’s indexing and optimizing innovations (not to mention patents) become
as as hardware is getting faster faster and faster, these issues become minor issues and then non-issues
this causes enormous depreciation of IBM’s and oracle’s technology and patent portfolios
so ultimately, especially following the millsap what’s the business impact tuning method, these issues with databse platforms are meaningless. if it’s not meaningless now, it’ll be meaningless next year, or the year after that anyway.
in the meantime, it’s cheaper to pay you to rewrite the query than it is to license oracle or db2
gorbyx: well, I see the point. I don’t agree so far but what do I see in my life?
put you comment on the blog – it’s interesting

Reply

I assume the mismatched bracket in the new query is just a typo, but can’t help but point out that, if there can be multiple rows in ‘t2’ with a ‘c2′ of 100 and the same ”c1’ value, then the queries are not equivalent.
As for Paul’s comment, the problem is that while CPUs are getting faster, and disk space is getting bigger/cheaper, disk access speed is pretty much static AND data volumes are increasing. As such, indexing and optimization innovations get more important not less. That assumes data volumes keep in-memory databases from the mainstream though.

Reply
Alex Gorbachev
January 9, 2007 5:25 pm

Gary,

Thanks for spotting the typo or rather c&p error. This just a dirty example assuming that t2.c1 is a PK, for example. Obviously, I couldn’t share real customer queries. The right statement should be:

SELECT t1.*
   FROM t1, t2
  WHERE t1.c1=t2.c1
    AND t2.c2=100;

In addition to what you said, our customer expressed an interesting point today – would you prefer letting that mainframe software keep you credit card and account balance or it’s fine to move it to Oracle? Now what about MySQL?

Perhaps, some people would answer yes to the second question now. In some time they might even answer yes to the second question.

However, there are people (in fact, majority) that would answer “Why would I care?”. Those are the people who don’t need to know about mainframes, Oracle and MySQL but do need to know that they are able to take cash from the ATM machine anytime and they don’t need to wait few minutes for that. Of course, they have some other requirements like no one else is able to manipulate their account.

If that can be delivered by MySQL at some point – why not?

Reply

In regard to Paul’s comment: I have yet to see faster CPU hardware make a disk spin faster! For as long as we have to use disks – or any storage technology outside the CPU itself – the performance limitation will be how fast we can get that data back into the main memory for processing by the CPU. And that’s where indexes, optimizers and other such kick in.

Looking at volumetric trends shows us also that while disk performance and speed hasn’t improved as much, disk capacity certainly has. There is as such a current imbalance between how much you stack in a disk and how fast you can do it or get it. And I don’t see any hardware way of addressing that imbalance. Other than solid state memory, which is a bit far away at the moment in terms of usable capacity.

All this to say that indexing and optimisers will be here for a long while yet. In fact, looking at Jim Gray’s “Petabyte” musings, I’d say you’ll see even more interesting stuff happening in the field of large scale data storage cataloguing: someone, somehow, somewhere, has to be able to efficiently get at all those PBs!

Reply

Alex,

I had the exact same problem with a lot of queries like this in oracle. Not much of performance until we rewrote the subqueries as joins. Subqueries seem to be a bit of a problem for the oracle optimizer as well. The queries were of course somewhat more complex which did not make it easier for the optimizer.
Maybe the optimizer for MySQL is not that bad after all? Or they both need some work to be done?

Where these queries generated through Hibernate by any chance? In my case they were and if IIRC you could ‘teach’ Hibernate to generate the joins instead of the subqueries. But I do not remember how that was done. Probably to be found in the Hibernate docs.

Reply
Alex Gorbachev
January 11, 2007 9:09 am

Eric, this was custom written application. What version of Oracle were you on? Your IN subqueries were uncorrelated? I find that correlated subqueries often causes issues even in 10g and, perhaps, more often in 10g with introduction of few smart CBO features that don’t seem to work well all the time.

Reply

9.2.0.6
Most of them where correlated, a few were uncorrelated. In my case both could be ‘solved’ by avoiding the subquery.

Reply
Alex Gorbachev
January 11, 2007 12:39 pm

“In my case both could be ’solved’ by avoiding the subquery.”
Once again a reminder to stick to the old rules.
Thanks for your comment Eric.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *