Refactoring MySQL Applications: Part 1

Posted in: MySQL, Technical Track

First off, let me wish you all a Happy 2009, and my apologies for being a bit slack with blogging in the last two months of 2008. It’s been a busy stretch of weeks, but I fully expect to remedy that this year.

Second, let me welcome you to this series on refactoring MySQL applications. I will cover best practices and useful tips, as well as show working examples of potential loopholes and solutions.

So, you are a DBA or a developer, and you’re faced with a problem. Your years-old application (perhaps you inherited it from a former co-worker) is now starting to keel over, and your choice of responses is limited.

Let’s start by saying that there are other ways to improve performance, but in some cases refactoring may be the only viable path, for one reason or another. We’ll take the refactoring route, focusing on SQL rather than application code, as there’s a lot that can be done on that side.

But first, why refactoring? Refactoring normally sits between parameter-tuning and throwing more hardware at the problem, and can be a very cost-effective way of solving performance or scalability issues.

So let’s start with a point about composite indexes that is sometimes forgotten. Given the following columns . . .

group, category, name, manufacturer

. . . we want to grab all Electronics from Apple. With an index on category alone (but not manufacturer), the data ordered by that index could look like this (visually explained, not following the column specifications): category=Electronics, group=MP3, name=whatever name is used, manufacturer=Apple/Microsoft . . .

...
Electronics|MP3|iPod|Apple
Electronics|MP3|Zune|Microsoft
Electronics|TV|Apple TV|Apple
...

And so on. But if we want to grab all the Apple electronics, we can quickly see that we’re also fetching other data (the Microsoft Zune in this case), which is far from optimal, given that this is potentially a big table and fetching unneeded data is never good.

What if we use an index on (category,manufacturer) instead?

...
Electronics|Apple|MP3|iPod
Electronics|Apple|TV|Apple TV
Electronics|Microsoft|MP3|Zune
...

As shown here, the more selective (category,manufacturer) index groups all the Apple electronics together, so we can pull them out without touching the Microsoft rows (if the query is, for example, SELECT ... FROM t1 WHERE category='Electronics' AND manufacturer='Apple'). However, don’t go around issuing, say, SELECT ... FROM t1 WHERE manufacturer='Apple' if this composite index is all you have: that condition does not form a leftmost prefix of our (category,manufacturer) index, and the index will not be usable.
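As a sketch, assuming a table named t1 with those four columns (both the table and index names here are illustrative), the index and the queries discussed above would look like this:

CREATE INDEX idx_cat_mfr ON t1 (category, manufacturer);

-- Can use idx_cat_mfr: the conditions cover its leftmost prefix
SELECT * FROM t1
 WHERE category = 'Electronics' AND manufacturer = 'Apple';

-- Cannot use idx_cat_mfr: manufacturer alone is not a leftmost prefix
SELECT * FROM t1
 WHERE manufacturer = 'Apple';

Note that the order of the conditions inside the WHERE clause does not matter to the optimizer; what matters is which indexed columns, counted from the left of the index definition, are constrained.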

The autocommit=1 default can also be a major player in performance: it commits after every single statement, and each commit carries overhead that will slow down your application. So investigate how often you really need to commit, and try to optimize that. On the other hand, committing too rarely may result in concurrency issues and/or data loss, so you need to find the sweet spot.
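For example, instead of letting autocommit write each row in its own transaction, a batch of inserts can be grouped under a single commit (a sketch; the table and values are illustrative):

SET autocommit = 0;
INSERT INTO t1 (category, manufacturer, name) VALUES ('Electronics', 'Apple', 'iPod');
INSERT INTO t1 (category, manufacturer, name) VALUES ('Electronics', 'Apple', 'Apple TV');
COMMIT;              -- one commit for the whole batch
SET autocommit = 1;  -- restore the default afterwards

The same effect can be had with START TRANSACTION ... COMMIT around the batch; either way, the point is one commit for many statements instead of one per statement.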

To use views, or not to? Well, in some cases views can increase performance, and in some cases they can reduce it. Let’s take a look at the “employees” example database, trying to count the number of employees in a specific department (whether or not they’re still working there).

Let’s compare a view against an equivalent derived-table query. So, first let’s create the view:

mysql> CREATE VIEW v_emp_by_dept AS
    -> SELECT dept_no, COUNT(emp_no) Employees
    -> FROM dept_emp
    -> GROUP BY dept_no;
Query OK, 0 rows affected (0.11 sec)

mysql> set query_cache_type=0;
Query OK, 0 rows affected (0.02 sec)

And then let’s get some data out of it:

mysql> SELECT SQL_NO_CACHE dept_no, Employees 
    -> FROM v_emp_by_dept
    -> WHERE 
    ->  dept_no = 'd002';
+---------+-----------+
| dept_no | Employees |
+---------+-----------+
| d002    |     17346 | 
+---------+-----------+
1 row in set (0.18 sec)

All fine and okay. But 0.18 seconds, even on my (admittedly overloaded) laptop? We should be able to bring that down. Let’s try another approach:

mysql> SELECT SQL_NO_CACHE dept_no, Employees
    -> FROM (
    ->  SELECT SQL_NO_CACHE dept_no, COUNT(emp_no) Employees
    ->  FROM
    ->   dept_emp 
    ->  WHERE 
    ->   dept_no='d002' 
    ->  GROUP BY
    ->   dept_no
    -> ) t1;
+---------+-----------+
| dept_no | Employees |
+---------+-----------+
| d002    |     17346 | 
+---------+-----------+
1 row in set (0.01 sec)

0.01 sec, much better (obviously). Basically, the difference here is that I push the WHERE clause up to before the GROUP BY, so only the rows for d002 have to be grouped. With the view, MySQL cannot merge the WHERE condition into a view that contains a GROUP BY: it materializes the grouped counts for every department first, and only then filters out d002.

Let’s do another simple example, this time on unnecessary joins that may be the result of copy/pasted code from some earlier employee. Say you want to grab each employee’s first name, last name, and department number (perhaps for automatic processing later on). That previous employee apparently grabbed the query from something that also showed the name of the department:

mysql> SELECT
    ->  e.first_name, e.last_name, de.dept_no
    -> FROM
    ->  employees e INNER JOIN
    ->  dept_emp de ON (e.emp_no=de.emp_no)
    ->  LEFT JOIN departments d ON(de.dept_no=d.dept_no)
    -> LIMIT 2;
+------------+-----------+---------+
| first_name | last_name | dept_no |
+------------+-----------+---------+
| Georgi     | Facello   | d005    | 
| Bezalel    | Simmel    | d007    | 
+------------+-----------+---------+
2 rows in set (0.02 sec)

And the explain plan:

+----+-------------+-------+--------+----------------+---------+---------+----------------------+--------+-------------+
| id | select_type | table | type   | possible_keys  | key     | key_len | ref                  | rows   | Extra       |
+----+-------------+-------+--------+----------------+---------+---------+----------------------+--------+-------------+
|  1 | SIMPLE      | e     | ALL    | PRIMARY        | NULL    | NULL    | NULL                 | 300201 |             | 
|  1 | SIMPLE      | de    | ref    | PRIMARY,emp_no | PRIMARY | 4       | employees.e.emp_no   |      1 | Using index | 
|  1 | SIMPLE      | d     | eq_ref | PRIMARY        | PRIMARY | 4       | employees.de.dept_no |      1 | Using index | 
+----+-------------+-------+--------+----------------+---------+---------+----------------------+--------+-------------+
3 rows in set (0.00 sec)

Now, this isn’t the worst example, but it still includes a totally unnecessary table, departments, which we can drop and still get the numbers we need. It’s simply a useless table in this query, the result of someone not checking what was really needed for the query to work.
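Stripped of the useless LEFT JOIN, the query returns the same three columns while touching one table fewer:

mysql> SELECT
    ->  e.first_name, e.last_name, de.dept_no
    -> FROM
    ->  employees e INNER JOIN
    ->  dept_emp de ON (e.emp_no=de.emp_no)
    -> LIMIT 2;

The explain plan now has only two rows, one per remaining table, and the eq_ref lookup into departments is gone entirely.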

To wrap up this first part on refactoring, remember to push WHERE clauses as far up as you can to eliminate fetching unneeded rows and then later removing them. Also remember how indexes work and do not include useless tables in a join—identify what you want the query to accomplish, and how you can do that without touching more rows than needed. And to really top it off, use transactions and don’t commit after every insert/update unless you really need to. Grouping them together may very well increase performance.

In Part 2, we will cover example programs, loops/cursors, procedures, and how to optimize those without having to change the application behaviour. Stand by.


3 Comments

Hi, just to check: if I rewrote WHERE category='Electronics' AND manufacturer='Apple' to WHERE manufacturer='Apple' AND category='Electronics' and redid the index as (manufacturer, category), it’d work with WHERE manufacturer='Apple' too?

Is there a way to automagically find this kind of thing in the logs?

Nicklas Westerlund
January 16, 2009 9:04 am

@Dado, yes, it would: with the index (manufacturer,category), a search on WHERE manufacturer='Apple' forms a leftmost prefix of the index, so it can be utilized. (However, WHERE category='Electronics' alone would not.)

Depends on what you want to find, but in theory you should be able to catch queries not using indexes (although there may be valid reasons for not doing so) by using log-queries-not-using-indexes

