I like the concept of open source databases, where you can see what the database is doing and why. It helps you figure out unexpected behaviors, and when you read the code you can really admire the skill and cleverness that goes into building a database. Unfortunately, Oracle is not open source, but I occasionally dabble in MySQL.
One of my favorite customers had a problem. They had to load around 20G of data into a table in a MySQL database. The data loaded fine, but when they tried to build a few indexes on the table, they got a mysterious error:
ERROR 1114 (HY000): The table ‘really_big_table’ is full
The error was mysterious because we had around 1.5T of free space on the disk. Also, if the customer created the indexes before loading the data, there was no error. This gave them a workaround, but one that was slow and annoying. Later, I found out that we were not the first to run into this mystery.
We had two MySQL DBAs (real ones, not me) look into the issue, and they found a few MySQL bugs that could have caused the problem; the customer said he might try patching or upgrading the server. I was supposed to discuss the results with the customer and close the issue. But I had some spare time, the customer had some spare hours, and we decided that I could also take a look at the issue before closing it.
The first thing I noticed was that the bug did not reproduce on the database the customer gave me for testing. No matter how much data I loaded and how big and random I made the indexed columns, the indexes were always created with no errors. The original DB that had the problem was long gone, but our DBAs had saved all the configuration parameters and OS parameters, so I could compare the differences. They seemed identical to me, yet the issue reproduced on one server and not on the other.
As everyone who has ever worked with a support organization knows – “issue doesn’t reproduce” is an excellent excuse for closing a ticket. I could have closed the issue at that point, but I still had some spare time and decided to try another approach before closing the ticket.
I downloaded the MySQL source code from Subversion and used grep to search for the mysterious error message. The idea was that if I could see what throws the error, I would be able to work out under what conditions it can occur, and that might help me reproduce it.
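The hunt itself is easy to sketch. Here is a minimal Python stand-in for that recursive grep (the error text is from this story; the helper and the example path are mine):

```python
import os

def grep_tree(root, needle):
    """Walk a source tree and list the files that contain `needle`;
    the same hunt as running grep -r over the MySQL sources."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    if needle in f.read():
                        hits.append(path)
            except OSError:
                pass  # unreadable file: skip it
    return hits

# e.g. grep_tree("mysql-server/sql", "is full")
```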
Reading the code, I discovered that “The table ‘%-.192s’ is full” is the English text for the error ER_RECORD_FILE_FULL. This error is generated by sql/handler.cc when the storage engine throws HA_ERR_RECORD_FILE_FULL. HA_ERR_RECORD_FILE_FULL is InnoDB’s translation of two internal errors: DB_OUT_OF_FILE_SPACE and DB_TOO_MANY_CONCURRENT_TRXS. The second is relevant only before 5.1, so we are only looking at DB_OUT_OF_FILE_SPACE.
DB_OUT_OF_FILE_SPACE is returned at about 40 different places in InnoDB code.
At this point I re-checked the error message and saw that there were more details in the attached error log:
100309 23:10:06 InnoDB: Error: Write to file (merge) failed at offset 0 1570766848.
InnoDB: 1048576 bytes should have been written, only 856064 were written.
InnoDB: Operating system error number 0.
Comparing the error with the source code revealed that “N bytes should have been written, only K were written” happens when pwrite() didn’t write everything it was asked to write to disk. Error code 0 actually means that pwrite() was successful; there was no error from the OS. A short write without an OS error is legal, and I believe the correct behavior from MySQL in this case would be to retry writing the remaining bytes. But in any case, this definitely indicates an OS-level condition, so it is not exactly a MySQL bug.
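MySQL itself is C++, but the retry behavior I have in mind is easy to sketch in Python, which exposes the same pwrite() call (my illustration, not MySQL’s code):

```python
import os
import tempfile

def pwrite_all(fd, data, offset):
    """Call pwrite() until every byte is written. A short write with
    no OS error is legal; instead of giving up, retry the remaining
    bytes at the advanced offset."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.pwrite(fd, view, offset)
        if written == 0:
            raise OSError("pwrite() made no progress")
        offset += written
        view = view[written:]

# Demo against a scratch file:
fd, path = tempfile.mkstemp()
try:
    pwrite_all(fd, b"x" * 1048576, 0)   # the 1 MB block size from the error log
    assert os.fstat(fd).st_size == 1048576
finally:
    os.close(fd)
    os.unlink(path)
```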
Grepping the source code for “Error: Write to file .* failed at offset” led to the discovery that in most cases, where my error says (merge), the message contains an actual file name. “merge” turned out to be the code name for the temporary files created during the merge sort that runs while creating an index. The TABLE FULL error message is very misleading – it really means “out of space while writing to a temporary file”.
I tried to re-create the index and look for the temporary files, but I didn’t see them anywhere. More code reading revealed that the files are created in “tmpdir”; if “tmpdir” is not defined in the configuration file, /tmp is used instead.
Why didn’t I see the files? Due to a really nifty trick from the MySQL developers: they create a file, keep the handle, and immediately delete it. The OS will not really delete the file as long as the handle is open, but no one will be able to see it. If the process holding the handle crashes, the file is removed by the OS, which means MySQL doesn’t need to take care of cleanup. Really cute, but it means that “ls” and “du” will return misleading results, and disk space monitoring will not warn you that you are running out of disk space.
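The trick itself is plain POSIX and easy to demonstrate (a sketch of the technique, not InnoDB’s actual code):

```python
import os
import tempfile

# Create a scratch file, then delete its directory entry while keeping
# the file descriptor open.
fd, path = tempfile.mkstemp()
os.unlink(path)                        # gone from ls and du...

assert not os.path.exists(path)        # no directory entry anymore
os.write(fd, b"still here")            # ...but the fd still works
assert os.pread(fd, 10, 0) == b"still here"
assert os.fstat(fd).st_nlink == 0      # zero links: space freed on close

os.close(fd)                           # only now does the space come back
```

On Linux, `lsof` lists such files with a “(deleted)” marker, which is one of the few ways to spot them while they are still eating disk space.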
Now I knew why the issue wasn’t reproducing: on my system the tmpdir parameter pointed to a directory with around 20G free, while on the old server with the issue tmpdir was not defined and /tmp had only 600M free. I knew that because the MySQL DBAs had helpfully recorded the old configuration file.
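The check that would have caught this before the index build is a one-liner (a sketch; the 20G figure and the /tmp default are from this story, the helper name is mine):

```python
import shutil

def enough_temp_space(tmpdir, needed_bytes):
    """Return True if tmpdir has at least needed_bytes free.
    tmpdir is wherever the server's tmpdir variable points
    (/tmp if it is not set in the configuration file)."""
    return shutil.disk_usage(tmpdir).free >= needed_bytes

# e.g. before sorting ~20G of index data through /tmp:
# enough_temp_space("/tmp", 20 * 1024**3)
```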
This sounded like a really good explanation to me, and I was extremely proud of my detective work. I went and explained all this to the customer’s MySQL DBA. The DBA smiled and patiently explained that any “real” MySQL DBA knows to check for enough space in the temp directory before building indexes, and that he had gotten the same errors even after moving the temp directory to a larger partition.
I’m obviously not a real MySQL DBA yet, and I learned things the long way around, which sometimes happens when you are new to a technology. I still think that reading the source code is not the worst way to learn a new technology, and it lets you notice cool details that are not always obvious in the documentation. Ever since, I always check where “tmpdir” is pointing before building large indexes, and this failure never reproduced again.