As the amount of data companies are interested in collecting grows, life becomes all the more difficult for IT staff at all levels within an organization. SAS Enterprise storage devices that were once considered giants are now being phased out in favor of SSD Arrays with features such as de-duplication, tape storage has pretty much been abandoned and the same goes without saying for database engines.
For many customers just storing data is not enough because of the CAPEX and OPEX that is involved, smarter ways of storing the same data are required and since databases generally account for the greatest portion of storage requirements across an application stack. Lately they are used not only for storing data but also for storing logs in many cases. IT managers, developers and system administrators very often turn to the DBA and pose the time old question “is there a way we can cut down on the space the database is taking up?” and this question seems to be asked all the more frequently as time goes by.
This is a dilemma that cannot easily be solved for a MySQL DBA. What would the best way to resolve this issue be? Should I cut down on binary logging? Hmm… I need the binary logs in case I need to track down the transactions that have been executed and perform point in time recovery. Perhaps I should have a look at archiving data to disk and then compress this using tar and gzip? Heck if I do that I’ll have to manage and track multiple files and perform countless imports to re-generate the dataset when a report is needed from historical data. Maybe, just maybe, I should look into compressing the data files? This seems like a good idea… that way I can keep all my data, and I can just take advantage of a few extra CPU cycles to keep my data to a reasonable size – or does it?
Inspired by the time old dilemma I decided to take the latest version of TokuDB for test run and compare it to InnoDB compression which has been around a while. Both technologies promise a great reduction in disk usage and even performance benefits – naturally if data resides on a smaller portion of the disk access time and seek time will decrease, however this isn’t applicable to SSD disks that are generally used in the industry today. So I put together a test system using an HP Gen8 Proliant Server with 4x Intel® Xeon® E3 Processors, 4GB ECC RAM & the Samsung EVO SATA III SSD rated at 6G/s and installed the latest version of Ubuntu 14.04 to run some benchmarks. I used the standard innodb-heavy configuration from the support-files directory adding one change – innodb_file_per_table = ON. The reason for this is that TokuDB will not compress the shared tablespace hence this would affect the results of the benchmarks. To be objective I ran the benchmarks both on MySQL and MariaDB using 5.5.38 which is the latest bundled version for TokuDB.
The databases were benchmarked for speed and also for the space consumed by the tpcc-mysql dataset generated with 20 warehouses. So lets first have a look at how much space was needed by TokuDB vs. InnoDB (using both compressed and uncompressed tables):
|InnoDB Compressed Tables||4,2|
|InnoDB Regular Tables||4,8|
TokuDB was a clear winner here, of course the space savings depend on the type of data stored in the database however with the same dataset it seems TokuDB is in the lead. Seeing such a gain in storage requirements of course will make you wonder how much overhead is incurred in reading and writing this data, so lets have a look at the “tpm-C” to understand how many orders can be processed per minute on each. Here I have also included results for MariaDB vs. MySQL. The first graph shows the amount of orders that were processed per 10 second interval and the second graph shows the total “tpm-C” after the tests were run for 120 seconds:
Figure 1 – Orders processed @ 10 sec interval
|Interval||MariaDB 5.5.38||MariaDB 5.5.38 InnoDB Compressed||TokuDB on MariaDB 5.5.38||MySQL 5.5.38||MySQL 5.5.38 InnoDB Compressed||TokuDB on MySQL 5.5.38|
Figure 2 – “tpm-C” for 120 test run
|TokuDB on MySQL 5.5.38||32669.5|
|TokuDB on MariaDB 5.5.38||30827.5|
|MySQL 5.5.38 InnoDB Compressed Tables||4151|
|MariaDB 5.5.38 InnoDB Compressed Tables||3437|
Surprisingly enough however, the InnoDB table compression results were very low – perhaps this may have shown better results on regular SAS / SATA disks with traditional rotating disks. The impact on performance was incredibly high and the savings on disk space were marginal compared to those of TokuDB so once again again it seems we have a clear winner! TokuDB on MySQL outperformed both MySQL and MariaDB with uncompressed tables. The findings are interesting because in previous benchmarks for older versions of MariaDB and MySQL, MariaDB would generally outperform MySQL however there are many factors should be considered.
These tests were performed on Ubuntu 14.04 while the previous tests I mentioned were performed on CentOS 6.5 and also the hardware was slightly different (Corsair SSD 128GB vs. Samsung EVO 256GB). Please keep in mind these benchmarks reflect the performance on a specific configurations and there are many factors that should be considered when choosing the MySQL / MariaDB edition to use in production.
As per this benchmark, the results for TokuDB were nothing less than impressive and it will be very interesting to see the results on the newer versions of MySQL (5.6) and MariaDB (10)!