The MySQL Toolkit can be found at https://mysqltoolkit.sourceforge.net/. It is coded and maintained by Baron Schwartz (www.xaprb.com). I’ve been using the archiver tool he wrote lately, and wanted to share this tool. In every web environment I’ve worked in, there is data that is collected for analysis and that grows quite rapidly. User activity logs in particular can quickly grow out of control, and generally have no place in a front-end database after a certain amount of time. I’ll go into approaches to deciding what and how to archive in another post, but often this tool can prove very useful in moving data between databases or simply removing it.
I actually use the archiver in two different client sites. In one situation I have a significant influx of logs representing user activity. As with most data, only a limited window of time is required for analysis in the front-end. So we use archiver to move the older data to a separate datawarehouse. In another environment, we’ve wrapped it into a larger program for redistributing data between shards in a large cluster.
It is an excellent piece of code. I have not run into a bug as of yet, and it is highly configurable in regards to commit frequencies, transaction sizes, retries etc… Additionally, performance is a consideration, scanning the indexes forward only to create efficiency while working with large datasets. As the documentation states:
“The strategy is to find the first row(s), then scan some index forward-only to find more rows efficiently. Each subsequent query should not scan the entire table; it should seek into the index, then scan until it finds more archivable rows. .”
I have been an Oracle DBA for longer than I’d care to think about, and the world of open source is relatively new to me. I am consistently impressed with the open-source community that creates amazingly useful tools such as this. It is a refreshing change from the world of Oracle where everything is in the thousands or more of dollars. If you can support this kind of work through contributions of time or money, I’d strongly suggest you do so.
No comments