Day 0 (tutorials), post 2. (I’m still working on the versioning thing — bear with me.)
I snuck into the 5.1 new features session halfway through, but it was definitely well worth it. Lots of new features are coming in 5.1, though Brian Aker admits some of them are not quite production-ready yet, so don’t go deploying it just yet.
Some of the features are:
- partitioning — range, list, hash, key, even composite. And it’s available for all MySQL storage engines.
- faster multi-threaded data loads instead of only a single thread (for import / LOAD DATA INFILE)
- events / event scheduler (like jobs/job scheduler)
- more system tables: the general log and slow query log are now system tables and you can run regular queries on them.
- the slow query log can be turned on or off on the fly
- row-based replication
- new storage engines
- more near-online operations (some alter tables, drop / create indexes but not all)
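To make a couple of those items concrete, here’s a rough sketch of what the 5.1 syntax looks like. The table, event, and partition names are made up for illustration; check the 5.1 manual before relying on the exact details:

```sql
-- Range partitioning: rows land in a partition based on the year of hit_date
CREATE TABLE access_log (
    id       INT NOT NULL,
    hit_date DATE NOT NULL
)
PARTITION BY RANGE (YEAR(hit_date)) (
    PARTITION p2004 VALUES LESS THAN (2005),
    PARTITION p2005 VALUES LESS THAN (2006),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Event scheduler: like a cron job running inside the server
CREATE EVENT purge_old_hits
    ON SCHEDULE EVERY 1 DAY
    DO DELETE FROM access_log
       WHERE hit_date < CURRENT_DATE - INTERVAL 90 DAY;

-- Slow query log as a system table, toggled on the fly
SET GLOBAL slow_query_log = 'ON';
SELECT start_time, query_time FROM mysql.slow_log;
```

Note that to query the slow log as a table you also need log output directed to tables rather than files.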
I had a good discussion with the MySQL certification team at lunch; we will go into more detail when we meet Wednesday evening. (I was invited to join the certification technical advisory board earlier.)
I’m now at the presentation on Wikipedia. It’s interesting that they consider the database a very small part of their whole “coolness.” Indeed, as we delve into how it all works, it does seem they have it all together very well. They use multiple caches, the biggest being multiple squid servers that talk to each other, plus an application that purges changed content across all of the squid servers. In fact, the front-end caching “engine” they have created is so efficient that they just don’t care about the database. Database tuning is just not in their vocabulary! However, some things still stand out:
- mysql slaves are set up with different indexes based on what queries they’re going to run
- the load balancer sends queries to different (sets of) servers based on what the query is
- each database server handles only about 2500 queries per second; that’s the point at which the load balancer considers it overloaded
- avoid pagination / sorting — the dreaded ORDER BY … LIMIT 50 clause is an absolute no-no! Instead, fetch the required data by primary key (this does depend on app design).
- use indexes for ORDER BY; never let it fall back to a filesort
- MySQL uses InnoDB, but commits are fake: nothing is flushed to disk per transaction (innodb_flush_log_at_trx_commit=0). If a server crashes, they switch over to a slave anyway; it’s faster than trying to recover the server, and any lost transactions are just an oops, too bad. It is a webapp after all!
- the underlying disk is RAID 0 for speed… and if they lose a disk (and some data): oops, let’s switch over to a slave (see above)!
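A quick sketch of what the last few points look like in practice. The table and column names here are invented for illustration, not taken from Wikipedia’s actual schema:

```sql
-- Trade durability for speed: the InnoDB log is flushed to disk roughly
-- once per second instead of at every commit
-- (normally set in my.cnf under [mysqld])
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- Filesort-free ORDER BY: give the optimizer an index that matches
-- both the WHERE clause and the sort order
CREATE INDEX idx_page_ts ON revision (page_id, rev_timestamp);

-- This can be served straight from the index, so EXPLAIN should not
-- show "Using filesort":
SELECT rev_id
  FROM revision
 WHERE page_id = 42
 ORDER BY rev_timestamp DESC
 LIMIT 10;

-- And where the app design allows it, fetch rows by primary key
-- instead of paginating:
SELECT * FROM revision WHERE rev_id = 123456;
```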
In other words, analyze your requirements carefully and then meet them. Don’t just use a feature because it exists, or go with something because it’s “best practice” or “industry-standard” — those may not actually help meet your requirements.
And I’ll leave you with that thought. More tomorrow!