Disk I/O is frequently the performance bottleneck with relational databases. With AWS recently releasing 4,000 PIOPs EBS volumes, I wanted to do some benchmarking with pgbench and PostgreSQL 9.2. Prior to this release, the maximum available I/O capacity was 2,000 IOPs per volume. EBS IOPs are read and written in 16 KB chunks, and their performance is limited by both the I/O capacity of the EBS volumes and the network bandwidth between an EC2 instance and the EBS network. My goal isn’t to provide a PostgreSQL tuning guide, an EC2 tuning guide, or a database deathmatch complete with graphs; I’ll just show what kind of performance is available out of the box without substantive tuning. In other words, this is an exploratory benchmark, not a comparative one. I would have liked to compare the performance of 4,000 PIOPs EBS volumes with 2,000 PIOPs EBS volumes, but I ran out of time, so that will have to wait for a future post.
I conducted my testing in AWS’ São Paulo region (sa-east-1). One benefit of testing in sa-east-1 is that spot prices for larger instances are (anecdotally) more stable than in us-east-1. Unfortunately, sa-east-1 doesn’t have any cluster compute (CC) instances available. CC instances have twice as much bandwidth to the EBS network as non-CC EC2 instances, and that additional bandwidth allows you to build larger software RAID volumes. My back-of-the-napkin calculations show that it should be possible to reach 50,000 PIOPs on an EBS-backed CC instance without much of a problem.
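The napkin math can be made concrete with a small bash sketch. It converts an IOPS figure into the sustained network bandwidth that figure implies, assuming every I/O is a full 16 KB chunk (taken here as 16 KiB = 16,384 bytes). The 16,000 IOPS row corresponds to four 4,000 PIOPs volumes striped together. The sketch does not encode any particular instance's EBS bandwidth limit; comparing the result against a given instance type is left to the reader.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope check: how much EBS network bandwidth does a given
# IOPS figure imply if every I/O is a 16 KB chunk?
# Assumption: 16 KB is treated as 16 KiB (16,384 bytes).
# Integer arithmetic only; Mbit/s results are truncated, not rounded.

IO_SIZE_BYTES=16384

# Bytes per second needed to sustain $1 IOPS.
iops_to_bytes_per_sec() {
  echo $(( $1 * IO_SIZE_BYTES ))
}

# Megabits per second (10^6 bits) needed to sustain $1 IOPS.
iops_to_mbit_per_sec() {
  echo $(( $1 * IO_SIZE_BYTES * 8 / 1000000 ))
}

# One volume, four striped volumes, and the 50,000 IOPS target.
for iops in 4000 16000 50000; do
  printf '%6d IOPS -> %d bytes/s (~%d Mbit/s)\n' \
    "$iops" "$(iops_to_bytes_per_sec "$iops")" "$(iops_to_mbit_per_sec "$iops")"
done
```

By this arithmetic, 50,000 IOPS of 16 KiB I/Os needs roughly 6.5 Gbit/s of sustained bandwidth to the EBS network. That is why the extra EBS bandwidth of CC instances matters for very large software RAID arrays.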
I tested with three EC2 instances: an m1.large from which to run pgbench, an m2.2xlarge with four EBS volumes, and an m1.xlarge with one EBS volume. All EBS volumes are 400 GB with 4,000 provisioned IOPs. The m1.large was an on-demand instance. The other two, which served as the pgbench target database servers, were spot instances with a maximum bid of $0.05. (In one case, our first spot instance was terminated and we had to rebuild it.) Some brief testing showed that driving the benchmark from an external machine was critical for getting the best results.
All EC2 instances are running Ubuntu 12.10. A custom sysctl.conf tuned System V shared memory and set vm.swappiness to 0 and vm.overcommit_memory to 2:
kernel.shmmax = 13355443200
kernel.shmall = 13355443200
vm.swappiness = 0
vm.overcommit_memory = 2
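kernel.shmmax is specified in bytes, but kernel.shmall is counted in pages. Reusing the byte value for shmall therefore sets a ceiling thousands of times larger than shmmax. That is harmless as a limit, but it is not what the number suggests. A small bash sketch converts a byte budget into a page count for shmall. The helper name bytes_to_pages is hypothetical. The page size comes from `getconf PAGE_SIZE` (4,096 bytes on typical x86-64 Linux) unless one is passed explicitly.

```shell
#!/usr/bin/env bash
# Convert a shared-memory budget in bytes (the unit kernel.shmmax takes) into
# pages (the unit kernel.shmall takes), rounding up so the budget is covered.

# Usage: bytes_to_pages BYTES [PAGE_SIZE]
bytes_to_pages() {
  local bytes=$1
  local page_size=${2:-$(getconf PAGE_SIZE)}
  echo $(( (bytes + page_size - 1) / page_size ))
}

SHMMAX=13355443200   # value taken from the sysctl.conf above
SHMALL=$(bytes_to_pages "$SHMMAX")

# Print sysctl lines in the same form as the config above.
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"
```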
The following packages were installed via apt-get:
To install the PostgreSQL packages, a pgdb.list file containing
deb https://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main
was placed in /etc/apt/sources.list.d and the following commands were run:
gpg --keyserver pgp.mit.edu --recv-keys ACCC4CF8
gpg --armor --export ACCC4CF8 | apt-key add -
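The sources-list step can be wrapped in a small bash function. In this minimal sketch the function name write_pgdg_list is hypothetical. The target directory is a parameter, so the function can be tried against a scratch directory before touching /etc/apt/sources.list.d. The key import above and an `apt-get update` are still needed before the packages can be installed.

```shell
#!/usr/bin/env bash
# Write the PostgreSQL apt source entry into a sources.list.d directory.
# Using > (not >>) means rerunning the function never duplicates the entry.

# Usage: write_pgdg_list [DIR]   (DIR defaults to the real apt directory)
write_pgdg_list() {
  local dir=${1:-/etc/apt/sources.list.d}
  mkdir -p "$dir" || return 1
  echo "deb https://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main" \
    > "$dir/pgdb.list"
}

# Dry run against a temporary directory. On a real server, run as root with
# no argument, import the signing key, then run apt-get update.
tmpdir=$(mktemp -d)
write_pgdg_list "$tmpdir"
cat "$tmpdir/pgdb.list"
```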
RAID and Filesystems
For the single-volume instance, I simply created an XFS file system and mounted it on /mnt/benchmark.
mkdir /mnt/benchmark
mkfs.xfs /dev/xvdf
mount -t xfs /dev/xvdf /mnt/benchmark
echo "/dev/xvdf /mnt/benchmark xfs defaults 1 2" >> /etc/fstab
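The `echo ... >> /etc/fstab` step appends a new line every time it runs, so rerunning the setup leaves duplicate mount entries. Below is a minimal sketch of an idempotent alternative using a hypothetical helper, add_fstab_entry. The fstab path is a parameter so the helper can be exercised against a scratch file. The device name in the example is an assumption; use whatever name your EBS volume appears under.

```shell
#!/usr/bin/env bash
# Append a line to fstab only if an identical line is not already present.

# Usage: add_fstab_entry "LINE" [FSTAB_PATH]
add_fstab_entry() {
  local line=$1
  local fstab=${2:-/etc/fstab}
  # -x: match the whole line; -F: treat the entry as a fixed string, not a regex.
  if ! grep -qxF -- "$line" "$fstab" 2>/dev/null; then
    printf '%s\n' "$line" >> "$fstab"
  fi
}

entry="/dev/xvdf /mnt/benchmark xfs defaults 1 2"   # assumed device name

# Add the same entry twice to a scratch file; only one line should result.
scratch=$(mktemp)
add_fstab_entry "$entry" "$scratch"
add_fstab_entry "$entry" "$scratch"
cat "$scratch"
```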
For the four-volume instance, the setup was only slightly more involved. mkfs.xfs inspects the underlying block devices and picks appropriate values for the stripe unit and stripe width on its own. Below are the commands for assembling a four-volume mdadm software RAID 0 array that is mounted on boot (assuming you’ve already attached the EBS volumes to your EC2 instance). Running dpkg-reconfigure mdadm rebuilds the initrd image so that the array is assembled at boot.
mkdir /mnt/benchmark
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
mkfs.xfs /dev/md0
mount -t xfs /dev/md0 /mnt/benchmark
echo "/dev/md0 /mnt/benchmark xfs defaults 1 2" >> /etc/fstab
dpkg-reconfigure mdadm
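mdadm's `--raid-devices` option (short form `-n`) must equal the number of component devices listed after it. One way to keep the two from drifting apart when the array size changes is to derive the count from the device list. The following is a dry-run sketch with a hypothetical helper, mdadm_create_cmd, which prints the command rather than running it. The device names are assumptions.

```shell
#!/usr/bin/env bash
# Print (not run) an mdadm RAID 0 create command whose --raid-devices count
# is derived from the devices given, so the two cannot disagree.

# Usage: mdadm_create_cmd MD_DEVICE COMPONENT_DEVICE...
mdadm_create_cmd() {
  local md=$1
  shift
  # Striping a single volume gains nothing, so this sketch requires two or more.
  if (( $# < 2 )); then
    echo "mdadm_create_cmd: need at least two component devices" >&2
    return 1
  fi
  echo "mdadm --create $md --level=0 --raid-devices=$# $*"
}

# Device names are assumptions; substitute the attached EBS volumes.
mdadm_create_cmd /dev/md0 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
```

Review the printed command, then run it as root once it matches the attached volumes.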
pgbench is a utility included in the postgresql-contrib-9.2 package. It approximates the TPC-B benchmark and can be thought of as a database stress test whose output is measured in transactions per second (tps). It generates a significant amount of disk I/O with transactions that run for relatively short periods. vacuumdb was run before each pgbench iteration. For each database server, pgbench was run simulating 16, 32, 48, 64, 80, and 96 clients. At each of those client counts, pgbench ran ten times, stepping by 100 from 100 to 1,000 transactions per client. It’s important to realize that pgbench’s stress test is not typical of a web application workload; most consumer-facing web applications could achieve much higher rates than those mentioned here. The only pgbench results against AWS/EBS volumes that I’m aware of (or could quickly find by searching) are from early 2012 and, at their best, achieve rates 50% lower than the lowest rates found here. I drove the benchmark with a very small, very unfancy bash script. An example pgbench command line:
pgbench -h $DBHOST -j4 -r -Mextended -n -c48 -t600 -U$DBUSER
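The driver script itself isn’t shown, but the procedure described above can be sketched as a bash loop. The procedure is: run vacuumdb before every run; use 16 to 96 clients in steps of 16; run 100 to 1,000 transactions per client in steps of 100. This is a reconstruction, not the original script. The DRY_RUN switch, the run and run_matrix helpers, and the default values for DBHOST and DBUSER are assumptions. Neither vacuumdb nor pgbench is given a database name, so both fall back to one named after the connecting user, as in the example command above.

```shell
#!/usr/bin/env bash
# Sketch of a pgbench driver loop following the procedure described above.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.

DBHOST=${DBHOST:-localhost}   # assumption: target database server
DBUSER=${DBUSER:-postgres}    # assumption: user (and default database name)
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_matrix() {
  local clients tx
  for clients in 16 32 48 64 80 96; do
    for (( tx = 100; tx <= 1000; tx += 100 )); do
      # Vacuum before every run; -n below tells pgbench not to vacuum itself.
      run vacuumdb -h "$DBHOST" -U "$DBUSER"
      # All client counts are multiples of the 4 threads given by -j4.
      run pgbench -h "$DBHOST" -j4 -r -Mextended -n -c"$clients" -t"$tx" -U"$DBUSER"
    done
  done
}

run_matrix
```

With DRY_RUN=0, each pgbench run prints its tps summary (plus per-statement latencies, because of `-r`) to stdout, so piping the whole script through `tee` keeps a log for later graphing.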
m1.xlarge with a single 4,000 PIOPs volume
This instance achieved its highest throughput with fewer than 48 concurrent clients and fewer than 500 transactions per client. While throughput never dropped precipitously at any point, loads outside that range showed more variable performance. Even at its worst, though, this instance handled 600 to 700 transactions per second.
m2.2xlarge with four 4,000 PIOPs volumes
I was impressed: at no point did the benchmark stress this instance. The tps rate was between 1,700 and 1,900 in most situations, with peaks of up to 2,200 transactions per second. If I were asked to blindly size a “big” PostgreSQL database server running on AWS, this is probably where I would start. It’s not so large that you run into operational issues like worrying about MTBFs for ten-volume RAID arrays or trying to snapshot 4 TB of disk space, but it is large enough to absorb a substantial amount of traffic.
Graphs and Tabular Data
Box plot of transactions/second across all runs, irrespective of the number of clients.
Data grouped by number of concurrent clients, with each bar representing a step of 100 transactions per client, from 100 to 1,000.
Progression of tps at each level of concurrency. Each x-axis tick marks a single pgbench run, from 100 to 1,000 transactions per client.
Raw tabular data
Again, a box plot of the data with a y-axis of transactions/second.
Grouped by number of concurrent clients between 100 and 1,000 transactions per client.
TPS by number of concurrent clients. The x-axis ticks mark pgbench runs progressing from 100 transactions per client to 1,000 transactions per client.
Tabular data: m2.2xlarge with four 4,000 PIOPs EBS volumes