Benchmarking Postgres on AWS 4,000 PIOPs EBS instances

Pythian Marketing

May 8, 2013


Introduction

Disk I/O is frequently the performance bottleneck with relational databases. With AWS recently releasing 4,000 PIOPs EBS volumes, I wanted to do some benchmarking with pgbench and PostgreSQL 9.2. Prior to this release the maximum available I/O capacity was 2,000 IOPs per volume. EBS I/O operations are read and written in 16 KB chunks, and their performance is limited by both the I/O capacity of the EBS volumes and the network bandwidth between an EC2 instance and the EBS network. My goal isn't to provide a PostgreSQL tuning guide, an EC2 tuning guide, or a database deathmatch complete with graphs; I'll just be showing what kind of performance is available out of the box without substantive tuning. In other words, this is an exploratory benchmark, not a comparative benchmark. I would have liked to compare the performance of 4,000 PIOPs EBS volumes with 2,000 PIOPs EBS volumes, but I ran out of time, so that will have to wait for a follow-up post.
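To put the 16 KB chunk size in perspective, here is a minimal sketch of the arithmetic for the best-case throughput of a single volume. It assumes every I/O is a full chunk and that 16 KB means 16 KiB; the function name ebs_throughput_kib is made up for illustration. Real throughput will also depend on the instance's bandwidth to the EBS network.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope EBS throughput: provisioned IOPS times I/O size.
# Assumption: every I/O is one full 16 KiB chunk (best case).

# ebs_throughput_kib IOPS [CHUNK_KIB] -> prints throughput in KiB/s
ebs_throughput_kib() {
    local iops=$1 chunk_kib=${2:-16}
    echo $(( iops * chunk_kib ))
}

for iops in 2000 4000; do
    kib=$(ebs_throughput_kib "$iops")
    # Integer MiB/s plus one truncated decimal place, so bc is not needed.
    printf '%d PIOPs x 16 KiB = %d KiB/s (~%d.%d MiB/s)\n' \
        "$iops" "$kib" $(( kib / 1024 )) $(( (kib % 1024) * 10 / 1024 ))
done
```

Striping several volumes together multiplies the I/O ceiling in theory, which is why the bandwidth between the instance and the EBS network becomes the next limit to watch.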

Setup

Region

I conducted my testing in AWS' São Paulo region. One benefit of testing in sa-east-1 is that spot prices for larger instances are (anecdotally) more stable than in us-east. Unfortunately, sa-east-1 doesn't have any cluster compute (CC) instances available. CC instances have twice the bandwidth to the EBS network that non-CC EC2 instances have. That additional bandwidth allows you to construct larger software RAID volumes. My cocktail napkin calculations suggest that it should be possible to reach 50,000 PIOPs on an EBS-backed CC instance without much of a problem.

EC2 instances

I tested with three EC2 instances: an m1.large from which to run pgbench, an m2.2xlarge with four EBS volumes, and an m1.xlarge with one EBS volume. All EBS volumes are 400 GB with 4,000 provisioned IOPs. The m1.large instance was an on-demand instance; the other two instances (the pgbench target database servers) were spot instances with a maximum bid of $0.05. (In one case the first spot instance was terminated and had to be rebuilt.) Some brief testing showed that having an external machine driving the benchmark was critical for the best results.

Operating System

All EC2 instances are running Ubuntu 12.10. A custom sysctl.conf tuned the System V shared memory limits, set swappiness to zero, and set the memory overcommit mode to two.

kernel.shmmax = 13355443200
kernel.shmall = 13355443200
vm.swappiness = 0
vm.overcommit_memory = 2
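One detail worth knowing about these settings: kernel.shmmax is measured in bytes, while kernel.shmall is measured in pages. Setting both to the same number, as above, works because it simply makes shmall a very generous ceiling. The sketch below shows how the equivalent page count could be derived instead; it is an illustration rather than what was used for these benchmarks, and the function name shm_pages is made up.

```shell
#!/usr/bin/env bash
# Convert a shared memory size in bytes into a page count for kernel.shmall.

# shm_pages BYTES [PAGE_SIZE] -> prints the number of pages, rounded up
shm_pages() {
    local bytes=$1 page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
}

shmmax=13355443200
echo "kernel.shmmax = $shmmax"
echo "kernel.shmall = $(shm_pages "$shmmax")"
```

After editing sysctl.conf, the values can be loaded without a reboot by running sysctl -p as root.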

Packages

The following packages were installed via apt-get:

htop
xfsprogs
debian-keyring
mdadm
postgresql-9.2
postgresql-contrib-9.2

In order to install the postgresql packages a pgdb.list file containing

deb https://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main

was placed in /etc/apt/sources.list.d and the following commands were run:

gpg --keyserver pgp.mit.edu --recv-keys ACCC4CF8
gpg --armor --export ACCC4CF8 | apt-key add -
apt-get update

RAID and Filesystems

For the one-volume instance, I simply created an XFS file system and mounted it on /mnt/benchmark.

mkdir /mnt/benchmark
mkfs.xfs /dev/xvdf
mount -t xfs /dev/xvdf /mnt/benchmark
echo "/dev/xvdf /mnt/benchmark xfs defaults 1 2" >> /etc/fstab

For the four-volume instance it was only slightly more involved. mkfs.xfs analyzes the underlying block device and determines appropriate values for the stripe unit and stripe width. Below are the commands for assembling a four-volume mdadm software RAID 0 array that is mounted on boot (assuming you've attached the EBS volumes to your EC2 instance). Running dpkg-reconfigure rebuilds the initrd image.

mkdir /mnt/benchmark
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
mkfs.xfs /dev/md0
echo "/dev/md0 /mnt/benchmark xfs defaults 1 2" >> /etc/fstab
dpkg-reconfigure mdadm
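If you want to vary the number of volumes, a small helper can keep the device count and the device list in sync. This is a minimal sketch that only prints the mdadm command and runs nothing; the function name print_raid0_cmd is made up, and the device names are placeholders that should be replaced with whatever names your instance exposes.

```shell
#!/usr/bin/env bash
# Print (not run) an mdadm RAID 0 create command for any number of devices.

# print_raid0_cmd MD_DEVICE DEVICE... -> prints the mdadm command line
print_raid0_cmd() {
    local md=$1
    shift
    # $# is now the number of member devices, so the count always matches.
    echo "mdadm --create $md --level=0 --raid-devices=$# $*"
}

print_raid0_cmd /dev/md0 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
```

Once an array has actually been created, cat /proc/mdstat and mdadm --detail /dev/md0 are the usual ways to confirm that all members are active.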

Benchmarking

pgbench is a utility included in the postgresql-contrib-9.2 package. It approximates the TPC-B benchmark and can be looked at as a database stress test whose output is measured in transactions per second. It involves a significant amount of disk I/O with transactions that run for relatively short amounts of time. vacuumdb was run before each pgbench iteration. For each database server, pgbench was run mimicking 16, 32, 48, 64, 80, and 96 clients. At each of those client values, pgbench iterated ten times in steps of 100, from 100 to 1,000 transactions per client.

It's important to realize that pgbench's stress test is not typical of a web application workload; most consumer-facing web applications could achieve much higher rates than those mentioned here. The only pgbench results against AWS/EBS volumes that I'm aware of (or that are quickly googleable) are from early 2012 and, at their best, achieve rates 50% slower than the lowest rates found here. I drove the benchmark using a very small, very unfancy bash script. An example of the pgbench command line would be:

pgbench -h $DBHOST -j4 -r -Mextended -n -c48 -t600 -U$DBUSER
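For readers who want to reproduce the run matrix, here is a minimal sketch of what such a driver script could look like. It is a reconstruction from the description above, not the script that produced these numbers. The default host name, the default user, the DRY_RUN switch, and the plain vacuumdb invocation are all assumptions. With DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of a pgbench driver: 6 client counts x 10 transaction counts.
# Assumptions: DBHOST/DBUSER defaults are placeholders; DRY_RUN=1 prints only.
DBHOST=${DBHOST:-db.example.internal}
DBUSER=${DBUSER:-postgres}
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS... -> print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

benchmark_matrix() {
    local clients txns
    for clients in 16 32 48 64 80 96; do
        for txns in 100 200 300 400 500 600 700 800 900 1000; do
            # Vacuum before every iteration, as described above.
            run vacuumdb -h "$DBHOST" -U "$DBUSER"
            run pgbench -h "$DBHOST" -j4 -r -Mextended -n \
                -c"$clients" -t"$txns" -U"$DBUSER"
        done
    done
}

benchmark_matrix
```

Running it with DRY_RUN=0 from the machine driving the benchmark would execute the 60 pgbench runs in sequence; redirecting the output to a file keeps the per-run tps figures for later analysis.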

m1.xlarge with single 4,000 PIOPs volume

The maximum transaction volume for this instance came when running below 48 concurrent clients and under 500 transactions per client. While the transaction throughput never dropped precipitously at any point, loads outside of that range exhibited varying performance. Even at its worst, though, this instance generally handled 600 to 700 transactions/second (the raw data below shows a few dips under 600, including one 281 tps outlier).

m2.2xlarge with four 4,000 PIOPs volumes

I was impressed; at no point did the benchmark stress this instance. The tps rate was between 1,700 and 1,900 in most situations, with peaks of around 2,200 transactions per second. If I were asked to blindly size a "big" PostgreSQL database server running on AWS, this is probably where I would start. It's not so large that you have operational issues like worrying about MTBFs for ten-volume RAID arrays or trying to snapshot 4 TB of disk space, but it is large enough to absorb a substantial amount of traffic.

Graphs and Tabular Data

single-4K-volume tps

Figure (box plot): the spread of transactions/second irrespective of the number of clients.

Figure (bar chart): data grouped by number of concurrent clients, with each bar representing a step of 100 transactions per client, ranging from 100 to 1,000.

Figure: progression of tps at each individual level of concurrency. The x-axis tick marks are single pgbench runs from 100 transactions per client to 1,000 transactions per client.

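Raw tabular data: m1.xlarge with a single 4,000 PIOPs EBS volume

Rows are concurrent clients, columns are transactions per client, and values are tps.

clients	100	200	300	400	500	600	700	800	900	1000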
16	1455	1283	1183	653	1197	533	631	1009	923	648
32	1500	1242	1232	757	747	630	1067	665	688	709
48	281	864	899	705	1029	749	736	593	766	641
64	944	1281	704	1010	739	596	778	662	820	612
80	815	893	1055	809	597	801	684	708	736	663
96	939	889	774	772	798	682	725	662	776	708

four-4,000-PIOPs-volumes tps

Figure (box plot): again, a box plot of the data with a y-axis of transactions/second.

Figure (bar chart): data grouped by number of concurrent clients, from 100 to 1,000 transactions per client.

Figure: tps by number of concurrent clients. The x-axis ticks mark pgbench runs progressing from 100 transactions per client to 1,000 transactions per client.

Raw tabular data: m2.2xlarge with four 4,000 PIOPs EBS volumes

Rows are concurrent clients, columns are transactions per client, and values are tps.

clients	100	200	300	400	500	600	700	800	900	1000
16	1487	1617	1877	1415	1388	1882	1897	1771	1267	1785
32	1804	2083	2160	1791	1259	1997	2230	1501	1717	1918
48	1810	2152	1296	1951	2117	1775	1709	1803	1817	1847
64	1810	1580	1568	2056	1811	1784	1849	1909	1942	1658
80	1802	2044	1467	2142	1645	1896	1933	1740	1821	1851
96	1595	1403	2047	1731	1783	1859	1708	1896	1751	1801
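Since the graphs may not render everywhere, here is a small sketch for summarizing the tables above from the shell. It computes the minimum, maximum, and mean tps for each row; the function name tps_summary is made up, and the sample rows are copied from the four-volume table. The same function should work on the single-volume table as long as the header line is left out.

```shell
#!/usr/bin/env bash
# Summarize rows of "clients tps tps ..." into min/max/mean per client count.

tps_summary() {
    awk '
        NF < 2 { next }                      # skip blank or label-only lines
        {
            min = max = $2 + 0; sum = 0      # "+ 0" forces numeric comparison
            for (i = 2; i <= NF; i++) {
                v = $i + 0
                if (v < min) min = v
                if (v > max) max = v
                sum += v
            }
            printf "clients=%d min=%d max=%d mean=%.0f\n", $1, min, max, sum / (NF - 1)
        }'
}

tps_summary <<'EOF'
16 1487 1617 1877 1415 1388 1882 1897 1771 1267 1785
32 1804 2083 2160 1791 1259 1997 2230 1501 1717 1918
48 1810 2152 1296 1951 2117 1775 1709 1803 1817 1847
64 1810 1580 1568 2056 1811 1784 1849 1909 1942 1658
80 1802 2044 1467 2142 1645 1896 1933 1740 1821 1851
96 1595 1403 2047 1731 1783 1859 1708 1896 1751 1801
EOF
```

A per-row minimum and maximum is a rough stand-in for the box plots: it shows the spread at each concurrency level without implying anything about the distribution in between.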
