Using and Benchmarking Galera in Different Architectures
What I found most interesting on this second day was, once again, synchronous replication, along with the replication solutions provided by Continuent.
The first session I attended was the Galera one, given by Henrik and Alexey. Here is the session overview:
“We will present results from benchmarking a MySQL Galera cluster under various workloads and also compare them to how other MySQL high-availability approaches perform. We will also go through the different ways you can set up Galera. Some of its architectures are unique among MySQL clustering solutions.
- MySQL Galera
  - Synchronous multi-master clustering, what does it mean?
  - Load balancing and other options
  - WAN replication
  - How split brain is handled
  - How split brain is handled in WAN replication
- How does it perform?
  - In-memory workload
  - Scale-out for writes – how is it possible?
  - Disk-bound workload
  - WAN replication
  - Parallel slave threads
  - Allowing the slave to replicate (commit) out of order”
I know how passionate Henrik is when he talks about Galera, and I partially share the feeling. I say “partially” because I am still not fully convinced by the numbers, but I am working on that.
Anyhow, besides the benchmarking part, I found the combination of building blocks and elements for the HA solution interesting. Including a redundant load balancer and using the MySQL JDBC driver with Galera is a simple but efficient way to provide HA.
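As a minimal sketch of why the JDBC side of that combination is so simple: MySQL Connector/J has a built-in load-balancing URL scheme (`jdbc:mysql:loadbalance://`) that spreads connections across a list of nodes, and because every Galera node accepts writes, any node in the list is a valid target. The host names and database name below are placeholders, not from the talk.

```java
// Sketch: building a Connector/J load-balancing URL for a Galera cluster.
// With Galera, all listed nodes are writable, so the driver can pick any
// of them and fail over to a surviving node if one drops.
public class GaleraJdbcUrl {
    static String buildUrl(String[] hosts, String db) {
        return "jdbc:mysql:loadbalance://" + String.join(",", hosts) + "/" + db;
    }

    public static void main(String[] args) {
        String url = buildUrl(
            new String[] {"node1:3306", "node2:3306", "node3:3306"}, "app");
        System.out.println(url);
        // → jdbc:mysql:loadbalance://node1:3306,node2:3306,node3:3306/app
        // Passing this URL to DriverManager.getConnection() is all the
        // application-side HA plumbing needed on top of the load balancer.
    }
}
```

The design point is that failover logic lives in the driver, not the application, which is what makes this combination "simple but efficient".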
Another important fact is that we can finally drop DRBD, which has been the only synchronous solution for MySQL for too long. DRBD forced us to have one PRIMARY (read-write) node and a completely useless SECONDARY one.
I also appreciated Alexey’s honesty about scaling. Galera will not scale to infinity, as some falsely claim, but it can handle a decent number of nodes. The actual limit has to be discovered and calibrated against the real load pushed to the nodes; it cannot be defined as an absolute, abstract limit.
I was also intrigued by the way the Galera team manages server synchronization by “quorum”. In short, with three nodes, if one is unable to reach the other two but still receives writes (split brain), then at the moment of reunion the other two take over by “quorum”. The obvious and immediate problem arises when we have three data centers, one with 6 nodes and the others with 2 nodes each. If the DC with 6 nodes gets disconnected, the valid data will be in the two remaining data centers. However, at the moment of reunion, the DC with 6 nodes will hold the majority, and the whole data set will become invalid. Alexey is working on a way to calculate node “weight” by proximity to fix this issue.
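The quorum arithmetic behind that scenario can be sketched in a few lines. This is an illustration of weighted majority voting under the node counts described above, not Galera's actual implementation; the per-node weights in the comments are hypothetical values chosen to show how weighting flips the outcome.

```java
import java.util.Map;

// Sketch of weighted quorum arithmetic (not Galera's real code).
// A partition stays primary only if its summed weight is a strict
// majority of the cluster-wide total; a strict majority means two
// equal halves can never both claim quorum.
public class QuorumSketch {
    static boolean hasQuorum(Map<String, Integer> partitionWeights, int clusterTotal) {
        int sum = partitionWeights.values().stream().mapToInt(Integer::intValue).sum();
        return 2 * sum > clusterTotal;
    }

    public static void main(String[] args) {
        // Three data centers: DC1 has 6 nodes, DC2 and DC3 have 2 each.
        int total = 6 + 2 + 2; // 10 nodes, weight 1 each
        // With equal weights, the isolated 6-node DC keeps the majority:
        System.out.println(hasQuorum(Map.of("dc1", 6), total));              // true
        // Hypothetical weighting: DC1 nodes weigh 1, small-DC nodes weigh 2.
        int weightedTotal = 6 * 1 + 2 * 2 + 2 * 2;                           // 14
        System.out.println(hasQuorum(Map.of("dc1", 6 * 1), weightedTotal));  // false
        System.out.println(hasQuorum(Map.of("dc2+dc3", 8), weightedTotal));  // true
    }
}
```

This is the essence of the fix the post describes: adjusting weights (for example by proximity) so that raw node count in one data center cannot outvote the partitions that actually hold the valid data.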
Honestly, I am not sure that Galera is production-ready, but it is certainly the easiest and most interesting solution for simple write scaling.
Reference for Galera: https://codership.com/.