Measuring the potential overhead of PMM Client on MySQL workloads

Posted in: MySQL, Open Source, Technical Track

Having good historial metrics monitoring in place is critical for properly operating, maintaining and troubleshooting database systems, and Percona Monitoring and Management is one of the options we recommend to our clients for this.

One common concern among potential users is how using this may impact their database’s performance. As I could not find any conclusive information about this, I set out to do some basic tests and this post shows my results.

To begin, let me describe my setup. I used the following Google Cloud instances:

  • One 4 vCPU instance for the MySQL server
  • One 2 vCPU instance for the sysbench client
  • One 1 vCPU instance for the PMM server

I used Percona Server 5.7 and PMM 1.5.3 installed via Docker. Slow query log was enabled with long_query_time set to 0 for all tests.

I ran sysbench with 200k, 1M and 10M rows using the legacy oltp script with the pareto and special distributions, and up to 64 client threads. After several runs and reviews of the results, I settled on 1M rows with pareto for extended tests, as other combinations showed minor variations on the results from this one.

I am well aware a synthetic workload is not representative but I think the results are still useful, though I would love to measure this on a real life workload (do let me know in the comments if you have done this already).

In a nutshell, I found some impact in performance (measured as throughput in transactions per second) when running sysbench with the PMM exporters which in my case was eliminated when I configured them to serve their metrics by HTTP instead of HTTPS.

The following graph shows box plots for throughput for pmm enabled or disabled, for a different number of threads, with and without ssl:

We can see that with SSL enabled there is a noticeable drop in throughput when the exporters are running, while this is not the case when SSL is disabled.

I arrived at the conclusion that it was worth repeating the tests with SSL disabled after creating Flame Graphs from perf captures during sample runs. On them, the only significant increases were due to the exporters (mysqld_exporter and node_exporter, the qan exporter did not have any noticeable impact during my tests). The results from the tests show that this analysis pointed me in the right direction so while they are worth of separate blog posts, it is worth to at least recommend our readers to get familiar with this performance analysis tool.

Next is a scatter plot of throughput over time with ssl enabled:

On it we get a more clear picture of the impact of having the exporters running during the test.

Next is the same graphs but with SSL disabled:

Now it is much more difficult to differentiate the runs.

This is confirmed if we look at the 99 percentile of throughput for each case (here for 32 threads):

PMM SSL tps (p99)
enabled enabled 1167.518
enabled disabled 1397.397
disabled disabled 1429.097

Conclusion

PMM is a very good Open Source option for monitoring but as every instrumentation and monitoring layer you add to your stack, it won’t come for free. My very simple tests show that its impact may be significant under some scenarios, yet if it’s bad enough it may be mitigated by using HTTP instead of HTTPS for the exporters. Given the events that are unfolding in IT security as I type this, it may seem reckless to recommend disabling SSL as an “optimization”, but I think good engineering is all about informed tradeoffs and if you’re running this on a secure private network, how risky is it to expose monitoring metrics over HTTP instead of HTTPS? I would love to read answers to this question in the comments!

Finally, I think a similar cost is probably paid for the TLS layer on the pmm-server end. It would be very interesting to see an experiment like this repeated but on a different scenario: one pmm-server with several monitored clients.

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Fernando Ipar is a MySQL Principal Consultant who works remotely from his family home in Montevideo, Uruguay. Initially, Fernando began using GNU/Linux in 1998 and MySQL in 2000, and his journey led him to become active in the technical community, contributing and at times leading open source projects as well as founding the Montevideo MySQL Meetup. Before coming to Pythian, he spent seven years at Percona, where he contributed to the Percona Toolkit and TokuMX Toolkit, among other endeavors. Fernando helped scale and troubleshoot the back-ends for businesses of all sizes including financial institutions, telcos, and other IT companies while also supporting their full stack needs.

2 Comments. Leave new

Nice test.
I have observed PMM-clients can occupy almost 10% of CPU time, that’s why i don’t use PMM yet.

Reply

Have tried to disable table stats?

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *