This is an issue that keeps rearing its ugly head over and over again, and since it greatly affects performance, it is important that DBAs running any DBMS on Linux come to grips with it. So I decided to do some research and try different settings on my notebook. Here are my findings.
What can you find on the web?
A Wikipedia search for the word swappiness will come up empty (any volunteers out there want to write an article?). A Google search will show some pretty old material—the best article I found is from 2004: Linux: Tuning Swappiness. This article includes a detailed discussion with some interesting remarks by Andrew Morton, a Linux kernel maintainer.
So, what is swappiness?
Towards the end of the email thread quoted in the article, you’ll find this definition (sort of):
> I’ve read the source for where swappiness comes into play. Yet I
> cannot make a statement about what it means. Can you?
> It controls the level of page reclaim distress at which we decide to start reclaiming mapped pages.
> We prefer to reclaim pagecache, but we have to start swapping at *some* level of reclaim failure. swappiness sets that level, in rather vague units.
“Rather vague units” . . . So, for practical purposes, it’s easier to define it by its effects:
- Set swappiness to 0 and the kernel is going to start swapping only when absolutely needed.
- Set swappiness to 100 and the kernel will preemptively swap out memory you are not using.
To set it, use the following command with sudo (it’s not a good practice to manipulate files under /proc directly as suggested in the article):
sudo sysctl -w vm.swappiness=0
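To double-check the value currently in effect, you can read it back with sysctl or straight from /proc:

sysctl vm.swappiness
cat /proc/sys/vm/swappiness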
For my workstation, the applications I usually have open take slightly more than the physical 2GB installed in my notebook. I also keep jumping from one window to the next. Setting a high swappiness value means that, with each jump, I have to wait for the application’s swapped-out pages to be brought back into memory, creating a noticeable delay while changing windows.
On the other hand, if swappiness is set to 0, swapping will occur when I start editing a big document or navigate to a big web page. So, while I will be able to jump from window to window pretty quickly, I’ll experience delays when using one of the memory-hungry applications intensively. Try values of 0, 100, and a few in between yourself to see which best fits your particular use profile and to understand how each one of them affects swapping.
How can I objectively measure swapping?
In the previous section, determining the best swappiness value is based entirely on personal perception. If swappiness is too low, you’ll notice your application might take a while to reclaim memory on a heavily loaded system. If swappiness is too high, the application will get the memory fast, but you’ll have to wait when switching between apps: for example, going from OpenOffice to Firefox (both are memory-hungry).
For an objective measure, you can use sar, which is included in the sysstat package. sar can be used in two different ways:
- Install it in the crontab to gather samples in the background.
- Run it interactively.
Running sar from cron
Each Linux distribution behaves differently when you install the sysstat package. For Debian-based systems, installing the package doesn’t add it to the crontab. You will have to check your system’s documentation for the details. I installed it in the crontab to gather samples every ten minutes (the most frequent setting).
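In my case the cron entry simply runs sa1 (the collector script behind sar) once every ten minutes. An /etc/cron.d-style sketch of what I mean is below; keep in mind that the exact path to sa1 varies between distributions:

# take one sar sample every ten minutes (path to sa1 varies by distribution)
*/10 * * * * root /usr/lib/sysstat/sa1 1 1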
Here is an example of the two options I use the most to determine swappiness health:
sar -r -s 22:00:00
Linux 2.6.24-23-rt (maggie)     04/09/2009

10:05:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
10:15:01 PM     14716   1770256     99.18      5428    252664    946424     49596      4.98      1796
10:25:01 PM     14176   1770796     99.21      3292    229788    946424     49596      4.98      1796
Average:        14446   1770526     99.19      4360    241226    946424     49596      4.98      1796

sar -B -s 22:00:00
Linux 2.6.24-23-rt (maggie)     04/09/2009

10:05:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
10:15:01 PM      2.48     96.11    202.55      0.03    158.34      0.00      0.00      0.00      0.00
10:25:01 PM     27.13     70.24    249.75      0.35    169.96      0.00      0.00      0.00      0.00
10:35:01 PM      4.70     39.58    213.53      0.04    149.19      0.00      0.00      0.00      0.00
Average:        11.44     68.64    221.94      0.14    159.16      0.00      0.00      0.00      0.00
- The -s parameter makes sar show a day’s worth of statistics starting at the given time. For clarity, I’m only including a few lines here.
- -r shows the overall memory usage, with the last columns showing swapping information. With higher swappiness values you will notice greater numbers in those swap columns.
- -B specifically presents paging and swapping statistics. The most significant columns here are majflt/s, which represents the pages that had to be loaded from disk (synonymous with I/O overhead), and %vmeff, where the healthy values are 100 (every inactive page swapped out is being reused) or 0 (no pages were scanned during the interval). See the sketch after this list for a quick way to scan the output.
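If you want to scan a whole day of samples for suspicious intervals rather than eyeballing the output, a quick-and-dirty awk filter along these lines can help. It is only a sketch: it assumes the 12-hour timestamp format shown above, and the 90 threshold is an arbitrary choice of mine:

# flag sar -B samples whose %vmeff is neither 0 nor close to 100
sar -B -s 22:00:00 | awk 'NR > 3 && $1 != "Average:" && $NF > 0 && $NF < 90 { print $1, $2, "low %vmeff:", $NF }'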
To run it interactively, instead of using -s, you can specify an interval and a number of samples:
sar -B 2 5
Linux 2.6.24-23-rt (maggie)     04/09/2009

10:44:29 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
10:44:31 PM      0.00      0.00    871.50      0.00    575.00      0.00      0.00      0.00      0.00
10:44:33 PM      0.00     62.00    110.00      0.00   1694.00      0.00      0.00      0.00      0.00
10:44:35 PM      0.00      0.00    481.09      0.00    179.10      0.00      0.00      0.00      0.00
10:44:37 PM      0.00     52.00     63.50      0.00    160.50      0.00      0.00      0.00      0.00
10:44:39 PM     20.00     96.00     52.00      0.00    116.00      0.00      0.00      0.00      0.00
Average:         4.00     41.96    315.78      0.00    544.56      0.00      0.00      0.00      0.00
In this case I’m taking 5 samples at 2-second intervals.
Which swappiness value should you use?
For workstations, it’s up to your particular usage profile. If you jump from application to application, 0 should be best. If you stay on the same memory-hungry application for a long time, 100 should be best.
When it comes to a database server, however, you have to keep in mind that it is a memory-craving application, and if you get a sudden activity increase, having to wait to obtain additional memory from the system is really bad for performance. For database servers I recommend keeping the following criteria in mind:
- Use dedicated database servers and configure the most memory you can without exceeding the physical memory available. Running another memory-intensive application (e.g. an application server) on the same server as the database is a bad idea, as both will compete for memory, creating high swapping situations.
- For database servers, set swappiness to 0. Use sysctl to set it interactively as shown above, or better still, set it in the /etc/sysctl.conf file so it persists when the server is restarted (see the example after this list).
- Constantly monitor the memory usage and swapping efficiency with a utility like sar.
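Making the setting permanent boils down to one line in /etc/sysctl.conf, which you can then apply without a reboot:

# /etc/sysctl.conf
vm.swappiness = 0

sudo sysctl -p   # reload the file so the change takes effect immediately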
By paying attention to these items, you will avoid the I/O overhead that heavy swapping can create.
you really should never have to touch this on 2.6, ever.
If you think you have any reason to, you are doing it wrong.
If you really do have problems with swapping, try 2.6.28 or later with split LRU.
This is an interesting suggestion, but we can’t control what organizations install on their servers. Also keep in mind that most companies won’t run a “bleeding edge” kernel.
I couldn’t help but notice in the sar output that the monitoring interval being used was 10 minutes. My experience is that if you’re monitoring that infrequently, your results are mush, because you’re never going to see any spikes when you need to.
Check out collectl – https://collectl.sourceforge.net/ – which monitors just about everything sar does and more, at 10-second intervals by default, and typically uses about 0.1–0.2% of the cpu. Even if you don’t use collectl, lower your sar monitoring interval to the 10-second range and you’ll see a lot more than you ever did. You’ll probably also find out your system is doing more than you thought it was.
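(To put a number on that suggestion: cron itself can’t fire more often than once a minute, so a 10-second sar cadence usually means asking sa1 to take several samples per run. A hypothetical entry, with the sa1 path varying by distribution, might look like:

# every minute, record 6 samples taken 10 seconds apart
* * * * * root /usr/lib/sysstat/sa1 10 6
)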
The 10-minute interval is mostly to have a “trend” rather than catch spikes. Notice that further down I actually use shorter intervals on the command line, which is the way I evaluated swappiness while jumping from one app to the next, or while launching a “heavy” app. Unlike the example, I used 100 samples taken 1, 2, 5, or 10 seconds apart, depending on the granularity I wanted.
The 10-minute interval in the crontab is what the sysstat package documentation suggested.
Last but not least Pythian’s monitoring scripts gather sar’s output into a database, which gives us a better way to evaluate trends.
I’ll look into collectl.
re 10 min – yes, I did notice you running sar interactively at 1 second. Makes much more sense.
I’m amazed sar still recommends 10 minutes for monitoring. I can’t help but wonder if this was done when systems were a lot slower and nobody ever thought to change it (after all, sar has been around a looong time), but it makes absolutely no sense today. At a 10-second monitoring interval in which you don’t look at processes or slabs, collectl used less than 0.1% of the cpu, and it’s a perl script! I’d think sar could do better than that.
I wonder how many people make decisions based on sar output? For example if sar shows periodic network utilization of 20%, how does one know if there are really long bursts (minutes worth) of 100% spikes? Same thing for CPU, disk, etc…
But then again collectl can monitor at sub-second levels, something you might actually care about from time to time. Also has a bunch of other features sar lacks but I’ll let you be your own judge on that.