How Linux Hugepages work to improve Oracle performance

This post is not intended to be an in-depth discussion of how Linux Hugepages work, but rather just a cursory explanation.

In discussing this topic with many DBAs, I have found that while many know how to configure Hugepages, they often do not know how Hugepages work.

Knowing how Hugepages work is key to understanding the benefits of using them.

Please note this discussion is for static Hugepages only. Red Hat Enterprise Linux 6 (and derived distros) introduced dynamic, or transparent, Hugepages.

Oracle strongly recommends disabling Transparent Hugepages, and in fact they are disabled by default in Oracle distributions of Linux 6 and 7.

See Oracle Support Note ALERT: Disable Transparent HugePages on SLES11, RHEL6, RHEL7, OL6, OL7, and UEK2 and above (Doc ID 1557478.1)
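To check whether Transparent Hugepages are active on a given server, something like the following sketch may help. It assumes the setting is exposed at /sys/kernel/mm/transparent_hugepage/enabled, with the active mode shown in square brackets; the function name thp_mode is only illustrative.

```shell
#!/bin/bash

# Extract the active mode from a transparent_hugepage "enabled" line.
# The kernel marks the active choice with square brackets,
# e.g. "always madvise [never]".
thp_mode() {
   case $1 in
      *\[*\]*)
         mode=${1#*\[}       # drop everything up to the opening bracket
         mode=${mode%%\]*}   # drop the closing bracket and what follows
         printf '%s\n' "$mode" ;;
      *)
         printf 'unknown\n' ;;
   esac
}

# Assumed location; some RHEL 6 kernels used
# /sys/kernel/mm/redhat_transparent_hugepage/enabled instead.
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

if [ -r "$THP_FILE" ]; then
   echo "Transparent Hugepages mode: $(thp_mode "$(cat "$THP_FILE")")"
else
   echo "Transparent Hugepages setting not found at $THP_FILE"
fi
```

A result of never means Transparent Hugepages are disabled.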

Linux Memory Allocation

Linux manages memory in a standard pagesize, with the pagesize usually set at 4k.

$ getconf PAGESIZE
4096

When an Oracle instance starts up, a request is made for all of the shared memory for the SGA.

For this demonstration let’s assume the SGA size is a rather modest 16 GiB.

When the SGA memory is made available to Oracle via standard memory allocation, it is done in chunks of pagesize as seen earlier with getconf PAGESIZE.

How many chunks of memory is that?

When the pagesize is 4k

( 16 * 1G ) / ( 4k )
or
( 16 * 2**30 ) / ( 4 * 2**10 ) = 4194304

That is 4,194,304 chunks of memory to be managed for our rather modest SGA.
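The same arithmetic can be done in the shell. This is just a small sketch; the function name page_count is made up for the illustration, and it accepts any page size in bytes.

```shell
#!/bin/bash

# Number of pages needed to map an SGA.
#   $1 = SGA size in GiB
#   $2 = page size in bytes
page_count() {
   echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

echo "4k pages for a 16 GiB SGA: $(page_count 16 4096)"
```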

Managing that many discrete chunks of memory adds significant processing overhead.

What can be done to reduce that overhead?

If the size of the pages is increased, the number of memory chunks is reduced, thereby reducing the overhead required to manage the memory.

Let’s consider what happens when we use hugepages instead of the standard pagesize.

First look at the Hugepages info:

$ grep Huge /proc/meminfo
HugePages_Total:     800
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

The pagesize for Hugepages is set here to 2M, the default on x86_64 Linux.
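As a rough sketch, the values from /proc/meminfo can be used to estimate how many Hugepages a given SGA would need. The function name hugepage_check is hypothetical, and the calculation considers only the SGA size passed in, ignoring any other shared memory on the server.

```shell
#!/bin/bash

# Read /proc/meminfo-style text on stdin and compare the number of
# hugepages needed for an SGA of $1 GiB with the number configured.
hugepage_check() {
   awk -v sga_gib="$1" '
      /^HugePages_Total:/ { total = $2 }
      /^Hugepagesize:/    { size_kb = $2 }
      END {
         if (size_kb == 0) { print "no hugepage info found"; exit }
         sga_kb = sga_gib * 1024 * 1024
         # round up to a whole number of hugepages
         needed = int((sga_kb + size_kb - 1) / size_kb)
         printf "needed=%d configured=%d\n", needed, total
      }'
}

if [ -r /proc/meminfo ]; then
   hugepage_check 16 < /proc/meminfo
fi
```

With the values shown above, a 16 GiB SGA needs 8192 pages of 2M, far more than the 800 configured on this particular server.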

2M pagesize
( 16 * 1G ) / ( 2M )
or
( 16 * 2**30 ) / ( 2 * 2**20 ) = 8192

4194304 / 8192 = 512

With Hugepages there are 512x fewer pages to manage! Whatever time was being used to track those pages is reduced accordingly. You can see where the benefit is.

An Extreme Demonstration

Following is a little bash script to illustrate the difference between 4K and 2M pagesizes.

This script will output the offset of each page of memory, within each 1G chunk, for our 16G SGA.

Commenting out the second definition of PAGESIZE will cause the script to run for 4k pagesize rather than 2M pagesize.

#!/bin/bash

# show the page offsets for a 16G SGA
# the second PAGESIZE assignment overrides the first;
# comment it out to use 4k pages rather than 2M pages
SGA_SIZE=16

(( PAGESIZE = 4 * (2**10) ))
(( PAGESIZE = 2 * (2**20) ))

for sgaGig in $(seq 1 $SGA_SIZE)
do

   (( chunk=2**30 )) # 1 Gig

   while [[ $chunk -gt 0 ]]
   do
      (( chunk -= PAGESIZE ))
      printf "0x%08x\n" $chunk
   done

done

Here is an example of the output for 2M pagesize:

$ ./sga-page-count.sh | head -10
0x3fe00000
0x3fc00000
0x3fa00000
0x3f800000
0x3f600000
0x3f400000
0x3f200000
0x3f000000
0x3ee00000
0x3ec00000

The full output is rather boring and will not be shown.

It does become interesting though when you see how much time is required to run the script for each pagesize.

2M pagesize

$ time ./sga-page-count.sh | wc -l
8192

real    0m0.069s
user    0m0.064s
sys     0m0.000s

4K pagesize

$ time ./sga-page-count.sh | wc -l
4194304

real    0m39.687s
user    0m29.268s
sys     0m8.136s

The script took roughly 575 times longer to run with the 4K pagesize than with the 2M pagesize.

39.687 / 0.069 ≈ 575.2

This is a bit of an oversimplification as we are just counting the memory pages, not actually managing memory.

However, this simple explanation and demonstration make it easy to understand the performance benefit of Hugepages.

Note: as a bonus, Hugepages are also not subject to swapping, which is another benefit.

About the Author

Oracle experience: started with Oracle 7.0.13.
Programming experience: Perl, PL/SQL, Shell, SQL, and some other odds and ends that are no longer useful.
Systems: Networking, Storage, OS to varying degrees. Have fond memories of DG/UX.

Comments

Bhavani P Dhulipalla
January 23, 2018 11:15 pm

Great article..

Is there any performance benefit in the database once the SGA is already allocated?

Once Hugepages are configured, the database instance must be restarted to make use of them.

Why are hugepages not subject to swapping?

Good question. If you google for it you will find a lot of articles saying that hugepages are subject to swapping.

That did not seem quite right to me, as my understanding was that hugepages were excluded from swapping.

The confusion is likely due to the more recent advent of ‘transparent’ or ‘anonymous’ hugepages. These are dynamic and can be swapped.

Oracle strongly recommends disabling transparent hugepages for Oracle databases running on any Linux where they may be used, and they are in fact disabled by default on Oracle Linux 6 and 7.

Please reference this Oracle Support note:
ALERT: Disable Transparent HugePages on SLES11, RHEL6, RHEL7, OL6, OL7, and UEK2 and above (Doc ID 1557478.1)

The hugepages that are used by Oracle are ‘static’ hugepages, and are not subject to swapping.

From https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

“Pages that are used as huge pages are reserved inside the kernel and cannot
be used for other purposes. Huge pages cannot be swapped out under
memory pressure.”

The fewer the pages, the less overhead there is to manage the page table, but that is on the OS side.
How it improves database performance is still unanswered.

Hi Rahul,

True, the memory pages are on the OS side.
The hugepages we are concerned about are being used by Oracle.

Let’s say that the TLB on a Xeon processor can store 64 TLB entries for 4k pages, and 32 entries for 2M pages.

The Ice Lake Benchmark Preview: Inside Intel’s 10nm

The addresses for frequently used memory can be found in the TLB. This saves the CPU from the need to look up the addresses.

In this case, let’s just discuss the buffer cache.

With a 4k pagesize, the most memory that can be covered is 256k. That really is not very much.

If the data a query is searching for is all in the cache, and is 2M in size, that data spans 512 pages of 4k each. Only 64 of those can be in the TLB at once, so the CPU must do lookups for at least 448 pages of memory.

The number may be larger, and probably will be, as the data may be distributed across more than the ideal number of 256 blocks, assuming a database block size of 8k.

Now, what if instead of 4k pages, all of our data is stored in 2M pages due to using HugePages?

Rather than 256k of memory being covered by the 4k pagesize TLB, the 2M pagesize TLB covers 64M.

All of our data may possibly be found by simple lookup in the TLB. (assuming heavily accessed memory)
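The TLB numbers above can be checked with a little shell arithmetic. This is only a sketch using the assumed entry counts from this example (64 entries for 4k pages, 32 for 2M pages); the function names are made up for the illustration.

```shell
#!/bin/bash

# TLB reach in KiB: number of entries times page size.
#   $1 = number of TLB entries
#   $2 = page size in KiB
tlb_reach_kb() {
   echo $(( $1 * $2 ))
}

# Pages of a working set that cannot all be held in the TLB at once.
#   $1 = working set in KiB
#   $2 = page size in KiB
#   $3 = number of TLB entries
uncovered_pages() {
   pages=$(( $1 / $2 ))
   if [ "$pages" -gt "$3" ]; then
      echo $(( pages - $3 ))
   else
      echo 0
   fi
}

echo "4k TLB reach: $(tlb_reach_kb 64 4) KiB"
echo "2M TLB reach: $(tlb_reach_kb 32 2048) KiB"
echo "4k pages not covered for 2M of data: $(uncovered_pages 2048 4 64)"
```

The 2M reach of 65536 KiB is the 64M mentioned above, and the last line gives the 448 pages that need a lookup.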

Here is a great blog that shows the cost of a TLB cache miss.

Cache Miss, TLB Miss, Page Fault

Mark J Bobak
April 13, 2023 4:27 pm

Hi Jared,

One other benefit of HugePages that you didn’t mention.
Each process that maps the SGA, i.e., server and background processes, needs its own private copy of the page table when standard pages are used. With HugePages, the page table is *shared*, so the larger the SGA and the larger the number of concurrent connections, the more memory you save, just in memory overhead for the page table. For large SGAs and many concurrent users, the effect can be huge.

-Mark

PS. It’s been *way* too long since my blog was actively updated, but I did a writeup there several years ago.
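A back-of-the-envelope sketch of the page table overhead described above might look like this. It assumes 8 bytes per page table entry (as on x86_64), counts only the last-level entries, and assumes every process touches the whole SGA, so treat the result as an upper bound. The function name pagetable_kb and the figure of 500 sessions are made up for the illustration.

```shell
#!/bin/bash

# Per-process page table size, in KiB, for mapping the whole SGA.
#   $1 = SGA size in GiB
#   $2 = page size in KiB
# Assumes 8 bytes per page table entry.
pagetable_kb() {
   echo $(( $1 * 1024 * 1024 / $2 * 8 / 1024 ))
}

SESSIONS=500   # hypothetical number of connected processes

per_proc_4k=$(pagetable_kb 16 4)
per_proc_2m=$(pagetable_kb 16 2048)

echo "4k pages: ${per_proc_4k} KiB per process"
echo "2M pages: ${per_proc_2m} KiB per process"
echo "4k pages, ${SESSIONS} private copies: $(( per_proc_4k * SESSIONS / 1024 )) MiB"
```

For a 16 GiB SGA this works out to 32 MiB of page table entries per process with 4k pages, versus 64 KiB with 2M pages.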

Good point, thanks Mark.
