Oracle RAC: Network Performance with Jumbo Frames

Posted in: Oracle, Technical Track

Introduction

When working with Oracle RAC, it’s strongly advised to use Jumbo Frames for the network that provides the interconnect between nodes.

Because the nodes send many blocks back and forth across this private network, the larger each packet is, the fewer packets there are to send.

The usual block size for an Oracle database is 8192 bytes.

The standard Ethernet MTU (Maximum Transmission Unit), which is the largest IP packet a frame can carry, is 1500 bytes.

Sending an 8k Oracle block with a 1500-byte MTU requires splitting it into six frames, which the receiving node must then reassemble.

However, with Jumbo Frames (a 9000-byte MTU), the entire block fits neatly into a single frame.
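To make the arithmetic concrete, here is a minimal Python sketch (an illustration, not part of the original tests) that counts how many frames an 8192-byte block needs at each MTU. It assumes IPv4 with a 20-byte header and no options, an 8-byte UDP header, and a 20-byte TCP header with no options. The UDP case models IP fragmentation of one datagram, and the TCP case models segmentation at the MSS (MTU minus the IP and TCP headers).

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8
TCP_HEADER = 20   # bytes, assuming no TCP options


def udp_fragments(payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed for one UDP datagram of `payload` bytes."""
    datagram = payload + UDP_HEADER
    # Every fragment except the last carries a multiple of 8 data bytes,
    # because fragment offsets are expressed in 8-byte units.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


def tcp_segments(payload: int, mtu: int) -> int:
    """Number of full-size TCP segments needed to carry `payload` bytes."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    return math.ceil(payload / mss)


block = 8192
for mtu in (1500, 9000):
    print(mtu, udp_fragments(block, mtu), tcp_segments(block, mtu))
# prints "1500 6 6" then "9000 1 1"
```

With TCP timestamps enabled, the usable payload per segment at a 1500 MTU drops to 1448 bytes, which still works out to six segments per block.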

-----

Note: Observing the mechanical effects of MTU directly requires a fair amount of effort: you would need to set up a SPAN or port mirror on the switch and capture the traffic from the wire on that port. That was not done for these tests.

Why mention this? Because, as shown below, the average packet size reported by the test is about 8k, even though the MTU is set to 1500. The test programs only see data after the kernel has reassembled it, so the effects of MTU cannot be seen directly on the client or server; they are inferred from other data.

“Frame” and “packet” are often used interchangeably, but strictly speaking they belong to different layers of the OSI model: a frame is a Layer 2 (data link) unit, such as an Ethernet frame, while a packet is a Layer 3 (network) unit, such as an IP packet.
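The MTU limits the size of the Layer 3 packet; the Layer 2 frame that carries it is slightly larger. This small Python sketch (an illustration only) shows the difference, assuming untagged Ethernet II framing; an 802.1Q VLAN tag would add another 4 bytes.

```python
# Layer 2 (Ethernet) overhead wrapped around a Layer 3 (IP) packet,
# assuming an untagged Ethernet II frame.
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4      # frame check sequence trailer (CRC-32)


def max_frame_size(mtu: int) -> int:
    """Largest Ethernet frame that carries an IP packet of up to `mtu` bytes."""
    return ETH_HEADER + mtu + ETH_FCS


for mtu in (1500, 9000):
    print(f"MTU {mtu}: IP packet up to {mtu} bytes, Ethernet frame up to {max_frame_size(mtu)} bytes")
```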

-----

On with the story…

Recently, I was working with a two-node Oracle RAC system running in a VMware ESXi 6.5 environment.

It was thought that, due to the optimizations VMware performs in its virtual network stack, Jumbo Frames were unnecessary.

However, that does not seem to be the case.

After testing throughput with both the standard 1500-byte MTU and a 9000-byte Jumbo Frame MTU, we found that the larger MTU increased throughput by 22%.

Why did that happen? Well, keep reading to find out.

The Test Lab

Though the VMware testing was done on Oracle Linux 7.8, the following experiments were performed on Ubuntu 20.04 LTS virtual machines running in VirtualBox.

As there was no need to run Oracle, Ubuntu works just fine for these tests.

The following two servers were created:

ubuntu-test-mule-01: TCP test client - 192.168.1.226
ubuntu-test-mule-02: TCP test server - 192.168.1.227

Some version information:

root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '^(NAME=|VERSION=)' /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"

Network Configuration

Other than having different IP addresses, ubuntu-test-mule-01 and ubuntu-test-mule-02 are set up exactly the same way.

Because this version of Ubuntu uses netplan to configure the interfaces, we modified the /etc/netplan/00-installer-config.yaml file to configure the two test interfaces.

The interfaces used for the testing are enp0s8 and enp0s9.

Then, netplan apply was used to enable the changes.

root@ubuntu-mule-01:~/perl-sockets/packet-test# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp0s3:
      dhcp4: true
    enp0s8:
      dhcp4: false
      addresses: [192.168.154.4/24]
      mtu: 9000
    enp0s9:
      dhcp4: false
      addresses: [192.168.199.35/24]
  version: 2

The result, shown here for ubuntu-mule-02 (the server):

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:b8:5c:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.227/24 brd 192.168.1.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feb8:5cdc/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:ab:d5:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.154.5/24 brd 192.168.154.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feab:d544/64 scope link
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:22:fc:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.199.36/24 brd 192.168.199.255 scope global enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe22:fca2/64 scope link
       valid_lft forever preferred_lft forever
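If you would rather check the MTUs from a script than read ip a output, Linux exposes each interface's current MTU in /sys/class/net/<interface>/mtu. The following is a small Python sketch of that check (not something used in the original tests); the interface names and expected values come from the netplan file above, and the sysfs_net parameter exists only so the check can be pointed at another directory.

```python
from pathlib import Path


def interface_mtu(iface: str, sysfs_net: str = "/sys/class/net") -> int:
    """Read the current MTU of a network interface from Linux sysfs."""
    return int((Path(sysfs_net) / iface / "mtu").read_text().strip())


def mtu_mismatches(expected: dict, sysfs_net: str = "/sys/class/net") -> dict:
    """Return {interface: actual MTU} for each interface whose MTU is not as expected."""
    actual = {iface: interface_mtu(iface, sysfs_net) for iface in expected}
    return {iface: mtu for iface, mtu in actual.items() if mtu != expected[iface]}
```

On the test mules, mtu_mismatches({"enp0s8": 9000, "enp0s9": 1500}) should return an empty dictionary. Because Jumbo Frames only help when both ends (and any switch in between) use the larger MTU, it is worth running the check on both hosts.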

Configure the TCP Test

Some time ago, I put together some Perl scripts for network throughput testing.

There were several reasons for this, including:

  • You can copy and paste the code if necessary.
  • It is easy to modify for different tests.

The following was run on each of the test mule servers:

# git clone https://github.com/jkstill/perl-sockets.git

On the ubuntu-mule-02 Server:

The following changes were made to server.pl:

First, disable the dot reporting. By default, a “.” is printed for every 256 packets received. You can disable this with the following line:

my $displayReport = 0; # set to 0 to disable reporting

Now, set the listening address.

Default code:

my $sock = IO::Socket::INET->new(
   #LocalAddr => '192.168.1.255', # uncomment and edit address if needed
   LocalPort => $port,
   Proto => $proto,
   Listen => 1,
   Reuse => 1
) or die "Cannot create socket: $@";

We changed this to reflect the network interfaces that would be used for testing on the server side.

my $sock = IO::Socket::INET->new(
   LocalAddr => '192.168.154.5', # MTU 9000
   #LocalAddr => '192.168.199.36', # MTU 1500
   LocalPort => $port,
   Proto => $proto,
   Listen => 1,
   Reuse => 1
) or die "Cannot create socket: $@";

The appropriate interface was used for each test.
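For readers who don't use Perl, here is a rough Python analog of the server side. It is an independent sketch, not a translation of server.pl (only its socket constructor is shown above). Binding to a single LocalAddr is what restricts the server to one test network. Counting recv() calls and dividing bytes by reads is our assumption about how a per-read “Avg Packet Size” figure, like the one reported later, could be derived.

```python
import socket


def make_listener(local_addr: str, port: int) -> socket.socket:
    """Create a TCP listener bound to one local address, and so to one test network."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # cf. Reuse => 1
    srv.bind((local_addr, port))                                # cf. LocalAddr / LocalPort
    srv.listen(1)                                               # cf. Listen => 1
    return srv


def receive_all(listener: socket.socket, bufsize: int = 8192):
    """Accept one connection; return (bytes received, number of recv() calls)."""
    conn, _peer = listener.accept()
    total = reads = 0
    with conn:
        while True:
            chunk = conn.recv(bufsize)
            if not chunk:  # empty read: the client closed the connection
                break
            total += len(chunk)
            reads += 1
    return total, reads
```

For the 9000 MTU test you would call make_listener("192.168.154.5", 4242), pass the result to receive_all(), and divide the byte count by the read count to get an average size per read.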

On the ubuntu-mule-01 Client:

Some test data was created. Using /dev/urandom and gzip makes it unlikely that the data can be compressed any further. This is something I learned to do for quick throughput tests using ssh, as ssh can compress data in transit. It’s probably not necessary in this case, but then again, it doesn’t hurt to ensure the test data is non-compressible.

root # cd perl-sockets/packet-test
root # dd if=/dev/urandom bs=1048576 count=101 | gzip - | dd iflag=fullblock bs=1048576 count=100 of=testdata-100M.dat
root # dd if=/dev/urandom bs=1048576 count=1025 | gzip - | dd iflag=fullblock bs=1048576 count=1024 of=testdata-1G.dat

root@ubuntu-mule-01:~/perl-sockets/packet-test# ls -l testdata-1*
-rw-r--r-- 1 root root 104857600 May 11 16:05 testdata-100M.dat
-rw-r--r-- 1 root root 1073741824 May 11 16:04 testdata-1G.dat
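The same idea can be sketched in Python. This is an illustration only: make_test_data() mirrors the dd | gzip | dd pipeline by compressing slightly more random data than needed and keeping exactly the requested number of bytes, and compression_ratio() is a quick way to confirm that the result will not shrink if something along the path tries to compress it.

```python
import gzip
import os


def make_test_data(path: str, size: int) -> None:
    """Write exactly `size` bytes of gzip-compressed random data to `path`."""
    # Like: dd if=/dev/urandom ... | gzip - | dd iflag=fullblock ... count=N
    raw = os.urandom(size + size // 100 + 1024)  # a little extra, as with count=101 vs 100
    blob = gzip.compress(raw)                    # random input does not shrink
    with open(path, "wb") as f:
        f.write(blob[:size])


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; about 1.0 means incompressible."""
    return len(gzip.compress(data)) / len(data)
```

For example, make_test_data("testdata-100M.dat", 100 * 1024 * 1024) produces a file of the same size as the first dd command above. Note that this version holds the whole buffer in memory, which the dd pipeline avoids.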

The Driver Script

We used the packet-driver.sh script to run each of the tests from the client side.

This script simply runs a throughput test 23 times in succession, using the specified MTU size.

#!/usr/bin/env bash

# Exit with a usage message if no MTU size is given on the command line
: ${1:?Call with 'packet-driver.sh <SIZE> '!}
: ${mtu:=$1}

# Accept only an exact 1500 or 9000 (a substring match would also accept 15000)
if ! echo "$mtu" | grep -qxE '1500|9000'; then
   echo Please use 1500 or 9000
   exit 1
fi

# Client (local) and server (remote) addresses on each test network
declare -A localHosts
declare -A remoteHosts

localHosts[9000]=192.168.154.4
localHosts[1500]=192.168.199.35

remoteHosts[9000]=192.168.154.5
remoteHosts[1500]=192.168.199.36

blocksize=8192
testfile=testdata-1G.dat

cmd="./client.pl --remote-host ${remoteHosts[$mtu]} --local-host ${localHosts[$mtu]} --file $testfile --buffer-size $blocksize"

# Run the throughput test 23 times in succession
for i in {0..22}
do
   echo "executing: $cmd"
   $cmd
done
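As a rough Python counterpart to client.pl plus packet-driver.sh, here is a sketch (our own illustration; client.pl's internals are not shown in this post). It sends the file in 8192-byte buffers and times each run. Binding to the local address fixes the source IP; because each test network is its own /24 subnet, the destination address is what makes the kernel route the traffic through the matching interface. Port 4242 comes from the client output below.

```python
import socket
import time
from typing import Tuple

# Addresses from packet-driver.sh: (client/local, server/remote) per MTU.
HOSTS = {
    9000: ("192.168.154.4", "192.168.154.5"),
    1500: ("192.168.199.35", "192.168.199.36"),
}


def send_file(path: str, remote: str, local: str, port: int = 4242,
              bufsize: int = 8192) -> Tuple[int, float]:
    """Send a file over TCP in bufsize chunks; return (bytes sent, seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((local, 0))  # fix the source address; port 0 = any free port
        sock.connect((remote, port))
        sent = 0
        start = time.perf_counter()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                sock.sendall(chunk)
                sent += len(chunk)
        elapsed = time.perf_counter() - start
    return sent, elapsed


def run_tests(mtu: int, runs: int = 23, testfile: str = "testdata-1G.dat"):
    """Repeat send_file() `runs` times over the test network for the given MTU."""
    if mtu not in HOSTS:
        raise ValueError("Please use 1500 or 9000")
    local, remote = HOSTS[mtu]
    return [send_file(testfile, remote, local) for _ in range(runs)]
```

Like the bash driver, run_tests(9000) assumes a fresh server is listening for each run.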

Perform the Tests

For each test, we enabled the correct interface in server.pl and then started the server.

For the client side, we called the packet-driver.sh script with the required MTU size.

The MTU size passed on the command line determines which interface is used on the client side.

1500 MTU

On the server-side, make sure the address in server.pl is set for the 1500 byte MTU interface. Then, start the server:

root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
LocalAddr => '192.168.199.36', # MTU 1500

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-1500.log
Initial Receive Buffer is 425984 bytes
Server is now listening ...
Initial Buffer size set to: 2048

On the client-side, run packet-driver.sh:

root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 1500
executing: ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192

remote host: 192.168.199.36
port: 4242
bufsz: 8192
simulated latency: 0

bufsz: 8192
Send Buffer is 425984 bytes
Connected to 192.168.199.36 on port 4242
Sending data...

9000 MTU

On the server-side, make sure the address in server.pl is set for the 9000 byte MTU interface. Then, start the server:

root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
LocalAddr => '192.168.154.5', # MTU 9000

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-9000.log
Initial Receive Buffer is 425984 bytes
Server is now listening ...
Initial Buffer size set to: 2048

Now, run packet-driver.sh on the client:

root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 9000
executing: ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192

remote host: 192.168.154.5
port: 4242
bufsz: 8192
simulated latency: 0

bufsz: 8192
Send Buffer is 425984 bytes
Connected to 192.168.154.5 on port 4242
Sending data...

Reporting

When all tests are complete, use packet-averages.pl to calculate the averages across all tests per MTU size.

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-1500.log
key/avg: Bytes Received 1073733637.000000
key/avg: Avg Packet Size 7898.147391
key/avg: Packets Received 135948.304348
key/avg: Average milliseconds 0.043824
key/avg: Avg Megabytes/Second 172.000000
key/avg: Avg milliseconds/MiB 5.818500
key/avg: Total Elapsed Seconds 6.850447
key/avg: Network Elapsed Seconds 5.958098

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-9000.log
key/avg: Bytes Received 1073733637.000000
key/avg: Avg Packet Size 7519.793478
key/avg: Packets Received 142790.217391
key/avg: Average milliseconds 0.033652
key/avg: Avg Megabytes/Second 213.165217
key/avg: Avg milliseconds/MiB 4.692753
key/avg: Total Elapsed Seconds 5.495095
key/avg: Network Elapsed Seconds 4.805343
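The source of packet-averages.pl isn't shown here, so the following is only a sketch of the averaging step. It assumes each run's metrics have already been parsed out of the server log into a dictionary such as {"Avg Packet Size": 7898.1, ...}; we leave out the log parsing because the per-run log format is not shown in this post.

```python
from collections import defaultdict


def average_by_key(runs):
    """Average each metric across runs; `runs` is an iterable of {metric: value} dicts."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for run in runs:
        for key, value in run.items():
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Averaging per key also explains why a count such as Packets Received shows up with a fractional part in the report.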

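The average Total Elapsed Seconds for the 9000 MTU tests is only 80% of the time required for the 1500 MTU tests (5.50 vs. 6.85 seconds), and the average throughput is about 24% higher (213 vs. 172 megabytes per second).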
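The percentages can be checked directly from the two packet-averages.pl reports above. This short Python snippet only repeats that arithmetic, using the averages copied from the output.

```python
# Averages copied from the packet-averages.pl output above.
mtu_1500 = {"elapsed_s": 6.850447, "mb_per_s": 172.000000}
mtu_9000 = {"elapsed_s": 5.495095, "mb_per_s": 213.165217}

elapsed_ratio = mtu_9000["elapsed_s"] / mtu_1500["elapsed_s"]
throughput_gain = mtu_9000["mb_per_s"] / mtu_1500["mb_per_s"] - 1

print(f"9000 MTU elapsed time: {elapsed_ratio:.0%} of 1500 MTU")  # 80%
print(f"9000 MTU throughput gain: {throughput_gain:.0%}")         # 24%
```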

From these results, it appears as if using Jumbo Frames is a pretty clear winner, even in a virtualized environment.

This result might seem somewhat surprising, as the tests are not sending any data over a physical wire.

In this case, the “network” is only composed of VirtualBox host network adapters.

So, why then are Jumbo Frames still so much faster than the standard 1500 MTU size?

Performance Profiling

This time, we’ll run a single test for each MTU size.

We’ll use the perf profiler to gather process profile information on the client side.

First, the 1500 MTU size:

perf record --output perf-mtu-1500.data ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192

Now the 9000 MTU size:

perf record --output perf-mtu-9000.data ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192

Let’s see some reports from perf.

1500 MTU

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data --stats | grep TOTAL
TOTAL events: 28648

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 28K of event 'cpu-clock:pppH'
# Event count (approx.): 7137500000
#
# Overhead Command Shared Object Symbol
# ........ ....... .................. ....................................
#
56.35% perl [kernel.kallsyms] [k] e1000_xmit_frame
21.00% perl [kernel.kallsyms] [k] e1000_alloc_rx_buffers
9.71% perl [kernel.kallsyms] [k] e1000_clean
2.22% perl [kernel.kallsyms] [k] __softirqentry_text_start
1.74% perl [kernel.kallsyms] [k] __lock_text_start
0.61% perl [kernel.kallsyms] [k] copy_user_generic_string
0.58% perl [kernel.kallsyms] [k] clear_page_rep
0.39% perl libpthread-2.31.so [.] __libc_read
0.30% perl [kernel.kallsyms] [k] do_syscall_64

9000 MTU

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data --stats | grep TOTAL
TOTAL events: 25259
root@ubuntu-mule-01:~/perl-sockets/packet-test#
root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 25K of event 'cpu-clock:pppH'
# Event count (approx.): 6290500000
#
# Overhead Command Shared Object Symbol
# ........ ....... .................. ........................................
#
64.15% perl [kernel.kallsyms] [k] e1000_xmit_frame
16.92% perl [kernel.kallsyms] [k] e1000_alloc_jumbo_rx_buffers
9.03% perl [kernel.kallsyms] [k] e1000_clean
1.79% perl [kernel.kallsyms] [k] __softirqentry_text_start
1.66% perl [kernel.kallsyms] [k] __lock_text_start
0.41% perl [kernel.kallsyms] [k] clear_page_rep
0.39% perl [kernel.kallsyms] [k] copy_user_generic_string
0.26% perl [kernel.kallsyms] [k] do_syscall_64
0.24% perl libpthread-2.31.so [.] __libc_read

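A conclusion we might draw from these reports:

When using an MTU of 9000, the test program spent a larger share of its time sending data (e1000_xmit_frame) and a smaller share in overhead. It also used less CPU overall: perf collected about 25K samples, versus about 28K in the 1500 MTU test.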

Note that 16.92% of the time was spent allocating Jumbo-sized receive buffers in e1000_alloc_jumbo_rx_buffers in the MTU 9000 test, versus 21% of the time spent in e1000_alloc_rx_buffers in the MTU 1500 test.
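The two profiles have different totals, so comparing percentages alone can mislead. The Python sketch below converts the top three symbols of each report into approximate CPU seconds. It assumes that for perf's cpu-clock event the “Event count (approx.)” value is CPU time in nanoseconds, and it groups e1000_alloc_rx_buffers (1500 MTU) and e1000_alloc_jumbo_rx_buffers (9000 MTU) under one label.

```python
# Values copied from the two perf reports above.
# Assumption: for the cpu-clock event, "Event count" is CPU time in nanoseconds.
EVENT_COUNT_NS = {1500: 7_137_500_000, 9000: 6_290_500_000}

# Overhead (%) per symbol; "rx buffer alloc" is e1000_alloc_rx_buffers at 1500
# and e1000_alloc_jumbo_rx_buffers at 9000.
OVERHEAD_PCT = {
    1500: {"e1000_xmit_frame": 56.35, "rx buffer alloc": 21.00, "e1000_clean": 9.71},
    9000: {"e1000_xmit_frame": 64.15, "rx buffer alloc": 16.92, "e1000_clean": 9.03},
}


def cpu_seconds(mtu: int) -> dict:
    """Convert each symbol's overhead percentage into approximate CPU seconds."""
    total_s = EVENT_COUNT_NS[mtu] / 1e9
    return {sym: pct / 100 * total_s for sym, pct in OVERHEAD_PCT[mtu].items()}


for mtu in (1500, 9000):
    for sym, secs in cpu_seconds(mtu).items():
        print(f"MTU {mtu:>4}  {sym:<17} {secs:5.2f} s")
```

On this scale, time in e1000_xmit_frame comes out at roughly 4.0 seconds for both MTUs, while receive-buffer allocation drops from about 1.5 to about 1.1 seconds.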

The reason for the performance increase, in this case, seems to be this: The use of Jumbo Frames simply requires less work for the server.

Rather than splitting each 8192-byte buffer across six 1500-byte frames that must then be reassembled, with Jumbo Frames it all fits in one frame.

Though these tests were run using servers virtualized with VirtualBox, the results are quite similar to those seen in servers running in VMware ESXi.

The fact that the servers are virtual does not reduce the need to ensure that RAC nodes get the fastest possible throughput on the private network used for the interconnect… And that means using Jumbo Frames.


About the Author

Oracle experience: started with Oracle 7.0.13
Programming experience: Perl, PL/SQL, Shell, SQL; also some other odds and ends that are no longer useful
Systems: Networking, Storage, OS to varying degrees. Have fond memories of DG/UX

2 Comments

Grzegorz Gorysz
May 17, 2020 11:58 am

Hi Jared,
thank You for this informative blog post.
Could You explain whats the reason for this syntax :
: ${1:?Call with ‘packet-driver.sh ‘!}
: ${mtu:=$1}

? Never saw that before.
Regards.
Greg

Jared Still
May 18, 2020 10:27 am

Hi Grzegorz,

Thanks for your question.

The ‘: ${1:?Call with ‘packet-driver.sh ‘!}’ causes the script to exit with an error if no parameter is supplied on the command line.

The second line could have easily just been ‘mtu=$1’.

This is Parameter Expansion.

You can see more at https://tldp.org/LDP/abs/html/refcards.html#AEN22728

Also in ‘man bash’ search for ‘^\s+Parameter Expansion’
