CPU Utilization is Not a Useful Metric


Once upon a time, CPU utilization was quite a useful metric. Following is the output of several tools that report CPU utilization metrics:

top

top reports a load of 1.66.

Is this correct? No. The correct load number is probably closer to 2.4.

# top -b -n 1| head -20
top - 11:27:45 up 151 days,  1:55,  7 users,  load average: 1.66, 1.84, 1.88
Tasks: 389 total,   3 running, 386 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us, 20.6%sy,  1.2%ni, 77.3%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  32639636k total, 32206476k used,   433160k free,   235732k buffers
Swap: 16359420k total, 10285664k used,  6073756k free,  2354840k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16702 root      20   0 8274m 5.0g 5.0g S 85.1 16.1  59164:55 VirtualBox
 4657 root      20   0  9.8g 5.2g 5.1g S 45.5 16.6  26518:13 VirtualBox
 6239 root      20   0  9.8g 5.1g 5.1g S 39.6 16.5  31200:52 VirtualBox
27070 root      20   0 7954m 5.4g 5.4g S 17.8 17.5  17049:30 VirtualBox
27693 root      20   0 2233m 441m  20m S  5.9  1.4   3407:34 firefox
 7648 root      20   0 6758m 4.1g 4.1g S  4.0 13.2  17069:52 VirtualBox
 6633 root      20   0  368m  63m  31m R  2.0  0.2   1338:58 Xorg
14727 root      20   0 15216 1344  828 R  2.0  0.0   0:00.01 top
    1 root      20   0 19416  932  720 S  0.0  0.0   0:00.90 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:03.53 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   2:08.23 ksoftirqd/0
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0H

sar

sar does not show the load average, but does report what it thinks is CPU utilization.

Is it correct? Again, no. Actual idle should be closer to 45-50%.

# sar 1 1
Linux 3.8.13-16.2.1.el6uek.x86_64 (myserver.jks.com)         01/22/2018      _x86_64_        (8 CPU)
 
11:29:32 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:29:33 AM     all      0.88      1.00     17.27      0.00      0.00     80.85
Average:        all      0.88      1.00     17.27      0.00      0.00     80.85

mpstat

mpstat reports per CPU.

Again, these values are not quite correct.

# mpstat -P ALL
Linux 3.8.13-16.2.1.el6uek.x86_64 (myserver.jks.com)         01/22/2018      _x86_64_        (8 CPU)
 
11:35:49 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
11:35:49 AM  all    0.74    1.19   20.58    0.11    0.00    0.06    0.00    0.00   77.32
11:35:49 AM    0    1.11    1.18   20.24    0.58    0.00    0.48    0.00    0.00   76.42
11:35:49 AM    1    0.88    1.32   22.45    0.08    0.00    0.02    0.00    0.00   75.25
11:35:49 AM    2    0.84    1.34   22.78    0.06    0.00    0.01    0.00    0.00   74.98
11:35:49 AM    3    0.81    1.31   21.69    0.05    0.00    0.00    0.00    0.00   76.15
11:35:49 AM    4    0.64    1.00   16.76    0.05    0.00    0.00    0.00    0.00   81.54
11:35:49 AM    5    0.57    1.11   19.28    0.02    0.00    0.00    0.00    0.00   79.02
11:35:49 AM    6    0.57    1.10   19.46    0.02    0.00    0.00    0.00    0.00   78.85

Finally, the venerable uptime command:

uptime

# uptime
 11:29:48 up 151 days,  1:57,  7 users,  load average: 1.70, 1.81, 1.87

Notice that mpstat and sar both report 8 CPUs, and that is the crux of the problem.

Why is that a problem? It is a problem because this machine does not have 8 CPUs; it has only 4.

The CPU is an Intel i7-4790S with hyperthreading enabled. When hyperthreading is enabled, Linux utilities report twice as many CPUs as there are physical cores.

In this case it appears to top, sar, mpstat and uptime that there are 8 CPUs, when in reality there are only 4.

What is Hyperthreading?

“But wait; doesn’t hyperthreading double the processing power of my CPU?” you may ask.

Well, no, it doesn’t.

Hyperthreading is a clever bit of technology from Intel that allows the operating system to better take advantage of a CPU during what would otherwise be idle time. Please refer to the references list if you would like more detail.

There are many sources that estimate the performance advantage of enabling hyperthreading versus leaving it disabled.

The following is a good summary of the rules of thumb for the expected performance benefit when hyperthreads are enabled:

Socket Count    Max Benefit %
1               30%
2               15%
3+              testing required

When the previously noted utilities report 8 CPUs, that is not quite correct, because enabling hyperthreading does not double the number of CPUs.

Given the example i7 processor, the best we can hope for is that this single-socket, 4-core CPU will provide the equivalent work of approximately 5.6 cores.

8 * ( ( 100 - 30 ) / 100 ) = 5.6

estimated CPUs / reported CPUs = metric adjustment factor

In this case:

5.6 / 8 = 0.7

When CPU utilization is reported as 80% idle, the real value is more like 56%:

80 * 0.70 = 56

Load averages can be treated the same way:

The load of 1.66 is actually ~2.4:

1.66 / 0.7 = 2.37
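
The arithmetic above can be collected into a small shell sketch. The function names (effective_cpus, adjustment_factor, adjust_idle, adjust_load) are hypothetical, and the 30% discount is only the single-socket rule of thumb from the table, not a measured value.

```shell
#!/bin/sh
# Sketch of the rule-of-thumb adjustment described in this post.
# Function names are illustrative; they are not part of any standard tool.

# effective_cpus REPORTED DISCOUNT_PCT
# Discount the reported CPU count by a percentage (30 for one socket).
effective_cpus() {
    LC_ALL=C awk -v r="$1" -v d="$2" 'BEGIN { printf "%.1f\n", r * (100 - d) / 100 }'
}

# adjustment_factor ESTIMATED REPORTED
# Ratio of estimated CPUs to the CPUs the OS reports.
adjustment_factor() {
    LC_ALL=C awk -v e="$1" -v r="$2" 'BEGIN { printf "%.2f\n", e / r }'
}

# adjust_idle IDLE_PCT FACTOR
# Scale a reported idle percentage by the adjustment factor.
adjust_idle() {
    LC_ALL=C awk -v i="$1" -v f="$2" 'BEGIN { printf "%.0f\n", i * f }'
}

# adjust_load LOAD FACTOR
# Scale a reported load average by the adjustment factor.
adjust_load() {
    LC_ALL=C awk -v l="$1" -v f="$2" 'BEGIN { printf "%.2f\n", l / f }'
}
```

With the figures from this post, effective_cpus 8 30 prints 5.6 and adjust_load 1.66 0.7 prints 2.37.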

Is hyperthreading enabled?

So by now you probably would like to know how to determine if hyperthreading is enabled.

There are a couple of things you need to know to investigate this.

First find out the info about the CPU in question. The following instructions are for Linux.

Start by determining the CPU model. Here is one easy method to find it:

# grep CPU /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
model name      : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz

The next step is to point your browser at https://ark.intel.com/, and then search for the exact CPU model.

Searching for i7-4790S shows there are 4 cores and 8 threads available, so this CPU is capable of hyperthreading.

The next step is to determine if hyperthreads are enabled. Doing so is less straightforward than previous steps.

The following process can be used to determine the actual number of physical cores, and then compare that to the number of cores presented to the OS.

number of physical cores

There are 4 cores in this case

# grep 'core id' /proc/cpuinfo  | sort -u
core id         : 0
core id         : 1
core id         : 2
core id         : 3

number of processors

8 are shown

# grep 'processor' /proc/cpuinfo  | sort -u
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7

The number of reported processors is double the number of physical cores, indicating that hyperthreads are enabled.
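
As a cross-check on the /proc/cpuinfo comparison, the kernel also lists which logical CPUs share a core. This is not the method used in this post; it is a minimal sketch that assumes the sysfs file /sys/devices/system/cpu/cpu0/topology/thread_siblings_list is present and holds a CPU list such as 0,4 or 0-1. The function names are hypothetical.

```shell
#!/bin/sh
# count_cpu_list LIST
# Count the CPUs named by a kernel CPU list such as "0,4" or "0-1,8-9".
count_cpu_list() {
    echo "$1" | awk -F, '{
        n = 0
        for (i = 1; i <= NF; i++) {
            if (split($i, r, "-") == 2) n += r[2] - r[1] + 1   # a range like 0-3
            else if ($i != "") n++                             # a single CPU number
        }
        print n
    }'
}

# smt_active [FILE]
# Print "yes" if CPU 0 shares its core with another logical CPU, "no" if not,
# or "unknown" if the topology file cannot be read.
# FILE defaults to the assumed sysfs path for CPU 0.
smt_active() {
    f="${1:-/sys/devices/system/cpu/cpu0/topology/thread_siblings_list}"
    [ -r "$f" ] || { echo "unknown"; return 1; }
    if [ "$(count_cpu_list "$(cat "$f")")" -gt 1 ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

More than one sibling for CPU 0 means two or more logical CPUs share a physical core, which is the same conclusion the processor and core id counts lead to.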

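This was tested on another server as well, one with 4 sockets of 10 cores each and hyperthreading known to be enabled.

Core ids repeat on each socket, so the unique count below is 10 per socket, or 40 physical cores across the 4 sockets. As 80 processors are reported for only 40 physical cores, it is clear that hyperthreading is enabled.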

$ grep 'core id' /proc/cpuinfo  | sort -u| wc -l
10
 
$ grep 'processor' /proc/cpuinfo  | sort -u| wc -l
80
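
Because core ids repeat on every socket, a count that should cover the whole machine has to pair each core id with its physical id. The following is a minimal sketch of that idea, assuming the x86 /proc/cpuinfo layout with processor, physical id and core id fields; the function names are hypothetical, and each function accepts an optional file argument so it can be tried against a saved copy of cpuinfo.

```shell
#!/bin/sh
# count_logical [FILE]
# Number of "processor" entries, i.e. the CPUs the OS reports.
count_logical() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# count_physical [FILE]
# Number of unique (physical id, core id) pairs, i.e. physical cores.
count_physical() {
    awk -F: '
        /^physical id/ { sock = $2 }                 # remember the current socket
        /^core id/     { seen[sock "," $2] = 1 }     # one key per socket and core
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

# ht_status [FILE]
# Print "enabled" when more logical CPUs than physical cores are reported.
ht_status() {
    l=$(count_logical "$1")
    p=$(count_physical "$1")
    if [ "$p" -gt 0 ] && [ "$l" -gt "$p" ]; then
        echo "enabled"
    else
        echo "disabled or not detectable"
    fi
}
```

On a machine like the 4-socket server described here, count_physical would be expected to report the whole-machine core count rather than the per-socket count that sorting on core id alone produces.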
 

So, what now?

The time for using CPU utilization as a metric to drive performance improvements is now long past.

CPU technology has advanced so much in the past several years that this metric now has limited usefulness.

Load averages and CPU utilization may still be useful as barometers on systems where it is known that exceeding a certain threshold indicates there may be some issues to look at.

Other than that, though, these metrics have outlived their usefulness if the goal is to drive performance improvement through monitoring and mitigation of key metrics.

For much more detailed information, please refer to the References section at the end of this blog.

References

Will Hyper-Threading Improve Processing Performance?
CPU Utilization is Wrong
Utilization is Virtually Useless as a Metric!
Linux Load Averages: Solving the Mystery


Interested in working with Jared? Schedule a tech call.

About the Author

Oracle experience: started with Oracle 7.0.13
Programming Experience: Perl, PL/SQL, Shell, SQL. Also some other odds and ends that are no longer useful.
Systems: Networking, Storage, OS to varying degrees. Have fond memories of DG/UX.

6 Comments

Freek D'Hooge
January 23, 2018 6:39 am

I found that lscpu is also a useful utility to show your cpu details
Following is the output from my laptop:

dhoogfr@dhoogfr-lpt1 ~ $ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Stepping: 9
CPU MHz: 3390.124
CPU max MHz: 3600,0000
CPU min MHz: 1200,0000
BogoMIPS: 5182.98
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

But be careful with virtual guests, as they can present CPU threads as cores (e.g., Xen does this).


great info. i think you should shoot for 60 or 90 days of blogging. :)

thanks for the content.


Hi Chris,
Yeah, that would be great, but I think it will end sooner than that. :)


Would’ve been nice to see evidence of how sar, top, et al. actually calculate CPU usage. The formula seems too general to apply to all these tools.


Good post.

I hope you’ll allow me to point out that CPU threads attempt to optimize what happens during a stall. Not “idle time” as the article states. A thread switch happens on a stall and a stall is inherently not the result of being in the idle loop but rather an active opcode that needs to leave the core.


Thank you for your insight Kevin.
As the goal is to have some idea of how much processing power is actually available to Oracle when HyperThreading is enabled, what would you suggest to better understand it? When the OS tools report the number of cores, the actual CPU power available to Oracle is somewhere between #cores and 2 * #cores.

