Oracle service secrets: more performance with less load-balancing

Posted in: Oracle, Technical Track

A huge but often overlooked aspect of tuning cluster databases is workload management, that is, controlling which nodes, and how many of them, a RAC database should actively use. The default and most intuitive service configuration is to spread the workload evenly over all nodes. If all nodes accept the same number of connections, that usually results in an even distribution of connections and therefore workload. However, this often leads to systems that end up performing worse than a single instance would. I will explain why towards the end of this blog post, after a demonstration.

Lab Setup – Oracle Cloud RAC and swingbench

The quickest way for me to demo performance on a cluster database is to use Oracle’s RAC DBaaS to configure and spin up a two-node cluster. I used the smallest machines with only 4 OCPUs per node. The version I am using is 12.1 with a single PDB configured. I could have used the service that is created by default for the PDB, but since I cannot modify its properties, let’s create a new one for our tests:

srvctl add service -db orcl -service sb -pdb PDB1 -preferred orcl1,orcl2
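
The new service still needs to be started before the benchmark can connect to it. A quick way to do that and to double-check the result (these commands were not captured in the original run, so no output is shown):

srvctl start service -db orcl -service sb
srvctl status service -db orcl -service sb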

To generate some load I am using swingbench‘s simple stress test. I simply downloaded and unzipped the software and modified the Stress_Test.xml config file with my connection details and 12 users. Everything else was left at the default, resulting in a workload of 15% Inserts, 40% Selects, 30% Updates and 10% Deletes. (I know those numbers do not add up to 100% but they are the defaults…)
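
The connection details in the config simply point at the sb service rather than at a specific instance, so the service configuration alone decides where the sessions land. As an illustration, an EZConnect string for this service could look like this (the SCAN hostname is a placeholder, not the one from my environment):

//scan-host.a42.oraclecloud.internal:1521/sb.a42.oraclecloud.internal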

Benchmark results with load-balancing

So let’s fire up the benchmark and see what happens. As this runs for a minute or two, the throughput levels off at around or slightly above 5,000 TPS.

[[email protected] bin]$ ./charbench -v -c ../configs/Stress_Test.xml 
Author  :	 Dominic Giles
Version :	 2.6.0.1046

Results will be written to results.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

9:20:05 AM	0	0	0
9:20:15 AM	12	5820	3210
9:20:25 AM	12	50202	4497
9:20:35 AM	12	94894	4635
9:20:45 AM	12	143614	4900
9:20:55 AM	12	194960	4664
9:21:05 AM	12	243025	4968
9:21:15 AM	12	287858	4994
9:21:25 AM	12	294665	5076
9:21:35 AM	12	298551	4489
9:21:45 AM	12	298826	5225
9:21:55 AM	12	296454	5027
9:22:05 AM	12	298158	4934
9:22:15 AM	12	296693	5149

While this is running we can confirm that the 12 user sessions are evenly balanced over both nodes:

SQL> select inst_id, count(*) from gv$session where username = 'SB' group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
	 1	    6
	 2	    6

A quick check with snapper shows that all of these sessions are indeed active and also shows CLUSTER as the main wait class:

SQL> @snapper ash=WAIT_CLASS 10 1 [email protected]*
Sampling SID [email protected]* with interval 10 seconds, taking 1 snapshots...

-- Session Snapper v4.25 - by Tanel Poder ( https://blog.tanelpoder.com/snapper ) 


------------------------------------
  ActSes   %Thread | WAIT_CLASS
------------------------------------
    4.25    (425%) | Cluster
    2.73    (273%) | Commit
    1.74    (174%) | ON CPU
     .15     (15%) | Other
     .04      (4%) | Concurrency
     .01      (1%) | Configuration

--  End of ASH snap 1, end=2017-12-06 09:22:35, seconds=10, samples_taken=80, AAS=8.9

To summarize this test run: The benchmark levelled off at about 5000 TPS, the 12 user sessions were evenly spread across the two nodes and the main wait class was CLUSTER.

Re-run the same test on a singleton service

Now let’s change the configuration of the service to be active on only one instance. It would still fail over to the other node if the primary node crashed, but all new connection requests will be directed to instance orcl1 only. After the change, verify in the active listener configuration that only one instance is registered for this service. Note the difference between the SB service and the PDB1 or “test” services below:

[[email protected] ~]$ srvctl modify service -db orcl -service sb -modifyconfig -preferred orcl1 -available orcl2
[[email protected] ~]$ 
[[email protected] ~]$ lsnrctl status

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 06-DEC-2017 09:35:09

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                06-DEC-2017 04:12:23
Uptime                    0 days 5 hr. 22 min. 46 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tns/cl1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.31.42.42)(PORT=1521)))
Services Summary...
Service "orcl.a42.oraclecloud.internal" has 2 instance(s).
  Instance "orcl1", status READY, has 2 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "pdb1.a42.oraclecloud.internal" has 2 instance(s).
  Instance "orcl1", status READY, has 2 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "sb.a42.oraclecloud.internal" has 1 instance(s).
  Instance "orcl1", status READY, has 2 handler(s) for this service...
Service "test.a42.oraclecloud.internal" has 2 instance(s).
  Instance "orcl1", status READY, has 2 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
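
Besides the listener registration, the clusterware side can be checked as well; srvctl config service would now show orcl1 as the preferred instance and orcl2 as an available instance (this check was not captured during the original run, so no output is shown):

srvctl config service -db orcl -service sb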

Now we are ready to re-run the benchmark. Let’s first verify that indeed all connections are made to only one node:

SQL> select inst_id, count(*) from gv$session where username = 'SB' group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
	 1	   12

And the result is that this time, even though only one of the two nodes is used for the benchmark, the overall throughput is about 25% higher while the other node sits completely idle. And if the numbers are not enough, there is also a visualization of the TPS throughput below.

[[email protected] bin]$ ./charbench -v -c ../configs/Stress_Test.xml 
Author  :	 Dominic Giles
Version :	 2.6.0.1046

Results will be written to results.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

9:40:50 AM	0	0	0
9:41:00 AM	12	15536	4794
9:41:10 AM	12	74938	6282
9:41:20 AM	12	139588	6220
9:41:30 AM	12	205721	6278
9:41:40 AM	12	268106	6332
9:41:50 AM	12	332876	6477
9:42:00 AM	12	378663	6481
9:42:10 AM	12	383823	6642
9:42:20 AM	12	383594	6257
9:42:30 AM	12	380948	6390
9:42:40 AM	12	383238	6923
9:42:50 AM	12	383988	6509
9:43:00 AM	12	387262	6522
9:43:10 AM	12	384656	6612

Visualization of benchmark test results

Snapper confirms that all cluster-related waits have disappeared and that throughput now seems to be limited only by how fast we can commit.

SQL> @snapper ash=WAIT_CLASS 10 1 [email protected]*
Sampling SID [email protected]* with interval 10 seconds, taking 1 snapshots...

-- Session Snapper v4.25 - by Tanel Poder ( https://blog.tanelpoder.com/snapper )


------------------------------------
  ActSes   %Thread | WAIT_CLASS
------------------------------------
    5.20    (520%) | Commit
    1.87    (187%) | ON CPU
     .22     (22%) | Other
     .06      (6%) | Concurrency
     .01      (1%) | Network

--  End of ASH snap 1, end=2017-12-06 09:42:25, seconds=10, samples_taken=79, AAS=7.4

GC or Cluster waits from same-block updates

I showed how running the same workload can be more efficient when only one node is used instead of many. But why is that? The answer lies in how cache fusion handles dirty blocks. While RAC is great at fetching clean (unmodified) blocks from another instance over the interconnect, dirty blocks that have had DML applied are a different story. There is actually a lot that the sending instance has to do: it creates a copy of the dirty block called a Past Image (PI) and has to write the associated redo to disk before sending the block over the interconnect. That requires significant CPU and I/O resources and does not depend on the latency or bandwidth of the interconnect.

The benchmark is configured to perform a significant amount of DML on a rather small table of 64 MB and 100k rows, with 12 sessions performing the same operations concurrently. As a result, chances are high that one session will update a row in the same block that another session just modified. With the evenly distributed, default load-balancing behaviour, that means those dirty blocks get bounced back and forth between the instances.
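
If you suspect the same thing is happening on your own system, the segment statistics are a good place to look for the objects whose blocks are shipped between instances most often. A query along these lines, using the standard gv$segment_statistics view (not part of the original test), lists the busiest segments:

select inst_id, owner, object_name, statistic_name, value
  from gv$segment_statistics
 where statistic_name in ('gc cr blocks received', 'gc current blocks received')
   and owner = 'SB'
 order by value desc;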

Conclusion

Workloads that happen to update the same data blocks from sessions on different nodes can suffer from this “RAC tax”, and you can see it in GC or cluster-related waits. One way to avoid this is to use a singleton service that directs all new connections for a given application to only one node at a time. Hence my general recommendation is to create a new service for each application or type of workload. That way you can observe whether cluster waits become a problem and, if so, modify the service to a singleton.
This has been one of the more valuable tricks in my toolbox and the starting point of many discussions with architects and developers who assume that more nodes are always better.
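
If a singleton service ever needs to run on the other node, it can be relocated without reconfiguring clients; a sketch using the names from this post (existing sessions are not moved, only new connections follow the service):

srvctl relocate service -db orcl -service sb -oldinst orcl1 -newinst orcl2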


About the Author

Björn Rost is an Oracle ACE Director, and one of Pythian’s top Oracle experts. A popular presenter, he travels the world attending technology conferences, sharing insights, and learning with his wide network of peers. Björn also serves as president of IOUG’s RAC special interest group. He is always challenging himself, personally through physical activities including triathlons, and professionally through his ongoing quest to increase his knowledge of Oracle and other leading technologies.

1 Comment

Very interesting! I had an issue in the past caused by the cluster service using load balancing; we had to switch from cardinality 2 to 1. At the time I thought some configurations were simply not done very well, but this confirms that dirty blocks being highly active across instances was the root cause of the cluster waits.
Many thanks!
