In the previous article, I described my observations of RC Enqueue. Now it is time to take a look at the RC latches.
Latches, being serialization devices, are scalability inhibitors. Not that serialization itself is bad; quite the opposite is true. Serialization is a must if you expect your system to produce anything apart from GIGO (Garbage In, Garbage Out), and concurrency is essentially made possible through serialization of access to shared resources. That being said, I would expect Result Cache to beat Oracle's buffer cache on read-only workloads, since that is what RC was designed for. That is, Result Cache should perform faster and scale better.
There are two new latches related to Result Cache:
SQL> select * from v$latchname where name like 'Result Cache%';

    LATCH# NAME                            HASH
---------- ------------------------- ----------
       373 Result Cache: Latch       1545889529
       374 Result Cache: SO Latch     986859868
Every allocation in Result Cache results in one Result Cache: Latch get per allocated block, plus some "overhead", usually between four and six gets:
SQL> create table t (n number, v varchar2(2000));

Table created.

SQL> insert into t
  2  select level, rpad('x', 2000, 'x')
  3    from dual
  4  connect by level <= 100;

100 rows created.

SQL> commit;

Commit complete.

SQL> select free, count(*) cnt
  2    from v$result_cache_memory
  3   group by free;

no rows selected

SQL> select gets
  2    from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
         0

SQL> begin
  2    for cur in (select /*+ result_cache */ * from t)
  3    loop
  4      null;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select free, count(*) cnt
  2    from v$result_cache_memory
  3   group by free;

FRE        CNT
--- ----------
NO         200
YES         24

SQL> select gets
  2    from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
       204
Every access to Result Cache incurs two Result Cache: Latch gets:
SQL> begin
  2    for cur in (select /*+ result_cache */ * from t)
  3    loop
  4      null;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select gets
  2    from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
       206
Now for a bit of history. When Oracle 7 was released, it contained a new thing called the Shared Pool. Since the Shared Pool was protected by only one latch, that resulted in significant latch contention. Subsequent versions improved this; we now have multiple library cache latches.
How many latches do we have for Result Cache?
SQL> select count(*)
  2    from v$latch_children
  3   where name='Result Cache: Latch';

  COUNT(*)
----------
         0
We only have one parent latch without any child latches, which sets off alarms in my head. There is still hope, though. Starting with 9i, Oracle has been able to acquire latches in shared mode (the cache buffers chains latch is an example). So, if the Result Cache: Latch can be obtained in shared mode, that shouldn't cause contention between readers. Writers are a different story, but I can live with that as long as read scalability is there.
Let’s do a simple test with four processes trying to read from Result Cache at the same time:
SQL> create table job_times
  2  (
  3    sid number,
  4    time_ela number
  5  );

Table created.

SQL> create or replace procedure do_work(
  2    p_iterations in number
  3  ) is
  4    l_rowid rowid;
  5  begin
  6    insert into job_times
  7      values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  8      returning rowid into l_rowid;
  9
 10    for i in 1 .. p_iterations
 11    loop
 12      for cur in (select /*+ result_cache */ * from t)
 13      loop
 14        null;
 15      end loop;
 16    end loop;
 17
 18    update job_times set
 19      time_ela=dbms_utility.get_time-time_ela
 20     where rowid=l_rowid;
 21  end;
 22  /

Procedure created.
Now it’s time to fire things up:
SQL> select gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
       206          0          0          0

SQL> select * from job_times;

no rows selected

SQL> declare
  2    l_job number;
  3  begin
  4    for i in 1 .. 4
  5    loop
  6      dbms_job.submit(
  7        job => l_job,
  8        what => 'do_work(100000);'
  9      );
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2    from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
133         13052
135         13246
136         13016
138         13126
Total:      52440

SQL> select gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    800210        636         20    1035979
This is sort of a bummer. I'm running this on a dual-core CPU box, and while there is nothing bad about a miss rate of 0.079%, we got a clear indicator that some requests resulted not only in misses but even in some sleeps. In terms of time, the Result Cache: Latch cost us 0.19% of the total elapsed time. This leaves me curious about how (and how badly) this might perform on something bigger than my dual-core box under extremely heavy loads. There is only one latch, after all.
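For those who want to check my arithmetic, the percentages come straight from the numbers above (wait_time in v$latch is reported in microseconds, while dbms_utility.get_time ticks in centiseconds):

```sql
-- miss rate: misses/gets from the snapshot above
select round(100 * 636 / 800210, 3) miss_pct from dual;
-- ~0.079

-- latch wait time vs. total job elapsed time
-- (1035979 us of wait vs. 52440 cs = 524.4 s of elapsed)
select round(100 * (1035979 / 1e6) / (52440 / 100), 2) wait_pct from dual;
-- ~0.2
```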
Now let's see whether taking a single row from Result Cache is more efficient than from the buffer cache. Note that I'm not going to give Result Cache any artificial benefits, like comparing the time to sort and group a couple thousand rows from the buffer cache to get a single row against simply getting that single row from Result Cache. The reason is simple: if such a result is really important to you, there is no sense in doing the sorting and aggregation work over and over again when you can have a materialized view that stores exactly the same resulting data as Result Cache does.
I have also decided to do something more interesting than just an FTS. I'll check how well Result Cache handles lookups, since single-row lookups can easily put significant load on the latching mechanism (remember that every Result Cache access requires two Result Cache: Latch gets). Of course, in a real system, you would try to avoid putting stress on a serialization mechanism as much as you can (by applying good programming techniques and algorithms), but here I'm purposely doing the opposite. Now on to the test.
The most efficient way to do a lookup in a buffer cache is to have a table in a properly-sized hash cluster. In this test, Result Cache won’t be up against a pushover.
SQL> create cluster hc
  2  (
  3    n number(*,0)
  4  ) single table
  5  hashkeys 15000
  6  size 230;

Cluster created.

SQL> create table hc_t
  2  (
  3    n number(*,0),
  4    v varchar2(200)
  5  ) cluster hc (n);

Table created.

SQL> insert into hc_t
  2  select level, dbms_random.string('p', 200)
  3    from dual
  4  connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.
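If you want to confirm that the cluster is sized properly before running the benchmark, a quick autotrace check works well (a sketch; the exact statistics will vary, but a well-sized single-table hash cluster should locate a row by hashing the key directly to its block, showing a single consistent get and no physical reads once the block is cached):

```sql
SQL> set autotrace traceonly statistics
SQL> select * from hc_t where n = 42;
```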
All we need now is two procedures, one with a regular select and another with a cached select:
SQL> create or replace procedure do_hc(
  2    p_iterations in number
  3  ) is
  4    l_rowid rowid;
  5    l_n number;
  6  begin
  7    insert into job_times
  8      values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  9      returning rowid into l_rowid;
 10
 11    for i in 1 .. p_iterations
 12    loop
 13      l_n:=trunc(dbms_random.value(1, 10000));
 14      for cur in (select * from hc_t where n=l_n)
 15      loop
 16        null;
 17      end loop;
 18    end loop;
 19
 20    update job_times set
 21      time_ela=dbms_utility.get_time-time_ela
 22     where rowid=l_rowid;
 23  end;
 24  /

Procedure created.

SQL> create or replace procedure do_rc(
  2    p_iterations in number
  3  ) is
  4    l_rowid rowid;
  5    l_n number;
  6  begin
  7    insert into job_times
  8      values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  9      returning rowid into l_rowid;
 10
 11    for i in 1 .. p_iterations
 12    loop
 13      l_n:=trunc(dbms_random.value(1, 10000));
 14      for cur in (select /*+ result_cache */ * from hc_t where n=l_n)
 15      loop
 16        null;
 17      end loop;
 18    end loop;
 19
 20    update job_times set
 21      time_ela=dbms_utility.get_time-time_ela
 22     where rowid=l_rowid;
 23  end;
 24  /

Procedure created.
The hash cluster will go first:
SQL> delete from job_times;

4 rows deleted.

SQL> commit;

Commit complete.

SQL> declare
  2    l_job number;
  3  begin
  4    for i in 1 .. 4
  5    loop
  6      dbms_job.submit(
  7        job => l_job,
  8        what => 'do_hc(100000);'
  9      );
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2    from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
131          1141
135          1052
137           943
138          1115
Total:       4251
Now let’s see if Result Cache can beat those numbers:
SQL> delete from job_times;

4 rows deleted.

SQL> commit;

Commit complete.

SQL> select gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    800210        636         20    1035979

SQL> declare
  2    l_job number;
  3  begin
  4    for i in 1 .. 4
  5    loop
  6      dbms_job.submit(
  7        job => l_job,
  8        what => 'do_rc(100000);'
  9      );
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2    from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
130          1116
131          1126
133          1011
135          1062
Total:       4315

SQL> select gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
   1620214       8621        510    4315106
Apparently, the results are pretty much comparable, but the number of misses and sleeps went up. Given that we spent more than 3 seconds waiting for the Result Cache: Latch, what could have been an easy win for Result Cache wasn't. A single latch can be like that.
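For clarity, here are the deltas between the two v$latch snapshots (simple arithmetic on the numbers above; wait_time is in microseconds):

```sql
select 1620214 - 800210 gets_delta,
       8621 - 636 misses_delta,
       510 - 20 sleeps_delta,
       round((4315106 - 1035979) / 1e6, 2) wait_sec
  from dual;
-- gets_delta=820004, misses_delta=7985, sleeps_delta=490, wait_sec=3.28
```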
Are you curious what's up with the cache buffers chains latches after all those tests?
SQL> select gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name = 'cache buffers chains';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    858313         75          0          0
Nothing (almost).
Whether history is repeating itself and we need to await another release for multiple Result Cache latches is an open question. Shared-mode gets are yet another open question. Time will tell.
Once you get a hammer, everything to you looks like a nail! ;)
Thank you Alex, this was very helpful. Looks like it may be fixed in R2:
SQL> select * from job_times;

       SID   TIME_ELA
---------- ----------
        31      15936
        23      15677
        34      15362
        36      15576

Elapsed: 00:00:00.03

SQL> select name, gets, misses, sleeps, wait_time
  2    from v$latch
  3   where name like 'Result Cache%';

NAME                         GETS     MISSES     SLEEPS  WAIT_TIME
---------------------- ---------- ---------- ---------- ----------
Result Cache: RC Latch     801875          0          0          0
Result Cache: SO Latch         11          0          0          0
Result Cache: MB Latch          0          0          0          0
Best regards.
Dave,
The situation has drastically improved in 11gR2, but we're still not there.
You can reference https://afatkulin.blogspot.com/2010/06/11gr2-result-cache-scalability.html for more details.