Oracle 11g’s Query Result Cache: Introducing RC Latches

Posted in: Technical Track

In the previous article, I described my observations of RC Enqueue. Now it is time to take a look at the RC latches.

Latches, being serialization devices, are scalability inhibitors. Not that they inherently prevent you from scaling, quite the opposite is true. Serialization is a must if you expect your system to produce anything apart from GIGO (Garbage In Garbage Out). Concurrency is essentially made possible through serialization of shared resources. That being said, I would expect Result Cache to beat Oracle’s buffer cache on read-only workloads, since that is what RC was designed for. That is, Result Cache should perform faster and scale better.

There are two new latches related to Result Cache:

SQL> select * from v$latchname where name like 'Result Cache%';

LATCH# NAME                            HASH
------ ------------------------- ----------
   373 Result Cache: Latch       1545889529
   374 Result Cache: SO Latch     986859868

Every allocation in Result Cache results in one Result Cache: Latch get per allocated block, plus some “overhead”, usually between four and six gets:

SQL> create table t (n number, v varchar2(2000));

Table created.

SQL> insert into t
  2   select level, rpad('x', 2000, 'x')
  3    from dual
  4    connect by level <= 100;

100 rows created.

SQL> commit;

Commit complete.

SQL> select free, count(*) cnt
  2   from v$result_cache_memory
  3   group by free;

no rows selected

SQL> select gets
  2   from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
         0

SQL> begin
  2   for cur in (select /*+ result_cache */ * from t)
  3   loop
  4    null;
  5   end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select free, count(*) cnt
  2   from v$result_cache_memory
  3   group by free;

FRE        CNT
--- ----------
NO         200
YES         24

SQL> select gets
  2   from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
       204

Every access to Result Cache incurs two Result Cache: Latch gets:

SQL> begin
  2   for cur in (select /*+ result_cache */ * from t)
  3   loop
  4    null;
  5   end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select gets
  2   from v$latch
  3   where name='Result Cache: Latch';

      GETS
----------
       206

Now for a bit of history. When Oracle 7 was released, it contained that new thing called Shared Pool. Since Shared Pool was protected by only one latch, that resulted in significant latch contention. Subsequent versions improved this — we now do have multiple library cache latches.

How many latches do we have for Result Cache?

SQL> select count(*)
  2   from v$latch_children
  3   where name='Result Cache: Latch';

  COUNT(*)
----------
         0

We only have one parent latch without any child latches, which sets off alarms in my head. There is still hope, though. Starting with 9i, Oracle had the ability to acquire latches in a shared mode (cache buffers chains latch is an example). So, if Result Cache: Latch can be obtained in a shared mode, that shouldn’t cause a contention between readers. Writers are a different story, but I can live with that as long as read scalability is there.

Let’s do a simple test with four processes trying to read from Result Cache at the same time:

SQL> create table job_times
  2  (
  3   sid   number,
  4   time_ela number
  5  );

Table created.

SQL> create or replace procedure do_work(
  2   p_iterations in number
  3  ) is
  4   l_rowid  rowid;
  5  begin
  6   insert into job_times
  7    values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  8    returning rowid into l_rowid;
  9
 10   for i in 1 .. p_iterations
 11   loop
 12    for cur in (select /*+ result_cache */ * from t)
 13    loop
 14     null;
 15    end loop;
 16   end loop;
 17
 18   update job_times set
 19     time_ela=dbms_utility.get_time-time_ela
 20    where rowid=l_rowid;
 21  end;
 22  /

Procedure created.

Now it’s time to fire things up:

SQL> select gets, misses, sleeps, wait_time
  2   from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
       206          0          0          0

SQL> select * from job_times;

no rows selected

SQL> declare
  2   l_job number;
  3  begin
  4   for i in 1 .. 4
  5   loop
  6    dbms_job.submit(
  7     job => l_job,
  8     what => 'do_work(100000);'
  9    );
 10   end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2   from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
133         13052
135         13246
136         13016
138         13126
Total:      52440

SQL> select gets, misses, sleeps, wait_time
  2     from v$latch
  3     where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    800210        636         20    1035979

This is sort of a bummer. I’m running this on a dual-core CPU box, and while there is nothing bad about a miss rate of 0.079%, we got a clear indicator that some requests resulted not only in misses but even in some sleeps. In terms of time, Result Cache: Latch cost us 0.19%. This leaves me curious about how (and how badly) this might perform on something bigger than my dual core box under extremely heavy loads. There is only one latch, after all.

Now let’s see whether taking a single row from Result Cache is more efficient than from the buffer cache. Note that I’m not going to give Result Cache any artificial benefits, like comparing the time to sort and grouping a couple thousand rows from a buffer cache to get a single row, with simply getting that single row from Result Cache. The reason for this is simple. If this is really important to you, there is no sense in doing sorting and aggregation work over and over again when you can have a materialized view that stores exactly the same resulting data as Result Cache does.

I have also decided to do something more interesting than just an FTS. I’ll check how well Result Cache handles lookups, since single-row lookups can easily put significant load on the latching mechanism (remember that every Result Cache access requires two Result Cache: Latch gets). Of course, in a real system, you would try to avoid putting stress on a serialization mechanism as much as you can (by applying good programming techniques and algorithms), but here I’m purposely doing the opposite. Now on to the test.

The most efficient way to do a lookup in a buffer cache is to have a table in a properly-sized hash cluster. In this test, Result Cache won’t be up against a pushover.

SQL> create cluster hc
  2  (
  3   n number(*,0)
  4  ) single table
  5   hashkeys 15000
  6   size 230;

Cluster created.

SQL> create table hc_t
  2  (
  3   n number(*,0),
  4   v varchar2(200)
  5  ) cluster hc (n);

Table created.

SQL> insert into hc_t
  2   select level, dbms_random.string('p', 200)
  3    from dual
  4    connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

All we need now is two procedures, one with a regular select and another with a cached select:

SQL> create or replace procedure do_hc(
  2   p_iterations in number
  3  ) is
  4   l_rowid  rowid;
  5   l_n number;
  6  begin
  7   insert into job_times
  8    values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  9    returning rowid into l_rowid;
 10
 11   for i in 1 .. p_iterations
 12   loop
 13    l_n:=trunc(dbms_random.value(1, 10000));
 14    for cur in (select * from hc_t where n=l_n)
 15    loop
 16     null;
 17    end loop;
 18   end loop;
 19
 20   update job_times set
 21     time_ela=dbms_utility.get_time-time_ela
 22    where rowid=l_rowid;
 23  end;
 24  /

Procedure created.

SQL> create or replace procedure do_rc(
  2   p_iterations in number
  3  ) is
  4   l_rowid  rowid;
  5   l_n number;
  6  begin
  7   insert into job_times
  8    values (sys_context('userenv', 'sid'), dbms_utility.get_time)
  9    returning rowid into l_rowid;
 10
 11   for i in 1 .. p_iterations
 12   loop
 13    l_n:=trunc(dbms_random.value(1, 10000));
 14    for cur in (select /*+ result_cache */ * from hc_t where n=l_n)
 15    loop
 16     null;
 17    end loop;
 18   end loop;
 19
 20   update job_times set
 21     time_ela=dbms_utility.get_time-time_ela
 22    where rowid=l_rowid;
 23  end;
 24  /

Procedure created.

The hash cluster will go first:

SQL> delete from job_times;

4 rows deleted.

SQL> commit;

Commit complete.

SQL> declare
  2   l_job number;
  3  begin
  4   for i in 1 .. 4
  5   loop
  6    dbms_job.submit(
  7     job => l_job,
  8     what => 'do_hc(100000);'
  9      );
 10   end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2   from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
131          1141
135          1052
137           943
138          1115
Total:       4251

Now let’s see if Result Cache can beat those numbers:

SQL> delete from job_times;

4 rows deleted.

SQL> commit;

Commit complete.

SQL> select gets, misses, sleeps, wait_time
  2   from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    800210        636         20    1035979

SQL> declare
  2   l_job number;
  3  begin
  4   for i in 1 .. 4
  5   loop
  6    dbms_job.submit(
  7     job => l_job,
  8     what => 'do_rc(100000);'
  9      );
 10   end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

--allow jobs to complete

SQL> select case grouping(sid) when 1 then 'Total:' else to_char(sid) end sid, sum(time_ela) ela
  2   from job_times
  3   group by rollup((sid, time_ela));

SID           ELA
------ ----------
130          1116
131          1126
133          1011
135          1062
Total:       4315

SQL> select gets, misses, sleeps, wait_time
  2   from v$latch
  3   where name = 'Result Cache: Latch';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
   1620214       8621        510    4315106

Apparently, the results are pretty much comparable. But the number of misses and sleeps went up. Given that we spent more than 3 seconds waiting for Result Cache: Latch, that could be an easy win for Result Cache but… a single latch can be like that.

Are you curious what’s up with cache buffers chains latches after all those tests?

SQL> select gets, misses, sleeps, wait_time
  2   from v$latch
  3   where name = 'cache buffers chains';

      GETS     MISSES     SLEEPS  WAIT_TIME
---------- ---------- ---------- ----------
    858313         75          0          0

Nothing (almost).

Whether history is repeating itself and we need to await another release for multiple Result Cache latches, is an open question. Shared mode gets is yet another open question. Time will tell.

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

5 Comments. Leave new

Once you get a hammer, everything to you looks like a nail! ;)

Reply

Thank you Alex, this was very helpful. Looks like it may be fixed in R2:

SQL> select * from job_times;

SID TIME_ELA
———- ———-
31 15936
23 15677
34 15362
36 15576

Elapsed: 00:00:00.03
SQL> select name, gets, misses, sleeps, wait_time
2 from v$latch
3 where name like ‘Result Cache%’;

NAME GETS MISSES SLEEPS WAIT_TIME
———————————– ———- ———- ———- ———-
Result Cache: RC Latch 801875 0 0 0
Result Cache: SO Latch 11 0 0 0
Result Cache: MB Latch 0 0 0 0

Best regards.

Reply
Alex Fatkulin
June 9, 2010 9:14 pm

Dave,

the situation has drastically improved in 11GR2 but we’re still not there.

You can reference https://afatkulin.blogspot.com/2010/06/11gr2-result-cache-scalability.html for more details.

Reply

Hi,

I am not sure how people are able to read these blogs. Could you please let me know if we need to do some color coding in order to read these blogs? I couldn’t see the test cases properly. White background and very very light black foreground. I need to copy/paste the whole page into notepad and read it. Is there something I am missing here?

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *