A couple of weeks ago, Christo Kutrovsky mentioned to me an Oracle Streams presentation he saw at this year's UKOUG. The presentation was by CERN's Eva Dafonte Pérez and, among other things, Eva mentioned the substantial performance enhancements observed in 11gR2.
It is somewhat timely that we've been doing some Oracle GoldenGate testing, which in turn made me curious to take a closer look at Oracle Streams in 11gR2 and see where all that performance is coming from.
I've set up a simple replication for table t1 from schema src to schema dst, changed the Apply Server parallelism to 1 and ran a simple test, inserting 100 rows while performing a SQL trace:
[sql]Name Type          Nullable Default Comments
---- ------------- -------- ------- --------
N    NUMBER
V    VARCHAR2(100) Y
SQL> select count(*) from src.t1;
COUNT(*)
----------
0
SQL> select count(*) from dst.t1;
COUNT(*)
----------
0
SQL> select sid from v$streams_apply_server;
SID
----------
22
SQL> exec dbms_monitor.session_trace_enable(22, waits => true);
PL/SQL procedure successfully completed
SQL> insert into src.t1
2 select level, to_char(level)
3 from dual
4 connect by level <= 100;
100 rows inserted

SQL> commit;
Commit complete
SQL> select count(*) from dst.t1;
COUNT(*)
----------
100
SQL> exec dbms_monitor.session_trace_disable(22);
PL/SQL procedure successfully completed
[/sql]
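Changing the Apply Server parallelism, by the way, comes down to a single dbms_apply_adm call along these lines (a sketch; the apply process name below is hypothetical, check dba_apply for the real one):

[sql]SQL> begin
2 dbms_apply_adm.set_parameter(
3 apply_name => 'APPLY_SRC', -- hypothetical apply name
4 parameter => 'parallelism',
5 value => '1');
6 end;
7 /[/sql]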
To my surprise, when I went to take a look at the trace file, I couldn't find anything related to the Apply process inserting rows into dst.t1, only a handful of internal housekeeping statements. This made me curious as to where all the work really went, so I decided to take a look at v$sql and see if there were any clues:
[sql]SQL> select sql_text
2 from v$sql
3 where lower(sql_text) like '%insert%dst%t1%';

SQL_TEXT
--------------------------------------------------------------------------------
select sql_text from v$sql where lower(sql_text) like '%insert%dst%t1%'[/sql]
The only thing I was able to see there was . . . my own statement. Things were starting to look serious.
In order to finally figure out who did what, I launched LogMiner:
[sql]SQL> begin
2 dbms_logmnr.start_logmnr(
3 startTime => sysdate-30/1440,
4 endTime => sysdate,
5 Options => dbms_logmnr.DICT_FROM_ONLINE_CATALOG+dbms_logmnr.CONTINUOUS_MINE
6 );
7 end;
8 /
PL/SQL procedure successfully completed[/sql]
Let’s see who actually inserted all these rows:
[sql]SQL> select * from (
2 select session#, sql_redo
3 from v$logmnr_contents
4 where operation='INSERT'
5 and seg_owner='DST'
6 and table_name='T1'
7 order by timestamp desc
8 ) where rownum <= 5;
SESSION# SQL_REDO
---------- --------------------------------------------------------------------------------
22 insert into "DST"."T1"("N","V") values ('1','1');
22 insert into "DST"."T1"("N","V") values ('5','5');
22 insert into "DST"."T1"("N","V") values ('4','4');
22 insert into "DST"."T1"("N","V") values ('3','3');
22 insert into "DST"."T1"("N","V") values ('2','2');
[/sql]
Session with SID 22 is none other than our Apply Server . . .
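Once you're done digging through v$logmnr_contents, don't forget to close the LogMiner session so its resources are released:

[sql]SQL> exec dbms_logmnr.end_logmnr;[/sql]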
The next step was to try and figure out whether we're really dealing with some new codepath responsible for such spectacular performance (apparently, thanks to the complete lack of instrumentation :) or with just another weird Oracle bug.
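Before you can pull a stack, you need the OS process id of the Apply Server to feed to pstack. One way to get it (a quick sketch) is to join v$streams_apply_server to v$session and v$process:

[sql]SQL> select p.spid
2 from v$streams_apply_server sa, v$session s, v$process p
3 where sa.sid = s.sid
4 and s.paddr = p.addr;[/sql]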
I blocked the Apply Server on updating a row and looked at its stack:
[bash][[email protected] trace]$ pstack 17036
#0 0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1 0x00000000085ef3f3 in sskgpwwait ()
#2 0x00000000085ee5c6 in skgpwwait ()
#3 0x000000000829ee31 in ksliwat ()
#4 0x000000000829e422 in kslwaitctx ()
#5 0x0000000000af92f5 in ksqcmi ()
#6 0x00000000082ac019 in ksqgtlctx ()
#7 0x00000000082aa77a in ksqgelctx ()
#8 0x0000000000c4d566 in ktcwit1 ()
#9 0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x0000000001f57c14 in kddiruUpdate ()
#13 0x000000000179eeab in knasdaExecDML ()
#14 0x000000000179d928 in knasdaProcDML ()
#15 0x000000000178c6fd in knaspd ()
#16 0x0000000001787d2f in knasplcr ()
#17 0x00000000017866d7 in knaspx ()
#18 0x0000000001770fd5 in knalsProc1Txn ()
#19 0x000000000177022d in knalsptxn ()
#20 0x00000000017424a6 in knasm2 ()
#21 0x0000000001776d8d in knalsma ()
#22 0x0000000000c25a7d in knlkcbkma ()
#23 0x0000000000b93ba7 in ksvrdp ()
#24 0x00000000020d2dd7 in opirip ()
#25 0x00000000016fe729 in opidrv ()
#26 0x0000000001b7183f in sou2o ()
#27 0x00000000009d3f8a in opimai_real ()
#28 0x0000000001b76ace in ssthrdmain ()
#29 0x00000000009d3e71 in main ()
[/bash]
Before we move on, here is a stack dump from a blocked Apply Server in 10.2.0.4:
[bash highlight="25"][[email protected] trace]$ pstack 23787
#0 0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1 0x00000000085ef3f3 in sskgpwwait ()
#2 0x00000000085ee5c6 in skgpwwait ()
#3 0x000000000829ee31 in ksliwat ()
#4 0x000000000829e422 in kslwaitctx ()
#5 0x0000000000af92f5 in ksqcmi ()
#6 0x00000000082ac019 in ksqgtlctx ()
#7 0x00000000082aa77a in ksqgelctx ()
#8 0x0000000000c4d566 in ktcwit1 ()
#9 0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x0000000008453cdd in updaul ()
#15 0x0000000008451bca in updThreePhaseExe ()
#16 0x00000000084509f5 in updexe ()
#17 0x00000000083fe18f in opiexe ()
#18 0x00000000083f5c0d in opiall0 ()
#19 0x0000000008403d25 in opikpr ()
#20 0x00000000083f78b9 in opiodr ()
#21 0x00000000084892af in __PGOSF141_rpidrus ()
#22 0x00000000085ee820 in skgmstack ()
#23 0x000000000848a759 in rpiswu2 ()
#24 0x000000000848fdf4 in kprball ()
#25 0x0000000001c7c4d7 in knipxup ()
#26 0x0000000001c72651 in knipdis ()
#27 0x000000000178cacc in knaspd ()
#28 0x0000000001787d2f in knasplcr ()
#29 0x00000000017866d7 in knaspx ()
#30 0x0000000001770fd5 in knalsProc1Txn ()
#31 0x000000000177022d in knalsptxn ()
#32 0x00000000017424a6 in knasm2 ()
#33 0x0000000001776d8d in knalsma ()
#34 0x0000000000c25a7d in knlkcbkma ()
#35 0x0000000000b93ba7 in ksvrdp ()
#36 0x00000000020d2dd7 in opirip ()
#37 0x00000000016fe729 in opidrv ()
#38 0x0000000001b7183f in sou2o ()
#39 0x00000000009d3f8a in opimai_real ()
#40 0x0000000001b76ace in ssthrdmain ()
#41 0x00000000009d3e71 in main ()[/bash]
The stack is only 30 functions deep in 11.2.0.1, compared to 42 in 10.2.0.4! Given that everything above the ktcwit1() function is there because both sessions are waiting on the enqueue, the relative codepath change is even bigger.
All the difference comes down to one key thing: a recursive call. If you take a look at line #25 (highlighted), you'll notice the rpiswu2() function (for those of you unfamiliar with Oracle kernel layers, RPI stands for Recursive Program Interface). Whatever happens further up the stack is essentially the same codepath any user session would use while executing an UPDATE statement. The Apply Servers in 10.2.0.4 generally behave like any other user session, so the diagnostic techniques you have learned while troubleshooting user issues can, to a large extent, be applied to the Apply Servers as well. Every LCR execution leads to at least one recursive call (so a transaction with, say, 1000 LCRs results in at least 1000 recursive calls by the Apply Server). In 11.2.0.1 the recursive call is missing and the codepath is different all the way up to the kauupd() (KA, Access layer) function.
Indeed, by looking at the Apply Server statistics in 11.2.0.1 you will notice that executing an LCR no longer results in a recursive call, so the entire change seems to be a shortcut which allows the Apply Server to proceed directly into the KD (Data) layer, bypassing the "regular" codepath and avoiding a recursive call.
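An easy way to see this for yourself (a sketch; sid 22 is our Apply Server from above) is to sample the recursive calls statistic for the Apply Server session before and after pushing a batch of LCRs through:

[sql]SQL> select sn.name, st.value
2 from v$sesstat st, v$statname sn
3 where st.statistic# = sn.statistic#
4 and st.sid = 22
5 and sn.name = 'recursive calls';[/sql]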
On a side note, it appears that this new codepath was first introduced in 11.1.0.7.
What’s up with the instrumentation?
While the performance improvement is certainly most welcome, there is a big downside: all these new functions seem to be poorly, or not at all, instrumented. This makes it hard to evaluate the gains, as some information is simply not there.
How are you supposed to figure out what's going on, then? The good news is that all the regular dynamic performance views (like v$session_wait, v$session_event, etc.) seem to be populated correctly, but SQL trace took a big hit (plus you can no longer see sql_id in v$session). Whatever falls outside of the "old" stuff looks pretty much like a black box.
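For example, you can still profile the Apply Server's waits the old-fashioned way (again using sid 22, our Apply Server):

[sql]SQL> select event, total_waits, time_waited
2 from v$session_event
3 where sid = 22
4 order by time_waited desc;[/sql]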
Puzzled by this problem, I tried to see whether there is an easy way to re-enable the old codepath and get all the instrumentation facilities back in place. After some trial and error, it turned out that a simple row-level trigger . . .
[sql]SQL> create or replace trigger dst.buid_t1 before delete or insert or update on dst.t1
2 for each row
3 begin
4 null;
5 end;
6 /
Trigger created
SQL> begin
2 dbms_ddl.set_trigger_firing_property('DST', 'BUID_T1', false);
3 end;
4 /
PL/SQL procedure successfully completed[/sql]
. . . is enough to get the old codepath back. Here is the stack of the Apply Server process in 11.2.0.1 with such a trigger in place:
[bash highlight="26"][[email protected] trace]$ pstack 30640
#0 0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1 0x00000000085ef3f3 in sskgpwwait ()
#2 0x00000000085ee5c6 in skgpwwait ()
#3 0x000000000829ee31 in ksliwat ()
#4 0x000000000829e422 in kslwaitctx ()
#5 0x0000000000af92f5 in ksqcmi ()
#6 0x00000000082ac019 in ksqgtlctx ()
#7 0x00000000082aa77a in ksqgelctx ()
#8 0x0000000000c4d566 in ktcwit1 ()
#9 0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x00000000046a363f in qerstFetch ()
#15 0x0000000008453cdd in updaul ()
#16 0x0000000008451bca in updThreePhaseExe ()
#17 0x00000000084509f5 in updexe ()
#18 0x00000000083fe18f in opiexe ()
#19 0x00000000083f5c0d in opiall0 ()
#20 0x0000000008403d25 in opikpr ()
#21 0x00000000083f78b9 in opiodr ()
#22 0x00000000084892af in __PGOSF141_rpidrus ()
#23 0x00000000085ee820 in skgmstack ()
#24 0x000000000848a759 in rpiswu2 ()
#25 0x000000000848fdf4 in kprball ()
#26 0x0000000001c7c4d7 in knipxup ()
#27 0x0000000001c72651 in knipdis ()
#28 0x000000000178cacc in knaspd ()
#29 0x0000000001787d2f in knasplcr ()
#30 0x00000000017866d7 in knaspx ()
#31 0x0000000001770fd5 in knalsProc1Txn ()
#32 0x000000000177022d in knalsptxn ()
#33 0x00000000017424a6 in knasm2 ()
#34 0x0000000001776d8d in knalsma ()
#35 0x0000000000c25a7d in knlkcbkma ()
#36 0x0000000000b93ba7 in ksvrdp ()
#37 0x00000000020d2dd7 in opirip ()
#38 0x00000000016fe729 in opidrv ()
#39 0x0000000001b7183f in sou2o ()
#40 0x00000000009d3f8a in opimai_real ()
#41 0x0000000001b76ace in ssthrdmain ()
#42 0x00000000009d3e71 in main ()[/bash]
Now that looks much more familiar! All the instrumentation appeared to be back in place as well.
I've also discovered that:
- DELETE seems to always be handled through the old codepath.
- In case you have a unique constraint or a primary key supported by a non-unique index, INSERT will fall back to the old codepath (see the dictionary sketch after this list).
- UPDATE needs a primary key or key column(s) supported by an index in order to use the new codepath.
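Since the presence of a suitable unique index decides which codepath you get, a quick look at the dictionary tells you what to expect for a given table (a sketch, using dst.t1 as an example):

[sql]SQL> select c.constraint_name, c.index_name, i.uniqueness
2 from dba_constraints c, dba_indexes i
3 where c.owner = 'DST'
4 and c.table_name = 'T1'
5 and c.constraint_type = 'P'
6 and i.owner = nvl(c.index_owner, c.owner)
7 and i.index_name = c.index_name;[/sql]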
It remains to be seen whether this new codepath has been implemented as a shortcut for the most frequently used scenarios or whether there are some implementation restrictions, and how it will progress in future releases . . . or maybe it won't, with Oracle GoldenGate taking over as the strategic direction.
Comments
Hello Alex,
I've seen a note on MOS called "Combined Capture and Apply" (463820.1); though it's for 11gR1, maybe this is what you've discovered.
Hi Timur,
this MOS note explains the feature where the Capture process communicates directly with the Apply process (most probably through IPC calls). This is relevant as far as data transport is concerned (i.e. you do not stage LCRs in the queue), but it doesn't really affect, nor has to, the way the Apply Server executes the LCRs. Indeed, combined capture and apply appeared in 11.1.0.6, but the Apply Server in that version still used the "classic" codepath for LCR execution.
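By the way, if memory serves, you can tell whether combined capture and apply has kicked in by looking at the OPTIMIZATION column of v$streams_capture; a value greater than zero means CCA is in effect:

[sql]SQL> select capture_name, optimization
2 from v$streams_capture;[/sql]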
Hi Alex
On a side note . . . something similar happened to AQ as well. In the beginning, the operations used to insert/delete entries from a queue were based on regular SQL statements. At some point (I don't remember the version . . .) some of these operations were replaced by other code paths that no longer use SQL statements. As you correctly pointed out, performance is better but profiling is no longer that easy . . .
Cheers,
Chris
Hi Chris,
Buffered queues were introduced in 10.1, so that's one of the places I would expect a change. These were used by Streams, but some parts of the new code path could have ended up being used for user-enqueued messages as well.
Unfortunately, even with the new codepath, the Apply Server still does row-by-row processing. It will be interesting to see whether this approach provides a better return than, say, introducing bulk processing, which is something Oracle GoldenGate can already do.