DML by Unique Index Supposedly Affecting Many Rows

Posted in: Oracle, Technical Track

This blog post is based on a real customer case, and illustrates how v$sql reports execution statistics for certain DML types of operations (I purposely have not named which DML types … yet). These reports can be misleading when troubleshooting a performance problem. The goal of this post is to hopefully save you some time in case you encounter a similar case.

First I’ll present some observations from a long-running DELETE statement that’s in progress. Some of you will probably immediately recognize the cause of this behavior, however I’ll also outline the troubleshooting steps based on those observations. I’ll present the root cause at the end of the post.

Scenario description

There was a long-running DELETE statement on a 12.1 database. It was a two-node RAC, but that isn’t relevant in this case. When I checked what the session was doing, I could see it was mainly waiting on “db file sequential read” waits. This could be due to various reasons. Its SQL text and execution plan showed it was executing a DELETE by primary key:

DELETE FROM SOMETBLE WHERE SOMETBLE_ID = :B2
Plan hash value: 3523556778

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |             |       |       |     1 (100)|          |
|   1 |  DELETE            | SOMETBLE    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| PK_SOMETBL  |     1 |   193 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SOMETBLE_ID"=:B2)

If we check the DELETE’s statistics to get an insight into the amount of data it processed so far, we can see:

SQL> select child_number,
       plan_hash_value,
       executions,
       round(elapsed_time/executions/1e6,2) elapsed_time_per_exec,
       round(cpu_time/elapsed_time*100,2) "cpu_time_%",
       round(concurrency_wait_time/elapsed_time*100,2) "concurrency_wait_time_%",
       round(cluster_wait_time/elapsed_time*100,2) "cluster_wait_time_%",
       round(user_io_wait_time/elapsed_time*100,2) "user_io_wait_time_%",
       round(rows_processed/executions,2) rows_processed_per_execution,
           round(buffer_gets/executions,2) buffer_gets_per_execution
from v$sql
where sql_id = '6xs07f3vucupq'
order by 3 desc;  

CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS ELAPSED_TIME_PER_EXEC CPU_TIME_% CONCURRENCY_WAIT_TIME_% CLUSTER_WAIT_TIME_% USER_IO_WAIT_TIME_% ROWS_PROCESSED_PER_EXECUTION BUFFER_GETS_PER_EXECUTION
------------ --------------- ---------- --------------------- ---------- ----------------------- ------------------- ------------------- ---------------------------- -------------------------
           0      3523556778         26                323.34       9.43                       0                1.11               92.21                     75982.46                5987362.08

SQL>		   

Because the values v$sql reports are cumulative, it’s necessary to calculate deltas between two points in time to obtain the difference in values during this time period. The above is a snapshot I gathered when the problem was occurring, and the only figure increasing between consecutive runs was the increasing number of buffer gets (last column in the output). At the time it was enough to continue with the investigation.

Side note: script to obtain deltas from v$sqlstats without any supporting PL/SQL objects

The above prompted me to adapt Tanel Poder’s topsql.sql script using a technique described in this Jonathan Lewis blog post. The modified script is available here. The main modification is that the script doesn’t require a supporting PL/SQL package. This is useful as some clients (like the one I was working for in this case) don’t allow any modifications on their databases. Using the modified script on my sandbox DB when reproducing the above problem, I could easily get the deltas for the DELETE:

SQL> @topsql buffer_gets/decode(executions,0,1,executions),rows_processed/decode(executions,0,1,executions),executions 10

BUFFER_GETS/DECODE(EXECUTIONS,0,1,EXECUTIONS) ROWS_PROCESSED/DECODE(EXECUTIONS,0,1,EXECUTIONS) EXECUTIONS SQL_ID        PLAN_HASH_VALUE SQL_TEXT
--------------------------------------------- ------------------------------------------------ ---------- ------------- --------------- ------------------------------
                                      6226002                                                0          0 fwgw3xdy86nfy               0 BEGIN test_pkg.purge_t1; END;
                                      6226002                                                0          0 0abrc1f5qqyza      1101221491 DELETE FROM T1 WHERE rn=:B2
Going back to the figures from the customer’s environment

The DELETE was still running, and since the cursor was loaded, there were 26 executions which on average deleted 75982 rows per execution. How is that possible, since the DELETE is accessing data by primary key? Shouldn’t the number of executions be >= rows_processed? Additionally, the number of buffer gets per execution (45.7GB) also doesn’t seem “reasonable” for deleting one row from a table. Given the information collected so far, and assuming for a moment that the v$sql.executions figure might for some reason not be reported correctly, I continued with the investigation.

It’s possible there was some recursive SQL causing those buffer visits. This could be related to triggers or foreign key validation, but the table didn’t have any of those defined. The table had 80 columns, none of which was an LOB (large object). It had 25 indexes, many of them multi-column ones, so perhaps something odd was going on with managing the indexes (although that still doesn’t explain why executions >= rows_processed). Enabling the 10046 trace on this session, there were only “db file sequential read” waits related to various objects (obj#) indicating single block reads. There were no other types of calls (EXEC, FETCH, BINDS), or any recursive SQL:

WAIT #140386332697512: nam='db file sequential read' ela= 509 file#=25 block#=1892854 blocks=1 obj#=227531 tim=5499116275505
WAIT #140386332697512: nam='db file sequential read' ela= 457 file#=26 block#=1961814 blocks=1 obj#=227532 tim=5499116276005
WAIT #140386332697512: nam='db file sequential read' ela= 186 file#=25 block#=1983830 blocks=1 obj#=227533 tim=5499116276235
WAIT #140386332697512: nam='db file sequential read' ela= 569 file#=25 block#=2011326 blocks=1 obj#=227534 tim=5499116276837
WAIT #140386332697512: nam='db file sequential read' ela= 940 file#=26 block#=2030996 blocks=1 obj#=227535 tim=5499116277849
WAIT #140386332697512: nam='db file sequential read' ela= 418 file#=26 block#=2098394 blocks=1 obj#=227537 tim=5499116278308
WAIT #140386332697512: nam='db file sequential read' ela= 474 file#=26 block#=2148990 blocks=1 obj#=227539 tim=5499116278867
WAIT #140386332697512: nam='db file sequential read' ela= 242 file#=25 block#=2212213 blocks=1 obj#=227541 tim=5499116279149
WAIT #140386332697512: nam='db file sequential read' ela= 586 file#=26 block#=2243375 blocks=1 obj#=227543 tim=5499116279766
WAIT #140386332697512: nam='db file sequential read' ela= 474 file#=26 block#=2275160 blocks=1 obj#=227544 tim=5499116280304
WAIT #140386332697512: nam='db file sequential read' ela= 614 file#=25 block#=2285500 blocks=1 obj#=227545 tim=5499116280977
WAIT #140386332697512: nam='db file sequential read' ela= 995 file#=25 block#=2322697 blocks=1 obj#=227546 tim=5499116282044
WAIT #140386332697512: nam='db file sequential read' ela= 432 file#=26 block#=2356469 blocks=1 obj#=227547 tim=5499116282504
WAIT #140386332697512: nam='db file sequential read' ela= 493 file#=25 block#=2357897 blocks=1 obj#=227548 tim=5499116283060

Running a quick summary on the trace file, and not even filtering on “db file sequential read,” reported various objects:

$ grep "obj#=" db_j001_4384.trc | cut -f12 -d' ' | sort | uniq -c
1378 obj#=224155
516 obj#=227528
662 obj#=227529
395 obj#=227530
742 obj#=227531
448 obj#=227532
733 obj#=227533
402 obj#=227534
713 obj#=227535
495 obj#=227536
504 obj#=227537
480 obj#=227538
923 obj#=227539
881 obj#=227540
444 obj#=227541
605 obj#=227543
431 obj#=227544
641 obj#=227545
403 obj#=227546
273 obj#=227547
289 obj#=227548
651 obj#=227549
301 obj#=230064
295 obj#=230065

Apart from the table segment, the above object IDs belonged to indexes defined on this table (names redacted):

SELECT owner,
       object_name,
       object_type
FROM   dba_objects
WHERE  object_id IN ( 224155, 227528, 227532, 227536,
                      227537, 227533, 227534, 227535,
                      227537, 227540, 227543, 227544,
                      227545, 227546, 227549, 230064,
                      224155, 227528, 227529 ); 

OWNER        OBJECT_NAME    OBJECT_TYPE
------------ -------------- -----------------------
SOMESCHEMA   T1             TABLE
SOMESCHEMA   I1             INDEX
SOMESCHEMA   I2             INDEX
SOMESCHEMA   I3             INDEX
SOMESCHEMA   I4             INDEX
SOMESCHEMA   I5             INDEX
SOMESCHEMA   I6             INDEX
SOMESCHEMA   I7             INDEX
SOMESCHEMA   I8             INDEX
SOMESCHEMA   I9             INDEX
SOMESCHEMA   I10            INDEX
SOMESCHEMA   I12            INDEX
SOMESCHEMA   I13            INDEX
SOMESCHEMA   I14            INDEX
SOMESCHEMA   I15            INDEX
SOMESCHEMA   I16            INDEX

16 rows selected.	

Reason for this behavior

The values in v$session’s PLSQL_ENTRY_OBJECT_ID and PLSQL_ENTRY_SUBPROGRAM_ID columns pointed to a PL/SQL packaged procedure where the DELETE was defined. This code segment is responsible for executing the DELETE:

forall i in istart..istop save exceptions
    EXECUTE IMMEDIATE 'DELETE FROM '||p_table_name||' WHERE '||p_column_name||' = :B2' USING p_arrIDs(i);

We can see that the reason for this behavior is bulk processing, where one execution is reported for the whole array of processed values. Reproducing the above scenario on my sandbox DB confirmed it. I also observed — and this might be relevant if you encounter a similar scenario — that if you enable 10046 tracing while the DELETE is already running, only WAIT calls will be output to the trace file. Let me show you what happens if you enable tracing before the DELETE starts, or when the DELETE is executed in a loop e.g. using this sample code:

create or replace package body test_pkg is
procedure purge_t1 is
  type t_idarray is table of number;
  ids  t_idarray;
 
  cursor c1 
  is 
    select rn
    from t1;     
  
begin
open c1;

loop
   fetch c1
   bulk collect into ids
   limit 100000;
   
   exit when ids.count = 0;

   forall i in 1 .. ids.count
     execute immediate 'DELETE FROM T1 WHERE rn=:B2' using ids(i);
end loop;
end;

end;
/

The trace file reports additional recursive SQL and BIND values since the DELETE is started on the next loop. For example:

PARSING IN CURSOR #140712606682816 len=27 dep=1 uid=103 oct=7 lid=103 tim=3051559302 hv=2339077098 ad='64a47aa0' sqlid='0abrc1f5qqyza'
DELETE FROM T1 WHERE rn=:B2
END OF STMT
.
.
BINDS #140712606682816:
 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=204001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7ffa350d1cf0  bln=22  avl=04  flg=09
  value=200066
BINDS #140712606682816:
 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=204001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7ffa350d1d18  bln=22  avl=04  flg=09
  value=200067

Final note

I looked for articles describing the above observations and found one from Jonathan Lewis describing a very similar scenario when analyzing a bulk UPDATE statement: https://jonathanlewis.wordpress.com/2017/03/21/deception/.

There is also a similar observation in this Connor McDonald post when he analyzed a bulk INSERT statement behavior with tkprof:
https://blogs.oracle.com/oraclemagazine/a-fresh-look-at-auditing-row-changes.

I hope you found this post useful. If you have any questions, or thoughts, please feel free to leave them in the comments!

email

Author

Interested in working with Jure? Schedule a tech call.

About the Author

Oracle Database Consultant
Jure Bratina is an experienced Oracle DBA currently working as a Database Consultant with The Pythian Group. The areas he most likes to work on are troubleshooting and performance tuning of Oracle databases. Jure is also a member of the Executive Committee at the Slovenian Oracle User Group (SIOUG).

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *