Mysterious rollback and replay with partitions


Recently I was loading a partitioned table and, part-way through a load that was going to take hours, I realised that I wanted to add DEFAULT and NULL partitions to make future inserts more robust.
I was able to add the extra partitions, but then noticed that my INSERT was rolling back. It had already been running for a few hours, so I was dismayed at the prospect of waiting for the rollback and then re-issuing the command. To my astonishment, though, once the rollback completed the statement automatically replayed and went on to completion. It took much longer than anticipated, but I was grateful that it worked.

Following this behaviour I decided to investigate a little deeper and was able to reproduce the issue.

Oracle Database version -> 12.2.0.1
The load table was list partitioned with range subpartitions and was set up as follows

CREATE TABLE output_main_archive (
          EXTRACT_ID       NUMBER
        , EXTRACT_DATE     DATE
        , FIELD1           VARCHAR2(255)
)
   PARTITION BY LIST (extract_id)
   SUBPARTITION BY RANGE (extract_date) 
   SUBPARTITION TEMPLATE 
   ( 
          SUBPARTITION ED_JAN_17 VALUES LESS THAN (TO_DATE('01-FEB-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_FEB_17 VALUES LESS THAN (TO_DATE('01-MAR-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_MAR_17 VALUES LESS THAN (TO_DATE('01-APR-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_APR_17 VALUES LESS THAN (TO_DATE('01-MAY-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_MAY_17 VALUES LESS THAN (TO_DATE('01-JUN-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_JUN_17 VALUES LESS THAN (TO_DATE('01-JUL-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_JUL_17 VALUES LESS THAN (TO_DATE('01-AUG-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_AUG_17 VALUES LESS THAN (TO_DATE('01-SEP-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_SEP_17 VALUES LESS THAN (TO_DATE('01-OCT-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_OCT_17 VALUES LESS THAN (TO_DATE('01-NOV-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_NOV_17 VALUES LESS THAN (TO_DATE('01-DEC-2017','DD-MON-YYYY'))
        , SUBPARTITION ED_DEC_17 VALUES LESS THAN (TO_DATE('01-JAN-2018','DD-MON-YYYY'))
   )   
   (
          PARTITION ext_id_1 VALUES ( 1 )              
        , PARTITION ext_id_2 VALUES ( 2 )              
        , PARTITION ext_id_3 VALUES ( 3 )              
        , PARTITION ext_id_4 VALUES ( 4 )              
        , PARTITION ext_id_5 VALUES ( 5 )              
        , PARTITION ext_id_6 VALUES ( 6 )              
        , PARTITION ext_id_7 VALUES ( 7 )              
        , PARTITION ext_id_8 VALUES ( 8 )              
        , PARTITION ext_id_9 VALUES ( 9 )              
        , PARTITION ext_id_10 VALUES ( 10 )            
   )                                                
/                                                  

I loaded a source table with some random test data like so

CREATE TABLE output_main (
          EXTRACT_ID       NUMBER
        , EXTRACT_DATE     DATE
        , FIELD1           VARCHAR2(255)
)
/

INSERT /*+ APPEND */ INTO output_main
SELECT TRUNC(DBMS_RANDOM.value(1,11))
     , TO_DATE('01/01/2017','DD/MM/YYYY') + DBMS_RANDOM.value(0,365)
     , DBMS_RANDOM.string('L',TRUNC(DBMS_RANDOM.value(100,255))) AS long_string
FROM   dual
CONNECT BY level <= 100000
/

COMMIT;  -- a commit is needed after the direct-path (APPEND) insert before output_main can be read again

INSERT INTO output_main SELECT * FROM output_main;
INSERT INTO output_main SELECT * FROM output_main;

COMMIT;

This gives me 400K rows to play with, which makes the INSERT run for long enough for me to gather some evidence while it executes.
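For anyone reproducing this, a quick sanity check (purely illustrative) of how the rows spread across the ten extract_id values would be:

SELECT extract_id, COUNT(*) AS row_count   -- roughly 40K rows per extract_id expected
FROM   output_main
GROUP  BY extract_id
ORDER  BY extract_id;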

The first run was simply a copy of the rows from the source table to the partitioned table. I also enabled tracing so I could see what was happening.

exec DBMS_MONITOR.SESSION_TRACE_ENABLE

INSERT INTO output_main_archive SELECT * FROM output_main;

COMMIT;
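To locate the trace file that the traced session produced, one simple option (just a convenience sketch) is to ask V$DIAG_INFO from the same session before disconnecting:

SELECT name, value
FROM   v$diag_info
WHERE  name IN ('Diag Trace', 'Default Trace File');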

I examined the trace files using 3 methods

  1. tkprof - Oracle provided trace analysis tool
  2. tvdxtat.sh - A free tool written by Christian Antognini from Trivadis -> Introduce TVD$XTAT
  3. orasrp - A free tool written by Egor Starostin -> Oracle Session Resource Profiler

The last two tools produce HTML output and tkprof produces text output.

After examining the trace file and parsing it through tkprof I got the following output

INSERT INTO output_main_archive SELECT * FROM output_main


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1     23.97     113.81      16760     337419    1193762      400000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     23.98     113.82      16760     337419    1193762      400000

We can see that the statement took just under two minutes on my little test system. Furthermore, after running the trace through the TriVaDis eXtended Tracefile Analysis Tool 4.0, I can see that 15 recursive statements were executed.

Here they are

update seg$ ...
select ... from seg$ ...
delete from deferred_stg$ ...
delete from seg$ ...
update partobj$ ...
insert into seg$ ...
update tabsubpart$ ...
select ... from seg$ ...
delete from tab_stats$ ...
select ... from deferred_stg$ ...
select ... from RecycleBin$ ...
select ... from undo$ ...
select ... from sys.obj$ o, sys.user$ u, sys.trigger$ t, sys.obj$ bo ...
select ... from opt_directive_own$ ...
select ... from hist_head$ ...
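Most of these appear to be the dictionary maintenance that goes with creating the subpartition segments on first use (the seg$ and deferred_stg$ statements in particular), since segment creation is deferred by default in this release. A quick, illustrative way to see which subpartition segments have been materialised is:

SELECT subpartition_name, segment_created   -- YES once the segment has been created
FROM   user_tab_subpartitions
WHERE  table_name = 'OUTPUT_MAIN_ARCHIVE'
ORDER  BY partition_name, subpartition_name;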

Now for the test run where I will ALTER the table during my INSERT.

The ALTER statement is as follows

ALTER TABLE output_main_archive ADD
	PARTITION ext_id_null VALUES (NULL)
      , PARTITION ext_id_unknown VALUES (DEFAULT)
/
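Once that ALTER has run, the two new partitions should show up in the dictionary; a quick illustrative check is:

SELECT partition_name, high_value
FROM   user_tab_partitions
WHERE  table_name = 'OUTPUT_MAIN_ARCHIVE'
ORDER  BY partition_position;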

During the course of the run I will check the transactions using the following SQL

set lines 140
set pages 140

col osuser for a15
col username for a15
col logon_time for a25
col transaction_date for a25

select s.sid
     , s.serial#
     , s.osuser
     , s.username
     , t.used_ublk
     , to_char(s.logon_time,'DD-MON-YYYY HH24:MI:SS') logon_time
     , to_char(t.start_date,'DD-MON-YYYY HH24:MI:SS') transaction_date
from v$session s
   , v$transaction t
where s.taddr = t.addr
order by t.start_date;

Here's my sequence of events

  1. Session 1 : Perform the INSERT into the partitioned table
  2. Session 2 : Examine the transaction
  3. Session 2 : ALTER the partitioned table
  4. Session 2 : Examine the transaction
  5. Session 1 : Watch the transaction complete and commit

I started the INSERT command and the following is the output from my transaction script

[email protected](LUKE) SQL> @trans

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                   309 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18

1 row selected.

Note the USED_UBLK value - this is the number of undo blocks currently being used by the transaction.
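If you want to translate that into an approximate size in bytes, you can multiply by the block size; here I'm assuming the undo tablespace uses the default db_block_size:

SELECT t.used_ublk
     , t.used_ublk * TO_NUMBER(p.value) AS approx_undo_bytes   -- assumes undo block size = db_block_size
FROM   v$transaction t
     , v$parameter p
WHERE  p.name = 'db_block_size';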

After pausing for a short while, I ran it again

[email protected](LUKE) SQL> @trans

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                  1542 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18

1 row selected.

We can see that the UNDO usage is rising as expected.

I then performed the ALTER statement detailed above and monitored the UNDO usage. Here's what I saw

[email protected](LUKE) SQL> @trans

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                  1244 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18

1 row selected.

[email protected](LUKE) SQL> /

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                    99 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18

1 row selected.

[email protected](LUKE) SQL> /

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                    37 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18

1 row selected.

[email protected](LUKE) SQL> /

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                    99 28-MAR-2018 13:03:53      28-MAR-2018 13:06:34

1 row selected.

[email protected](LUKE) SQL> /

       SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
       254      17710 oracle          PART                   230 28-MAR-2018 13:03:53      28-MAR-2018 13:06:34

1 row selected.

We can see USED_UBLK fall and then rise again, showing that the transaction rolled back and that the statement then restarted and began consuming undo once more. Note also that TRANSACTION_DATE changes from 13:05:18 to 13:06:34, confirming that a new transaction was started for the replayed INSERT.
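One way to dig a little further (I did not capture this during the run, so treat it as a sketch) is to look at the cursor in V$SQL, since DDL against the table invalidates dependent cursors and that shows up in the INVALIDATIONS column:

SELECT sql_id, executions, loads, invalidations
FROM   v$sql
WHERE  sql_text LIKE 'INSERT INTO output_main_archive%';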

Once again I had traced this session and here are the results from tkprof

INSERT INTO output_main_archive SELECT * FROM output_main


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1     76.44     168.62      23859     497949    2153572      400000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     76.46     168.63      23859     497949    2153572      400000

We can see that the elapsed time was significantly longer, rising from 113.82s to 168.63s. Here are the main differences in CPU and wait events

Component                      Total Duration [s]
                               1st Run        2nd Run

CPU                             23.983         76.301
free buffer waits               44.404         48.304
db file sequential read          5.934         21.437
recursive statements             4.230          8.176

And the largest difference, visible in the orasrp output, is that the 15 recursive statements increased to 922 (146 of them unique) for the interrupted INSERT statement.

One other thing, also visible in the orasrp output, is that the undo datafile was being read: the number of reads against it (file ID 10 in my case) increased by roughly a factor of ten, and the time spent reading it rose even more sharply.

Session Read Datafiles Statistics

                1st Run              2nd Run
Datafile # 	Seconds     Calls    Seconds     Calls

11              7.0077s     6,091   22.6365s     7,730
9               0.0469s       122    0.1172s       123
10              0.0007s        46    1.7878s       484
8               0.0002s         9    0.1110s        49
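To confirm that file ID 10 really does belong to the undo tablespace, a simple lookup (illustrative only) is:

SELECT file_id, tablespace_name, file_name
FROM   dba_data_files
WHERE  tablespace_name = (SELECT value FROM v$parameter WHERE name = 'undo_tablespace');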

So, just some evidence that Oracle handles situations like this for us: instead of either hanging the DDL or failing the INSERT, both statements complete successfully, admittedly after using a lot more resources than first anticipated.

About the Author

I am currently one of the Oracle Internal Principal Consultants at Pythian. I have been working as a DBA since Oracle V6 for over 30 years and also dabble in various languages and tools such as bash, Perl, PL/SQL, C, Python, Ansible, Terraform and Go.

1 Comment

Ivica Arsov
April 5, 2018 4:43 pm

I would like to add some additional information. In short, the query got invalidated. We can confirm this via V$SQL.INVALIDATIONS. It is expected behavior (by design): when a table is altered, anything that depends on it is invalidated. With statistics gathering we have the option not to invalidate cursors.
We can confirm this behavior by generating an errorstack on 14403.
ORA-14403 "cursor invalidation detected after getting DML partition lock" is one example of an error which is handled internally. ORA-14403 is signalled by the INSERT process itself: when it looks for a partition, the cursor state is checked. If the cursor has been invalidated, the error is signalled, which triggers the restart.

Command to generate 14403 errorstack:
alter system set events '14403 trace name errorstack level 3';

Snippet from trace file:

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-14403: cursor invalidation detected after getting DML partition lock
----- Current SQL Statement for this session (sql_id=49pzdn1t1xzdt) -----
INSERT INTO output_main_archive SELECT * FROM output_main

