In this post:
- Question and background
- My opinion
Tom Kyte, Cary Millsap, Steven Feuerstein hold a Database Guru Panel at #Kscope 11’s conference organized by ODTUG. I had a great opportunity to not just watch the Live Stream (you can see the recorded version HERE) but also ASK questions ;). Big kudos to organizers for the fantastic opportunity to ask questions via Twitter and Face Book live. I wasn’t sure if it was going to work . Surprisingly enough several of my questions had been asked and answered live. One of the questions was related to UNDO/REDO statistics. I think that it was a bit of challenge to define the question precisely having just 140 Twitter characters and therefore I decided to share my question here, give a bit of background and ask you “What is your opinion?”
Question and background
The following is the question I asked in the twitter (16 min 14sec in the recording ).
Tweet: #kscope Q to Guru Panel: What is the best way to find TOP SQLs generating the most UNDO/REDO in 11GR2?
Tom Kyte response: My first response to that would be “Why?”
Well … This questions is coming from my day to day Oracle DBA duties. On several occasions I faced situations when an Oracle database (Instance) all over the sudden started to generate significantly more REDO data then usually. This quickly may “eat” all space allocated to archive log destinations and generate many other problems quite quickly. In such cases an Oracle DBA uses any diagnostics options available to find the root cause of the problem and resolve it in as short time as possible. My question is what Oracle DB kernel instrumentation do we have in place to troubleshoot such cases?
NOTE: It might be that word “SQLs” in my original question text introduced a bit of confusion. I really should have said DML operations instead of SQL. However I think many of us use word SQL for DML operations inclusively (including Oracle in V$SQL views ;).
In my opinion we are missing a bit of instrumentation here. V$SQL, V$SQLAREA and other Oracle view are focused on Reads related statistics. This is understandable as most of operations a database is doing are data retrieval and Oracle did a great job here. However for the use case I described it would be very useful to have a set of additional columns in these views that would provide us with data changes related statistics.
What other options/instrumentation we have for finding the place in an application that suddenly started to generate a lot of data changes? Well there are several. However none of those provides us with a way to point the SQL (DML) that is a root cause of the problem. The options that I can think about are as following:
- LogMiner – there are 2 problems with this option. 1 – it takes some time to setup and run it (under time pressure we may not have such luxury) 2- surprisingly enough LogMiner views don’t provide REDO statistics :) We are left with an option to guess what statement generates most of changes based on execution count (which isn’t always a good indicator)
- V$SEGMENT_STATISTICS – provides exactly what we need with a small exception. The information is segment/table based. As soon as we determine the table we on our own to find the related SQL. In most of the cases it might not be that difficult to find SQLs that have been executed during statistics gathering time-frame and guess which one did cause the problem. However if DML operation didn’t use a table name in it’s text (e.g. based on a view) we may not be that lucky and it isn’t that simple than to find any segment related DML operations.
- V$SESSTAT – great view, but provides statistics per database session. It provides valuable information for our investigation, however it doesn’t give use precise information what SQL (module) is causing the issue. We are on our own again to guess where the problem is.
- Extended SQL Trace (10046) – :) guess what ? there is no UNDO/REDO statistics available in trace files. Even if we have found an application module and enabled trace there is still no easy way to find the SQL that generates the most REDO data.
You would say that combining all methods named above we could narrow the scope and find the root cause of issue. And you are right. In no other options available we use any diagnostics to find the root cause. However I would say that if V$SQL… views would have UNDO/REDO statistics we could point to the SQL causing the issue in no time. I think there is a place for enhancement request.
What do you think? Have you been in such situations as I described? What options did you use to find the SQL responsible for the problem? Does Oracle need to add UNDO/REDO related statistics in database kernel instrumentation?
Please do not hesitate to leave your comment for this blog post.
Are you looking for something historical in nature or real-time? (i.e. AWR or v$)
v$transaction should tell you via used_ublk , and you can join that with v$session and v$sql to get the sql and session information. Although I have doubts this will point to a single DML statement as it is transaction based.
Would it be enough to look at all DML statements in the v$sql view and ch eck out the rows_processed field? I suppose that doesn’t tell much as the size of the row would determine the change vectors and redo.
real time I use something like:
select a.os_user_name os
, a.oracle_username us
, b.owner own
, b.object_name obj
, b.object_type typ
, c.segment_name seg
, d.used_urec urec
, d.used_ublk ublk
from v$locked_object a
, dba_objects b
, dba_rollback_segs c
, v$transaction d
, v$session e
where a.object_id = b.object_id
and a.xidusn = c.segment_id
and a.xidusn = d.xidusn
and a.xidslot = d.xidslot
and d.addr = e.taddr
Thank you for the hint. The only concern that I have is the join in between “v$locked_object a” and “dba_rollback_segs c”. As Gary rightly noticed in this post comments the relation ship between a and c are one to many. Therefore if a transaction updates 2 objects at once you will link an UNDO data to both of the objects and there is no information how much UNDO is related to one object or another.
Just to demonstrate the point I made the following simple example:
21:35:34 [email protected]:1> create table a as select lpad('a',1000,'a') text from dba_objects where rownum create table b as select lpad('a',1000,'a') text from dba_objects where rownum @curpid
SPID SID SERIAL# USERNAME SERVER SADDR
------------------------ --------------- --------------- ------------------------------ --------- --------
6336 7 3 SYSTEM DEDICATED 2858D19C
21:37:18 [email protected]:1> update a set text = lpad('b',1000,'b') ;
1000 rows updated.
21:37:25 [email protected]:1> update b set text = lpad('b',1000,'b') ;
1000 rows updated.
21:37:29 [email protected]:1>
21:40:24 [email protected]:1> select a.os_user_name os
21:40:25 [email protected]:1> , a.oracle_username us
21:40:25 [email protected]:1> , b.owner own
21:40:25 [email protected]:1> , b.object_name obj
21:40:25 [email protected]:1> , b.object_type typ
21:40:25 [email protected]:1> , c.segment_name seg
21:40:25 [email protected]:1> , d.used_urec urec
21:40:25 [email protected]:1> , d.used_ublk ublk
21:40:25 [email protected]:1> from v$locked_object a
21:40:25 [email protected]:1> , dba_objects b
21:40:25 [email protected]:1> , dba_rollback_segs c
21:40:25 [email protected]:1> , v$transaction d
21:40:25 [email protected]:1> , v$session e
21:40:25 [email protected]:1> where a.object_id = b.object_id
21:40:25 [email protected]:1> and a.xidusn = c.segment_id
21:40:25 [email protected]:1> and a.xidusn = d.xidusn
21:40:25 [email protected]:1> and a.xidslot = d.xidslot
21:40:25 [email protected]:1> and d.addr = e.taddr
21:40:25 [email protected]:1> ;
OS US OWN OBJ TYP SEG UREC UBLK
-------------------- ---------- ---------- ---------- ------------------- ------------------------------ ---------- ----------
MYHOST\velikanov SYSTEM SYSTEM B TABLE _SYSSMU10_1955032764$ 286 286
MYHOST\velikanov SYSTEM SYSTEM A TABLE _SYSSMU10_1955032764$ 286 286
21:40:28 [email protected]:1>
I had a couple of corrections by Nuno when I last blogged on Redo/undo.
Multiple DML statements in a transaction can depend on the same undo block, so that makes it hard to allocate it to a SQL statement (which is why it makes sense in v$transaction).
Redo doesn’t even seem to belong to a transaction.
rows_processed for a delete/update/insert should give some idea (though rows can be very different sizes). Delayed block cleanout would also have an interesting effect on redo/undo. Queries could generate it without being ‘responsible’ for it.
Thanks Gary for the comments. I agree on the transaction & undo & sql relationship. However in general an SQL (I should say DML here) changes block and generates undo. IMHO: It shouldn’t bee too complex for Oracle kernel team to introduce related instrumentation. Or I am mistaken?
I think your question is very valid.
Perhaps for item 2 we could the objects in the SQL with dba_dependancies to determine if they are referencing the segmentes in question.
Thanks Andy for the comment and the support.
dba_dependancies is a good hint. There are some possible issues tho: a) what is the segment used in 100+ PL/SQL objects? it will point us to nowhere than. b) what if the segment is changed from outside (SQL running from application rather than from PL/SQL)? Than this option may even mislead an investigation.
I agree. I also got interesting suggestions when I blogged on the subject (esp. logoff trigger that copies the numbers from v$sesstat).
Thank you for the comment and the reference. The approach you described in your blog is very good one in given circumstances.
However my point here is: Oracle missing instrumentation.
Is it a valid point? What do you think?
PS Just wonder if “sum(RBABYTE) as RBABYTE ,sum(RBABLK) as RBABLK from v$logmnr_contents” really gives us bytes that the statement generated in the redo log file? According to the documentation it is not
RBABLK NUMBER RBA block number within the log file
RBABYTE NUMBER RBA byte offset within the block
Hi Yury, I came across this blog when I was looking for sql level redo statistics. I think it will be great to have even though the vendor seems disagree, even though they are telling us to try some workarounds that are certain to work all the time.
Hi Yury and All,
How could I know if DDL stsmt generated more redo like CTAS
I’m an Oracle DBA that deals with day-to-day issues as well. I agree that this is a legitimate need and is missing from the database.
Having redo statistics available at the session-level is good, however it’s not enough in situations where sessions are issuing different statements and you need to pin down specifically which are causing the most redo generation.