In this post:
- Question and background
- My opinion
Tom Kyte, Cary Millsap, Steven Feuerstein hold a Database Guru Panel at #Kscope 11’s conference organized by ODTUG. I had a great opportunity to not just watch the Live Stream (you can see the recorded version HERE) but also ASK questions ;). Big kudos to organizers for the fantastic opportunity to ask questions via Twitter and Face Book live. I wasn’t sure if it was going to work . Surprisingly enough several of my questions had been asked and answered live. One of the questions was related to UNDO/REDO statistics. I think that it was a bit of challenge to define the question precisely having just 140 Twitter characters and therefore I decided to share my question here, give a bit of background and ask you “What is your opinion?”
Question and background
The following is the question I asked in the twitter (16 min 14sec in the recording ).
Tweet: #kscope Q to Guru Panel: What is the best way to find TOP SQLs generating the most UNDO/REDO in 11GR2?
Tom Kyte response: My first response to that would be “Why?”
Well … This questions is coming from my day to day Oracle DBA duties. On several occasions I faced situations when an Oracle database (Instance) all over the sudden started to generate significantly more REDO data then usually. This quickly may “eat” all space allocated to archive log destinations and generate many other problems quite quickly. In such cases an Oracle DBA uses any diagnostics options available to find the root cause of the problem and resolve it in as short time as possible. My question is what Oracle DB kernel instrumentation do we have in place to troubleshoot such cases?
NOTE: It might be that word “SQLs” in my original question text introduced a bit of confusion. I really should have said DML operations instead of SQL. However I think many of us use word SQL for DML operations inclusively (including Oracle in V$SQL views ;).
In my opinion we are missing a bit of instrumentation here. V$SQL, V$SQLAREA and other Oracle view are focused on Reads related statistics. This is understandable as most of operations a database is doing are data retrieval and Oracle did a great job here. However for the use case I described it would be very useful to have a set of additional columns in these views that would provide us with data changes related statistics.
What other options/instrumentation we have for finding the place in an application that suddenly started to generate a lot of data changes? Well there are several. However none of those provides us with a way to point the SQL (DML) that is a root cause of the problem. The options that I can think about are as following:
- LogMiner – there are 2 problems with this option. 1 – it takes some time to setup and run it (under time pressure we may not have such luxury) 2- surprisingly enough LogMiner views don’t provide REDO statistics :) We are left with an option to guess what statement generates most of changes based on execution count (which isn’t always a good indicator)
- V$SEGMENT_STATISTICS – provides exactly what we need with a small exception. The information is segment/table based. As soon as we determine the table we on our own to find the related SQL. In most of the cases it might not be that difficult to find SQLs that have been executed during statistics gathering time-frame and guess which one did cause the problem. However if DML operation didn’t use a table name in it’s text (e.g. based on a view) we may not be that lucky and it isn’t that simple than to find any segment related DML operations.
- V$SESSTAT – great view, but provides statistics per database session. It provides valuable information for our investigation, however it doesn’t give use precise information what SQL (module) is causing the issue. We are on our own again to guess where the problem is.
- Extended SQL Trace (10046) – :) guess what ? there is no UNDO/REDO statistics available in trace files. Even if we have found an application module and enabled trace there is still no easy way to find the SQL that generates the most REDO data.
You would say that combining all methods named above we could narrow the scope and find the root cause of issue. And you are right. In no other options available we use any diagnostics to find the root cause. However I would say that if V$SQL… views would have UNDO/REDO statistics we could point to the SQL causing the issue in no time. I think there is a place for enhancement request.
What do you think? Have you been in such situations as I described? What options did you use to find the SQL responsible for the problem? Does Oracle need to add UNDO/REDO related statistics in database kernel instrumentation?
Please do not hesitate to leave your comment for this blog post.