What do I mean by ‘fix’ the the fixed object statistics? Simply gather statistics to help the optimizer.
What are ‘fixed objects’? Fixed objects are the x$ tables and associated indexes that data dictionary views are based on. In this case we are interested in the objects that make up the v$sqlstats and v$sql_bind_capture views.
If you’ve never before collected statistics on Oracle fixed object, you may be wondering why you should bother with it, as everything appears to be fine in your databases.
After seeing an example you may want to schedule a time to collect these statistics.
Searching for SQL
Quite recently I was looking for recently executed SQL, based on the most recently captured bind variables.
select sql_id, sql_fulltext from v$sqlstats where sql_id in ( select distinct sql_id from ( select sql_id, last_captured from ( select sql_id, last_captured from V$SQL_BIND_CAPTURE order by last_captured desc nulls last ) where rownum <= 20 ) )
I ran the query and was distracted for a few moments. When I next looked at the terminal session where this SQL was executing, no rows had yet been returned.
Thinking that maybe ‘SET PAUSE ON’ had been run, I pressed ENTER. Nothing.
From another session I checked for waits in v$session_wait. Nothing there either. If the session is not returning rows, and not registering and event in v$session_wait, then it must be on CPU.
This didn’t seem an ideal situation, and so I stopped the query with CTRL-C.
The next step was to run the query on a smaller and not very busy 11.2.0.2 database. This time I saw that rows were being returned, but very slowly.
So now it was time to trace the execution and find out what was going on.
alter session set tracefile_identifier=’JKSTILL’; set linesize 200 trimspool on alter session set events ‘10046 trace name context forever, level 12’; select sql_id, sql_fulltext from v$sqlstats where sql_id in ( select distinct sql_id from ( select sql_id, last_captured from ( select sql_id, last_captured from V$SQL_BIND_CAPTURE order by last_captured desc nulls last ) where rownum <= 20 ) ) / alter session set events ‘10046 trace name context off’; exit
Coming back to this several minutes later, the resulting trace file was processed with the Method R Profiler to find out just where the time was going.
The ‘SQL*Net message from client’ event can be ignored; most of that time was accumulated waiting for me to come back and exit sqlplus. While the script example shows that the 10046 trace was turned off and the session exited, I had forgot to include those two line for this first run.
No matter, as the interesting bit is the next line, ‘CPU: FETCH dbcalls’. More than 6 minutes was spent fetching a few rows, so clearly something was not quite right. The SQL plan in the profile showed what the problem was, as the execution plan was far less than optimal. The following is the execution plan from AWR data:
1 select * 2 from TABLE( 3 dbms_xplan.display_awr(sql_id => :sqlidvar, plan_hash_value => 898242479, format => ‘ALL ALLSTATS LAST’) 4* ) [email protected] SQL- / PLAN_TABLE_OUTPUT —————————————————————————————————————– SQL_ID 4h7qfxa9t1ukz ——————– select sql_id, sql_fulltext from v$sqlstats where sql_id in ( select distinct sql_id from ( select sql_id, last_captured from ( select sql_id, last_captured from V$SQL_BIND_CAPTURE order by last_captured desc nulls last ) where rownum <= 20 ) ) Plan hash value: 898242479 ——————————————————————————————- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | ——————————————————————————————- | 0 | SELECT STATEMENT | | | | 1 (100)| | | 1 | FILTER | | | | | | | 2 | FIXED TABLE FULL | X$KKSSQLSTAT | 1 | 2023 | 0 (0)| | | 3 | VIEW | | 1 | 8 | 1 (100)| 00:00:01 | | 4 | COUNT STOPKEY | | | | | | | 5 | VIEW | | 1 | 8 | 1 (100)| 00:00:01 | | 6 | SORT ORDER BY STOPKEY| | 1 | 43 | 1 (100)| 00:00:01 | | 7 | FIXED TABLE FULL | X$KQLFBC | 1 | 43 | 0 (0)| | ——————————————————————————————- PLAN_TABLE_OUTPUT —————————————————————————————————————– Query Block Name / Object Alias (identified by operation id): ————————————————————- 1 – SEL$88122447 2 – SEL$88122447 / [email protected]$4 3 – SEL$6 / [email protected]$5 4 – SEL$6 5 – SEL$FEF91251 / [email protected]$6 6 – SEL$FEF91251 7 – SEL$FEF91251 / [email protected]$10 Note —– – Warning: basic plan statistics not available. These are only collected when: * hint ‘gather_plan_statistics’ is used for the statement or * parameter ‘statistics_level’ is set to ‘ALL’, at session or system level 39 rows selected.
While useful, this plan is not giving much information about why this took so long. If pressed I would just whip up a Bash and Awk one-liner to parse the trace files and find out where this time was going. In this case though I could just consult the Method R profile again.
Yikes! There were 106.3E6 rows returned from from X$KQLFBC.
Collecting the Fixed Object Statistics
Rather than spend time analyzing this further, it seemed that here was a clear case for collecting statistics on fixed objects in the database. The following SQL was run:
exec dbms_stats.gather_fixed_objects_stats
The next step was to rerun the query. This time it ran so quickly I wondered if it had even worked. As before, tracing had been enabled, and a profile generated from the trace. There was now quite an improvement seen in the execution plan:
The 0.0047 seconds required to return 442 rows from X$KQLFBC was quite a reduction from the previously seen time of nearly 396 seconds.
Why This Is Important
This issue came to light due to a custom query I was running. The optimizer will probably never run that same query, but it was clear that the fixed object statistics needed to be updated.
Now imagine your customers using your application; they may be waiting on the database for what seems like an eternity after pressing ENTER on a web form. And what are they waiting on? They may be waiting on the optimizer to evaluate a SQL statement and determine the best plan to use. The reason for the waiting in this case would simply be that the DBA has not taken steps to ensure the optimizer has the correct information to effectively query the database’s own metadata. Until the optimizer has the correct statistics, performance of query optimization will be sub-optimal. In a busy system this may result in mutex waits suddenly showing as a top event in AWR reports. Troubleshooting these waits can be difficult as there are many possible causes of them.
Do your customers, your database and yourself a favor – include updates of fixed tables statistics in your regular database maintenance schedule, and avoid a possible source of performance problems.
3 Comments. Leave new
” If pressed I would just whip up a Bash and Awk one-liner to parse the trace files and find out where this time was going. ”
care to share?
Hi Chris,
Here is one way to get timing info with some command line hacking.
The first step is to identify the SQL of interest in the trace file, as it may (probably) contain several SQL statements.
The following command will show us the cursor ID for the SQL statement we are interested in.
grep -B1 ‘select sql_id, sql_fulltext’ oravm1_ora_10917_JKSTILL-SLOW-SQL.trc
PARSING IN CURSOR #47722271586392 len=266 dep=0 uid=90 oct=3 lid=90 tim=1461168326148808 hv=2475747935 ad=’c12231a0′ sqlid=’4h7qfxa9t1ukz’
select sql_id, sql_fulltext
Note that method is not always 100% reliable, cursors can be closed new cursors opened with the same ID.
(if the formatting is too difficult to read, I will try again)
The following command will show if any cursor IDs are reused:
grep ‘^PARSING IN CURSOR’ oravm1_ora_10917_JKSTILL-SLOW-SQL.trc | awk ‘{ print $4 }’ | sort | uniq -c | sort -n
1 #47722266762664
1 #47722266769160
1 #47722266805040
1 #47722266828600
1 #47722266837320
1 #47722270882888
1 #47722271584600
1 #47722271586392
1 #47722271593192
1 #47722271598504
1 #47722271697480
1 #47722271709776
1 #47722271710944
1 #47722271712112
1 #47722271713352
1 #47722271717264
1 #47722271733496
1 #47722271734664
1 #47722272079784
1 #47722272166104
1 #47722272176160
1 #47722272181736
1 #47722272186728
1 #47722272195040
1 #47722272241000
1 #47722272249112
1 #47722272249688
1 #47722272255792
1 #47722272260992
1 #47722272266192
1 #47722272271824
1 #47722272277456
1 #47722272284136
1 #47722272289352
2 #47722266806248
Indeed, cursor ID #47722266806248 has been reused. The cursor we are interested in is #47722271586392, which was parsed only once in the trace file.
That command serves to illustrate the point, while the next command is more useful; it shows only the information we are interested in.
grep ‘^PARSING IN CURSOR’ oravm1_ora_10917_JKSTILL-SLOW-SQL.trc | awk ‘{ print $4 }’ | sort | uniq -c | sort -n | grep ‘#47722271586392’
1 #47722271586392
This clearly shows that cursor #47722271586392 usually always refers to the SQL of interest.
The reason to say ‘usually’ is that cursor ID may have already been open when the trace file started.
In this case that was true. The amount of prior waits on this cursor ID was miniscule, but for the sake of being more accurate, we can get the line number where the new PARSE occurs and consider only lines in the trace file following that (including the PARSE line).
grep -n ‘^PARSING IN CURSOR #47722271586392′ oravm1_ora_10917_JKSTILL-SLOW-SQL.trc
2435:PARSING IN CURSOR #47722271586392 len=266 dep=0 uid=90 oct=3 lid=90 tim=1461168326148808 hv=2475747935 ad=’c12231a0′ sqlid=’4h7qfxa9t1ukz’
And so lines prior to #2435 will be ignored.
There are some lines that affect the cursor, but do not include the cursor ID.
LOBREAD, LOBARRTMPFRE and probably a few others.
Here’s a sample:
WAIT #0: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1461168326163985
LOBREAD: c=0,e=11,p=0,cr=0,cu=0,tim=1461168326163991
WAIT #0: nam=’SQL*Net message from client’ ela= 188 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1461168326164187
LOBARRTMPFRE: c=0,e=3,p=0,cr=0,cu=0,tim=1461168326164203
As we are just trying to get a rough idea of the timing, and not a detailed analysis, we’re not going to worry about those for now.
So we will just get FETCH times and WAIT times, with WAITS broken down by name.
Now let’s get some timing information.
When I said ‘bash and awk’, I probably should have said ‘bash and perl’. This can be done with awk, but I am better with perl for this use case.
So here’s the simple perl script filter that will sum up the times:
my %times=();
my %waits=();
while() {
my $line=$_;
chomp $line;
my ($type) = split(/\s+/,$line);
next unless $type =~ /FETCH|WAIT/;
if ( ‘WAIT’ eq $type ) {
my @a = split(/\s+/,$line);
# names are delimited with single quotes
# get posisition of 2nd quote
my $qpos1 = index($line,q{‘});
my $qpos2 = index($line,q{‘},$qpos1+1);
my $waitName = substr($line,$qpos1+1, $qpos2 – $qpos1 -1 );
#print “waitName: $waitName\n”;
my ($dummy, $dummy, $elapsed) = split(/\s+/,substr($line,$qpos2));
$times{$type} += $elapsed;
$waits{$waitName} += $elapsed;
} elsif ( ‘FETCH’ eq $type ) {
my @a = split(/,/,$line);
my ($dummy,$elapsed) = split(/=/, $a[1]);
$times{$type} += $elapsed;
} else {
die “Something is wrong\n”;
}
}
END {
print “FETCH Total: $times{‘FETCH’}\n”;
print “WAIT Total: $times{‘WAIT’}\n”;
foreach my $waitName ( keys %waits ) {
print “$waitName: $waits{$waitName}\n”;
}
}
Now we run it:
tail –lines=2435 oravm1_ora_10917_JKSTILL-SLOW-SQL.trc | perl p2.pl
FETCH Total: 18258
WAIT Total: 22606
db file sequential read: 14027
SQL*Net message to client: 47
SQL*Net message from client: 8154
asynch descriptor resize: 3
log file sync: 375
As per this utility, about 0.04 seconds are contained in this trace file (this was the ‘fast’ SQL Trace)
There are some readily available Perl scripts for doing this as well.
“Optimizing Oracle Performance” by Cary Millsap and Jeff Holt contains one such script.
Others can be found via Google.
All told I spent about 75 minutes on this. The time would have been considerably less without the explanatory text.
Great post and thanks for sharing steps for hacking the trace file