Mining the AWR to Identify Performance Trends

Posted in: Technical Track

Sometimes it’s useful to check how performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it not always easy to retrieve it, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:

  • How does the performance of a particular SQL change over time?
  • How do wait times of a particular wait event change over time?
  • How does a particular statistic change over time?


Please note, the scripts provided here require diagnostic pack licenses and it’s your task to make sure you have them before running the scripts.

SQL performance

I use script awr_sqlid_perf_trend.sql to check how performance of the SQL changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:

  1. SQL ID
  2. Days to report. It will summarize all AWR snapshots starting with “trunc(sysdate)-{days to report}+1”, so if you pass “1”, it will summarize all snapshots from today, if “2” – than it’s yesterday and today are included.
  3. Interval in hours. “24” will provide one row for each day, “6” will give 4 rows a day.

Nothing shows it better than an example. Below you see how I’m checking execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in last 2 weeks and reporting the average values for 2-day intervals. Then I’m taking a little closer look at the last 4 days for the same SQL by summarizing data over 6hour intervals. Note, the time column shows the beginning of the interval.

TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00         351              195.571           74.995           .097           .000           .000           .000           134417.570      21319182.291        293731.556          304434.305
18.10.2013 00:00:00         364               91.225           47.474          1.687           .000           .000           .002           141140.228      20364053.544        270107.745          273343.709
20.10.2013 00:00:00         542               20.686            9.378           .004           .000           .000           .000           146436.875       4597922.220             3.168                .000
22.10.2013 00:00:00         531               25.060           12.086           .161           .000           .000           .000           146476.605       6026729.224         23999.684           23998.859
24.10.2013 00:00:00         542               51.611           40.675          1.880           .000           .000           .000           146814.220      21620264.039        287994.862          287994.701
26.10.2013 00:00:00         534               39.949           26.688          1.050           .000           .000           .000           147099.275      14081016.607        159704.463          159704.418
28.10.2013 00:00:00         245               37.837           29.384          1.150           .000           .000           .000           147135.216      15505533.959        179244.437          179244.367
7 rows selected.

 

TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00          72               19.209            9.439           .000           .000           .000           .000           147076.000       4623816.597              .111                .000
26.10.2013 06:00:00          72               15.391            9.401           .000           .000           .000           .000           147086.403       4624153.819              .000                .000
26.10.2013 12:00:00          72               14.022            9.351           .000           .000           .000           .000           147099.000       4624579.639              .000                .000
26.10.2013 18:00:00          55               48.174           35.723          1.575           .000           .000           .000           147099.000      19192781.582        243584.055          243584.055
27.10.2013 00:00:00          72               76.723           43.350          2.116           .000           .000           .000           147099.000      23258326.875        314445.111          314445.111
27.10.2013 06:00:00          72               64.921           43.914          2.084           .000           .000           .000           147107.542      23258506.028        315673.000          315673.000
27.10.2013 12:00:00          72               52.567           43.383          2.041           .000           .000           .000           147116.000      23258739.403        315673.000          315673.000
27.10.2013 18:00:00          47               25.522           18.095           .523           .000           .000           .000           147117.532       9382873.851         80597.702           80597.362
28.10.2013 00:00:00          65               17.645            9.384           .000           .000           .000           .000           147120.000       4625354.262              .000                .000
28.10.2013 06:00:00          19               17.571            9.451           .000           .000           .000           .000           147122.421       4625411.263              .000                .000
28.10.2013 12:00:00           6               14.083            9.645           .000           .000           .000           .000           147208.167       4629315.167              .000                .000
28.10.2013 18:00:00          48               42.173           35.208          1.509           .000           .000           .000           147236.375      18606643.833        229433.750          229433.750
29.10.2013 00:00:00          72               53.015           43.517          2.022           .000           .000           .000           147245.125      23265547.847        314507.319          314507.083
29.10.2013 06:00:00          30               52.181           43.638          1.932           .000           .000           .000           147250.300      23265839.767        303949.000          303949.000
29.10.2013 12:00:00           5               59.576           43.836          1.177           .000           .000           .000           144049.800      23267109.200        227814.000          227814.000
15 rows selected.

I’ve checked this SQL because the users reported inconsistent performance. It can also be observed in the outputs above. Take a look! The number of rows processed during each execution of the SQL doesn’t change – it’s always around 147K, but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on the information above it looks like there could be two different execution plans involved and bind variable peeking could be causing one or the other plan to become the active plan.
Additionally you can use the same script to check how execution statistics for the same SQL change over time. Does the elapsed time increase? Do the number of processed rows or number of buffer gets per execution change?

Wait event performance

Script awr_wait_trend.sql can be used to  show the changes in wait counts and wait durations for a particular event over time. Similarly to the previous script it also requires 3 parameters, only instead of SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT.

I typically use this script in two situations:

  • To check if a particular wait event performs worse when an overall performance problem is reported (usually I’m looking at IO events)
  • Illustrate how the implemented change improved the situation.

The example below shows how the performance of log file parallel write event changed over 3 weeks. On october 19th we moved the redo logs to dedicated high performance LUNs. Before that the 2 members of each redo log group were located on a saturated LUN together with all the data files.

TIME                EVENT_NAME                       TOTAL_WAITS   TOTAL_TIME_S    AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write              4006177      31667.591          7.905
10.10.2013 00:00:00 log file parallel write              3625342      28296.640          7.805
11.10.2013 00:00:00 log file parallel write              3483249      31032.324          8.909
12.10.2013 00:00:00 log file parallel write              3293462      33351.490         10.127
13.10.2013 00:00:00 log file parallel write              2871091      36413.925         12.683
14.10.2013 00:00:00 log file parallel write              3763916      30262.718          8.040
15.10.2013 00:00:00 log file parallel write              3018760      28262.172          9.362
16.10.2013 00:00:00 log file parallel write              3303205      31062.276          9.404
17.10.2013 00:00:00 log file parallel write              3012105      31831.491         10.568
18.10.2013 00:00:00 log file parallel write              2692697      26981.966         10.020
19.10.2013 00:00:00 log file parallel write              1038399        512.950           .494
20.10.2013 00:00:00 log file parallel write               959443        427.554           .446
21.10.2013 00:00:00 log file parallel write              1520444        606.580           .399
22.10.2013 00:00:00 log file parallel write              1618490        655.873           .405
23.10.2013 00:00:00 log file parallel write              1889845        751.216           .398
24.10.2013 00:00:00 log file parallel write              1957384        760.656           .389
25.10.2013 00:00:00 log file parallel write              2204260        853.691           .387
26.10.2013 00:00:00 log file parallel write              2205783        856.731           .388
27.10.2013 00:00:00 log file parallel write              2033199        785.785           .386
28.10.2013 00:00:00 log file parallel write              2439092        923.368           .379
29.10.2013 00:00:00 log file parallel write              2233614        840.628           .376
21 rows selected.

Visualizing the data from output like that is easy too!

Creating a graph has never been easier

System Statistics

The last script from this set is awr_stat_trend.sql. It does the same thing with the system statistics collected in DBA_HIST_SYSSTAT as previous scripts did to the performance of SQLs and wait events. The parameters are similar again – the name of the system statistic, days to report and the interval. I usually use the query to check how the redo size or the number of physical reads change over time, but there’s huge number of statistics available (638 different statistics in 11.2.0.3) and that’s why I’m sure you’ll find your own reasons to use this script.

TIME                STAT_NAME                             VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size                        1739466208
27.10.2013 04:00:00 redo size                        2809857936
27.10.2013 08:00:00 redo size                         648511376
27.10.2013 12:00:00 redo size                         533287888
27.10.2013 16:00:00 redo size                         704832684
27.10.2013 20:00:00 redo size                         819854908
28.10.2013 00:00:00 redo size                        2226799060
28.10.2013 04:00:00 redo size                        3875182764
28.10.2013 08:00:00 redo size                        1968024072
28.10.2013 12:00:00 redo size                        1125339352
28.10.2013 16:00:00 redo size                        1067175300
28.10.2013 20:00:00 redo size                         936404908
29.10.2013 00:00:00 redo size                        1758952428
29.10.2013 04:00:00 redo size                        3949193948
29.10.2013 08:00:00 redo size                        1715444632
29.10.2013 12:00:00 redo size                        1008385144
29.10.2013 16:00:00 redo size                         544946804
17 rows selected.

redo size


AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too!
P.S. You might have noticed the scripts are published on GitHub, let me know if you find any issues using them and perhaps one day you’ll find new versions for the script.


Update (4-Nov-2013)

I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:

 INST TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
    1 28.10.2013 00:00:00         840                 .611             .014           .595           .007           .000           .000                1.000          1085.583           128.724                .000
      30.10.2013 00:00:00        1466                 .491             .011           .479           .005           .000           .000                1.000           976.001            88.744                .000
      01.11.2013 00:00:00         542                 .798             .023           .760           .025           .000           .000                1.000           896.978           114.196                .000
      03.11.2013 00:00:00         544                 .750             .021           .719           .017           .000           .000                1.000          1098.213           134.941                .000
    2 28.10.2013 00:00:00        1638                 .498             .017           .474           .013           .000           .000                1.001           953.514            96.287                .000
      30.10.2013 00:00:00        1014                 .745             .022           .712           .019           .000           .000                1.000          1034.249           131.057                .000
      01.11.2013 00:00:00        1904                 .633             .011           .624           .002           .000           .000                1.000          1045.668           104.568                .000
      03.11.2013 00:00:00         810                 .602             .017           .581           .010           .000           .000                1.000           929.778           108.998                .000
8 rows selected.
email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Maris Elsins is an experienced Oracle Applications DBA currently working as Lead Database Consultant at The Pythian Group. His main areas of expertise are troubleshooting and performance tuning of Oracle Database and e-Business Suite systems. He is a blogger and a frequent speaker at Oracle related conferences such as UKOUG, Collaborate, Oracle OpenWorld, HotSos, and others. Maris is an Oracle ACE, an Oracle Certified Master, and a co-author of “Practical Oracle Database Appliance” (Apress, 2014). He's also a member of the board at Latvian Oracle User Group.

19 Comments. Leave new

Marcin Przepiorowski
November 1, 2013 12:03 pm

Hi,

Thanks for sharing scripts. As you said AWR/ASH is a gold mine of data and I think still not all cases are covered by Grid Control Performance Page / ASH analytic. Kyle Hailey start ashmasters repository on github – https://github.com/khailey/ashmasters and I forked it (https://github.com/pioro/ashmasters) and added some scripts I’m using for AWR as well. I’m wondering if creating one repo where people can add / maintain ASH/AWR scripts wouldn’t be a good idea.

regards,
Marcin

Reply

Hi Marcin,

I think the idea of a single ASH/AWR repo for scripts is great, but I’m not sure how to implement it. I’ve just started using GitHub and I’m still not aware of what it’s capable of.

If it’s a single repo, I would still want to remain in control of my own scripts and if someone else updates them I would like to be asked to review the change before committing it.
Otherwise it might not take long until I don’t know how to use my own scripts. Can this be done using Github? (This would probably look something like a master repository with pointers to many different scripts from different users.)

Maris.

Reply
Connor McDonald
November 3, 2013 4:44 am

Suggestion to handle multiple instances…eg for wait events:

set ver off pages 50000 lines 160 tab off
undef event_name
undef days_history
undef interval_hours
def event_name=”&1″
def days_history=”&2″
def interval_hours=”&3″
col time for a19
col EVENT_NAME for a64
col total_waits for 99999999999999
col total_time_s for 999999999.999
col avg_time_ms for 999999999.999
select to_char(time,’DD.MM.YYYY HH24:MI:SS’) time, instance_number inst, event_name, sum(delta_total_waits) total_waits, round(sum(delta_time_waited/1000000),3) total_time_s, round(sum(delta_time_waited)/decode(sum(delta_total_waits),0,null,sum(delta_total_waits))/1000,3) avg_time_ms from
(select hse.snap_id, hse.instance_number,
trunc(sysdate-&days_history+1)+trunc((cast(hs.begin_interval_time as date)-(trunc(sysdate-&days_history+1)))*24/(&interval_hours))*(&interval_hours)/24 time,
EVENT_NAME,
(lead(TOTAL_WAITS,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TOTAL_WAITS delta_total_waits,
(lead(TIME_WAITED_MICRO,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TIME_WAITED_MICRO delta_time_waited
from DBA_HIST_SYSTEM_EVENT hse, DBA_HIST_SNAPSHOT hs
where hse.snap_id=hs.snap_id
and hse.instance_number = hs.instance_number
and hs.begin_interval_time>=trunc(sysdate)-&days_history+1
and hse.EVENT_NAME like ‘&event_name’)
group by time, event_name, instance_number
order by 2, 1;

Not thoroughly checked, but hopefully you get the idea..

Cheers
Connor

Reply

Thanks Connor,

Done!
See section “Update (4-Nov-2013)” for details.

Maris

Reply

Hi Maris

awr_sqlid_perf_trend.sql still dont have inst_id output. but awr_sqlid_perf_trend_by_plan.sql has inst_id added. Which one to use ?

Reply

Hi Dipak,

Believe it or not, but some of those scripts had been written 5 years ago!
I’ll need to take a closer look at them to make the outputs are consistent (in regards to inst_id). I’ll get to that, but not immediately :)

Maris

Reply

Hi Dipak,
I updated the awr_sqlid_perf_trend.sql
It has two modes now.
1) Global DB stats: @awr_sqlid_perf_trend.sql 2v9tgrfymr06q 24 72
2) Instance breakdown (extra “i” parameter): @awr_sqlid_perf_trend.sql 2v9tgrfymr06q 24 72 i

I’ll be updating all the other scripts the same way too, but again. It will take some time :)

Maris

Reply

Thanks for sharing. Its very useful.

Reply

Useful scripts – just one caveat. Accessing DBA_HIST views means you must pay for tuning and diagnostics packs. The data and mining it is great – just there is a cost associated that many might not realize.

Reply

Hi Bert,

Only diagnostic packs are needed for these queries.
But it’s definitely worth mentioning this in the blog post, adding it in few minutes.
Thanks!

Maris

Reply
Manish Nashikkar
November 13, 2014 10:24 am

Nice and a very handy scripts to check the SQL’s trend. Thanks for sharing…

Reply

very nice share, useful scripts.

Many Thanks,

Hendra

Reply

Nice scripts !! Gives very useful information

Reply
Earl Shaffer
May 16, 2015 9:34 am

great stuff, thanks for sharing

Reply
Vinay Rajadhyaksha
August 26, 2015 1:28 am

Excellent scripts! Very useful indeed. Appreciate your sharing them with the larger community.

Reply

Thanks for the feedback, enjoy!

Reply

Thanks a lot for this scripts. Wanted to dig for some events, but the script shows negative values for TOTAL_WAITS and AVG_TIME_MS:

12:12:22 SQL> @awr_wait_trend.sql “log file parallel write” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ———————— ———— ————– ————–
14.12.2016 00:00:00 log file parallel write -108383 164.330 -1.516
14.12.2016 01:00:00 log file parallel write -106837 183.655 -1.719
14.12.2016 02:00:00 log file parallel write -74354 69.843 -.939
14.12.2016 03:00:00 log file parallel write -110677 107.969 -.976
14.12.2016 04:00:00 log file parallel write -101450 129.374 -1.275
14.12.2016 05:00:00 log file parallel write -109955 103.738 -.943
14.12.2016 06:00:00 log file parallel write -146166 135.371 -.926
14.12.2016 07:00:00 log file parallel write -142547 159.886 -1.122
14.12.2016 08:00:00 log file parallel write -157180 151.971 -.967
14.12.2016 09:00:00 log file parallel write -128387 115.780 -.902
14.12.2016 10:00:00 log file parallel write -166095 150.242 -.905
14.12.2016 11:00:00 log file parallel write -165515 187.227 -1.131
14.12.2016 12:00:00 log file parallel write 18.376

13 rows selected.

Elapsed: 00:00:00.02
12:12:25 SQL> @awr_wait_trend.sql “acknowledge over PGA limit” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ————————— ———— ————– ————–
14.12.2016 00:00:00 acknowledge over PGA limit -22216 205.764 -9.262
14.12.2016 01:00:00 acknowledge over PGA limit -31672 322.393 -10.179
14.12.2016 02:00:00 acknowledge over PGA limit -29673 250.874 -8.455
14.12.2016 03:00:00 acknowledge over PGA limit -41665 423.753 -10.170
14.12.2016 04:00:00 acknowledge over PGA limit -42159 534.482 -12.678
14.12.2016 05:00:00 acknowledge over PGA limit -37251 397.577 -10.673
14.12.2016 06:00:00 acknowledge over PGA limit -33331 372.913 -11.188
14.12.2016 07:00:00 acknowledge over PGA limit -158311 1168.185 -7.379
14.12.2016 08:00:00 acknowledge over PGA limit -40287 823.203 -20.433
14.12.2016 09:00:00 acknowledge over PGA limit -35570 361.847 -10.173
14.12.2016 10:00:00 acknowledge over PGA limit -36599 432.629 -11.821
14.12.2016 11:00:00 acknowledge over PGA limit -38034 403.924 -10.620
14.12.2016 12:00:00 acknowledge over PGA limit 60.091

13 rows selected.

Is this expected? Or did I misinterpret something?

Reply

Hi,

Ouch! It looks like I had pushed a broken version of the awr_wait_trend.sql to the repository recently.
I fixed it, try getting the latest version. I’ll also check the other scripts, but I’ll need more time for that.

Maris

Reply

Hi, thanks a lot for your immediate fix:
13:55:56 SQL> @awr_wait_trend2.sql “acknowledge over PGA limit” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ————————— ———— ————– ————–
14.12.2016 00:00:00 acknowledge over PGA limit 22216 238.645 10.742
14.12.2016 01:00:00 acknowledge over PGA limit 31672 326.561 10.311
14.12.2016 02:00:00 acknowledge over PGA limit 29673 311.349 10.493
14.12.2016 03:00:00 acknowledge over PGA limit 41665 455.961 10.943
14.12.2016 04:00:00 acknowledge over PGA limit 42159 450.834 10.694
14.12.2016 05:00:00 acknowledge over PGA limit 37251 406.595 10.915
14.12.2016 06:00:00 acknowledge over PGA limit 33331 363.242 10.898
14.12.2016 07:00:00 acknowledge over PGA limit 158311 1563.062 9.873
14.12.2016 08:00:00 acknowledge over PGA limit 40287 448.115 11.123
14.12.2016 09:00:00 acknowledge over PGA limit 35570 383.776 10.789
14.12.2016 10:00:00 acknowledge over PGA limit 36599 404.271 11.046
14.12.2016 11:00:00 acknowledge over PGA limit 38034 405.226 10.654
14.12.2016 12:00:00 acknowledge over PGA limit 35140 363.153 10.334
14.12.2016 13:00:00 acknowledge over PGA limit 16766 174.314 10.397

14 rows selected.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *