Mining the AWR to Identify Performance Trends

Posted in: Technical Track

Sometimes it’s useful to check how performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it not always easy to retrieve it, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:

  • How does the performance of a particular SQL change over time?
  • How do wait times of a particular wait event change over time?
  • How does a particular statistic change over time?


Please note, the scripts provided here require diagnostic pack licenses and it’s your task to make sure you have them before running the scripts.

SQL performance

I use script awr_sqlid_perf_trend.sql to check how performance of the SQL changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:

  1. SQL ID
  2. Days to report. It will summarize all AWR snapshots starting with “trunc(sysdate)-{days to report}+1”, so if you pass “1”, it will summarize all snapshots from today, if “2” – than it’s yesterday and today are included.
  3. Interval in hours. “24” will provide one row for each day, “6” will give 4 rows a day.

Nothing shows it better than an example. Below you see how I’m checking execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in last 2 weeks and reporting the average values for 2-day intervals. Then I’m taking a little closer look at the last 4 days for the same SQL by summarizing data over 6hour intervals. Note, the time column shows the beginning of the interval.

TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00 351 195.571 74.995 .097 .000 .000 .000 134417.570 21319182.291 293731.556 304434.305
18.10.2013 00:00:00 364 91.225 47.474 1.687 .000 .000 .002 141140.228 20364053.544 270107.745 273343.709
20.10.2013 00:00:00 542 20.686 9.378 .004 .000 .000 .000 146436.875 4597922.220 3.168 .000
22.10.2013 00:00:00 531 25.060 12.086 .161 .000 .000 .000 146476.605 6026729.224 23999.684 23998.859
24.10.2013 00:00:00 542 51.611 40.675 1.880 .000 .000 .000 146814.220 21620264.039 287994.862 287994.701
26.10.2013 00:00:00 534 39.949 26.688 1.050 .000 .000 .000 147099.275 14081016.607 159704.463 159704.418
28.10.2013 00:00:00 245 37.837 29.384 1.150 .000 .000 .000 147135.216 15505533.959 179244.437 179244.367

7 rows selected.

 

TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00 72 19.209 9.439 .000 .000 .000 .000 147076.000 4623816.597 .111 .000
26.10.2013 06:00:00 72 15.391 9.401 .000 .000 .000 .000 147086.403 4624153.819 .000 .000
26.10.2013 12:00:00 72 14.022 9.351 .000 .000 .000 .000 147099.000 4624579.639 .000 .000
26.10.2013 18:00:00 55 48.174 35.723 1.575 .000 .000 .000 147099.000 19192781.582 243584.055 243584.055
27.10.2013 00:00:00 72 76.723 43.350 2.116 .000 .000 .000 147099.000 23258326.875 314445.111 314445.111
27.10.2013 06:00:00 72 64.921 43.914 2.084 .000 .000 .000 147107.542 23258506.028 315673.000 315673.000
27.10.2013 12:00:00 72 52.567 43.383 2.041 .000 .000 .000 147116.000 23258739.403 315673.000 315673.000
27.10.2013 18:00:00 47 25.522 18.095 .523 .000 .000 .000 147117.532 9382873.851 80597.702 80597.362
28.10.2013 00:00:00 65 17.645 9.384 .000 .000 .000 .000 147120.000 4625354.262 .000 .000
28.10.2013 06:00:00 19 17.571 9.451 .000 .000 .000 .000 147122.421 4625411.263 .000 .000
28.10.2013 12:00:00 6 14.083 9.645 .000 .000 .000 .000 147208.167 4629315.167 .000 .000
28.10.2013 18:00:00 48 42.173 35.208 1.509 .000 .000 .000 147236.375 18606643.833 229433.750 229433.750
29.10.2013 00:00:00 72 53.015 43.517 2.022 .000 .000 .000 147245.125 23265547.847 314507.319 314507.083
29.10.2013 06:00:00 30 52.181 43.638 1.932 .000 .000 .000 147250.300 23265839.767 303949.000 303949.000
29.10.2013 12:00:00 5 59.576 43.836 1.177 .000 .000 .000 144049.800 23267109.200 227814.000 227814.000

15 rows selected.

I’ve checked this SQL because the users reported inconsistent performance. It can also be observed in the outputs above. Take a look! The number of rows processed during each execution of the SQL doesn’t change – it’s always around 147K, but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on the information above it looks like there could be two different execution plans involved and bind variable peeking could be causing one or the other plan to become the active plan.
Additionally you can use the same script to check how execution statistics for the same SQL change over time. Does the elapsed time increase? Do the number of processed rows or number of buffer gets per execution change?

Wait event performance

Script awr_wait_trend.sql can be used to  show the changes in wait counts and wait durations for a particular event over time. Similarly to the previous script it also requires 3 parameters, only instead of SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT.

I typically use this script in two situations:

  • To check if a particular wait event performs worse when an overall performance problem is reported (usually I’m looking at IO events)
  • Illustrate how the implemented change improved the situation.

The example below shows how the performance of log file parallel write event changed over 3 weeks. On october 19th we moved the redo logs to dedicated high performance LUNs. Before that the 2 members of each redo log group were located on a saturated LUN together with all the data files.

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write 4006177 31667.591 7.905
10.10.2013 00:00:00 log file parallel write 3625342 28296.640 7.805
11.10.2013 00:00:00 log file parallel write 3483249 31032.324 8.909
12.10.2013 00:00:00 log file parallel write 3293462 33351.490 10.127
13.10.2013 00:00:00 log file parallel write 2871091 36413.925 12.683
14.10.2013 00:00:00 log file parallel write 3763916 30262.718 8.040
15.10.2013 00:00:00 log file parallel write 3018760 28262.172 9.362
16.10.2013 00:00:00 log file parallel write 3303205 31062.276 9.404
17.10.2013 00:00:00 log file parallel write 3012105 31831.491 10.568
18.10.2013 00:00:00 log file parallel write 2692697 26981.966 10.020
19.10.2013 00:00:00 log file parallel write 1038399 512.950 .494
20.10.2013 00:00:00 log file parallel write 959443 427.554 .446
21.10.2013 00:00:00 log file parallel write 1520444 606.580 .399
22.10.2013 00:00:00 log file parallel write 1618490 655.873 .405
23.10.2013 00:00:00 log file parallel write 1889845 751.216 .398
24.10.2013 00:00:00 log file parallel write 1957384 760.656 .389
25.10.2013 00:00:00 log file parallel write 2204260 853.691 .387
26.10.2013 00:00:00 log file parallel write 2205783 856.731 .388
27.10.2013 00:00:00 log file parallel write 2033199 785.785 .386
28.10.2013 00:00:00 log file parallel write 2439092 923.368 .379
29.10.2013 00:00:00 log file parallel write 2233614 840.628 .376

21 rows selected.

Visualizing the data from output like that is easy too!

Creating a graph has never been easier

System Statistics

The last script from this set is awr_stat_trend.sql. It does the same thing with the system statistics collected in DBA_HIST_SYSSTAT as previous scripts did to the performance of SQLs and wait events. The parameters are similar again – the name of the system statistic, days to report and the interval. I usually use the query to check how the redo size or the number of physical reads change over time, but there’s huge number of statistics available (638 different statistics in 11.2.0.3) and that’s why I’m sure you’ll find your own reasons to use this script.

TIME STAT_NAME VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size 1739466208
27.10.2013 04:00:00 redo size 2809857936
27.10.2013 08:00:00 redo size 648511376
27.10.2013 12:00:00 redo size 533287888
27.10.2013 16:00:00 redo size 704832684
27.10.2013 20:00:00 redo size 819854908
28.10.2013 00:00:00 redo size 2226799060
28.10.2013 04:00:00 redo size 3875182764
28.10.2013 08:00:00 redo size 1968024072
28.10.2013 12:00:00 redo size 1125339352
28.10.2013 16:00:00 redo size 1067175300
28.10.2013 20:00:00 redo size 936404908
29.10.2013 00:00:00 redo size 1758952428
29.10.2013 04:00:00 redo size 3949193948
29.10.2013 08:00:00 redo size 1715444632
29.10.2013 12:00:00 redo size 1008385144
29.10.2013 16:00:00 redo size 544946804

17 rows selected.

redo size


AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too!
P.S. You might have noticed the scripts are published on GitHub, let me know if you find any issues using them and perhaps one day you’ll find new versions for the script.


Update (4-Nov-2013)

I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:

INST TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
1 28.10.2013 00:00:00 840 .611 .014 .595 .007 .000 .000 1.000 1085.583 128.724 .000
30.10.2013 00:00:00 1466 .491 .011 .479 .005 .000 .000 1.000 976.001 88.744 .000
01.11.2013 00:00:00 542 .798 .023 .760 .025 .000 .000 1.000 896.978 114.196 .000
03.11.2013 00:00:00 544 .750 .021 .719 .017 .000 .000 1.000 1098.213 134.941 .000

2 28.10.2013 00:00:00 1638 .498 .017 .474 .013 .000 .000 1.001 953.514 96.287 .000
30.10.2013 00:00:00 1014 .745 .022 .712 .019 .000 .000 1.000 1034.249 131.057 .000
01.11.2013 00:00:00 1904 .633 .011 .624 .002 .000 .000 1.000 1045.668 104.568 .000
03.11.2013 00:00:00 810 .602 .017 .581 .010 .000 .000 1.000 929.778 108.998 .000

8 rows selected.

email

Interested in working with Maris? Schedule a tech call.

About the Author

Maris Elsins is an experienced Oracle Applications DBA currently working as Lead Database Consultant at The Pythian Group. His main areas of expertise are troubleshooting and performance tuning of Oracle Database and e-Business Suite systems. He is a blogger and a frequent speaker at Oracle related conferences such as UKOUG, Collaborate, Oracle OpenWorld, HotSos, and others. Maris is an Oracle ACE, an Oracle Certified Master, and a co-author of “Practical Oracle Database Appliance” (Apress, 2014). He's also a member of the board at Latvian Oracle User Group.

19 Comments. Leave new

Marcin Przepiorowski
November 1, 2013 12:03 pm

Hi,

Thanks for sharing scripts. As you said AWR/ASH is a gold mine of data and I think still not all cases are covered by Grid Control Performance Page / ASH analytic. Kyle Hailey start ashmasters repository on github – https://github.com/khailey/ashmasters and I forked it (https://github.com/pioro/ashmasters) and added some scripts I’m using for AWR as well. I’m wondering if creating one repo where people can add / maintain ASH/AWR scripts wouldn’t be a good idea.

regards,
Marcin

Reply

Hi Marcin,

I think the idea of a single ASH/AWR repo for scripts is great, but I’m not sure how to implement it. I’ve just started using GitHub and I’m still not aware of what it’s capable of.

If it’s a single repo, I would still want to remain in control of my own scripts and if someone else updates them I would like to be asked to review the change before committing it.
Otherwise it might not take long until I don’t know how to use my own scripts. Can this be done using Github? (This would probably look something like a master repository with pointers to many different scripts from different users.)

Maris.

Reply
Connor McDonald
November 3, 2013 4:44 am

Suggestion to handle multiple instances…eg for wait events:

set ver off pages 50000 lines 160 tab off
undef event_name
undef days_history
undef interval_hours
def event_name=”&1″
def days_history=”&2″
def interval_hours=”&3″
col time for a19
col EVENT_NAME for a64
col total_waits for 99999999999999
col total_time_s for 999999999.999
col avg_time_ms for 999999999.999
select to_char(time,’DD.MM.YYYY HH24:MI:SS’) time, instance_number inst, event_name, sum(delta_total_waits) total_waits, round(sum(delta_time_waited/1000000),3) total_time_s, round(sum(delta_time_waited)/decode(sum(delta_total_waits),0,null,sum(delta_total_waits))/1000,3) avg_time_ms from
(select hse.snap_id, hse.instance_number,
trunc(sysdate-&days_history+1)+trunc((cast(hs.begin_interval_time as date)-(trunc(sysdate-&days_history+1)))*24/(&interval_hours))*(&interval_hours)/24 time,
EVENT_NAME,
(lead(TOTAL_WAITS,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TOTAL_WAITS delta_total_waits,
(lead(TIME_WAITED_MICRO,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TIME_WAITED_MICRO delta_time_waited
from DBA_HIST_SYSTEM_EVENT hse, DBA_HIST_SNAPSHOT hs
where hse.snap_id=hs.snap_id
and hse.instance_number = hs.instance_number
and hs.begin_interval_time>=trunc(sysdate)-&days_history+1
and hse.EVENT_NAME like ‘&event_name’)
group by time, event_name, instance_number
order by 2, 1;

Not thoroughly checked, but hopefully you get the idea..

Cheers
Connor

Reply

Thanks Connor,

Done!
See section “Update (4-Nov-2013)” for details.

Maris

Reply

Hi Maris

awr_sqlid_perf_trend.sql still dont have inst_id output. but awr_sqlid_perf_trend_by_plan.sql has inst_id added. Which one to use ?

Reply

Hi Dipak,

Believe it or not, but some of those scripts had been written 5 years ago!
I’ll need to take a closer look at them to make the outputs are consistent (in regards to inst_id). I’ll get to that, but not immediately :)

Maris

Reply

Hi Dipak,
I updated the awr_sqlid_perf_trend.sql
It has two modes now.
1) Global DB stats: @awr_sqlid_perf_trend.sql 2v9tgrfymr06q 24 72
2) Instance breakdown (extra “i” parameter): @awr_sqlid_perf_trend.sql 2v9tgrfymr06q 24 72 i

I’ll be updating all the other scripts the same way too, but again. It will take some time :)

Maris

Reply

Thanks for sharing. Its very useful.

Reply

Useful scripts – just one caveat. Accessing DBA_HIST views means you must pay for tuning and diagnostics packs. The data and mining it is great – just there is a cost associated that many might not realize.

Reply

Hi Bert,

Only diagnostic packs are needed for these queries.
But it’s definitely worth mentioning this in the blog post, adding it in few minutes.
Thanks!

Maris

Reply
Manish Nashikkar
November 13, 2014 10:24 am

Nice and a very handy scripts to check the SQL’s trend. Thanks for sharing…

Reply

very nice share, useful scripts.

Many Thanks,

Hendra

Reply

Nice scripts !! Gives very useful information

Reply
Earl Shaffer
May 16, 2015 9:34 am

great stuff, thanks for sharing

Reply
Vinay Rajadhyaksha
August 26, 2015 1:28 am

Excellent scripts! Very useful indeed. Appreciate your sharing them with the larger community.

Reply

Thanks for the feedback, enjoy!

Reply

Thanks a lot for this scripts. Wanted to dig for some events, but the script shows negative values for TOTAL_WAITS and AVG_TIME_MS:

12:12:22 SQL> @awr_wait_trend.sql “log file parallel write” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ———————— ———— ————– ————–
14.12.2016 00:00:00 log file parallel write -108383 164.330 -1.516
14.12.2016 01:00:00 log file parallel write -106837 183.655 -1.719
14.12.2016 02:00:00 log file parallel write -74354 69.843 -.939
14.12.2016 03:00:00 log file parallel write -110677 107.969 -.976
14.12.2016 04:00:00 log file parallel write -101450 129.374 -1.275
14.12.2016 05:00:00 log file parallel write -109955 103.738 -.943
14.12.2016 06:00:00 log file parallel write -146166 135.371 -.926
14.12.2016 07:00:00 log file parallel write -142547 159.886 -1.122
14.12.2016 08:00:00 log file parallel write -157180 151.971 -.967
14.12.2016 09:00:00 log file parallel write -128387 115.780 -.902
14.12.2016 10:00:00 log file parallel write -166095 150.242 -.905
14.12.2016 11:00:00 log file parallel write -165515 187.227 -1.131
14.12.2016 12:00:00 log file parallel write 18.376

13 rows selected.

Elapsed: 00:00:00.02
12:12:25 SQL> @awr_wait_trend.sql “acknowledge over PGA limit” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ————————— ———— ————– ————–
14.12.2016 00:00:00 acknowledge over PGA limit -22216 205.764 -9.262
14.12.2016 01:00:00 acknowledge over PGA limit -31672 322.393 -10.179
14.12.2016 02:00:00 acknowledge over PGA limit -29673 250.874 -8.455
14.12.2016 03:00:00 acknowledge over PGA limit -41665 423.753 -10.170
14.12.2016 04:00:00 acknowledge over PGA limit -42159 534.482 -12.678
14.12.2016 05:00:00 acknowledge over PGA limit -37251 397.577 -10.673
14.12.2016 06:00:00 acknowledge over PGA limit -33331 372.913 -11.188
14.12.2016 07:00:00 acknowledge over PGA limit -158311 1168.185 -7.379
14.12.2016 08:00:00 acknowledge over PGA limit -40287 823.203 -20.433
14.12.2016 09:00:00 acknowledge over PGA limit -35570 361.847 -10.173
14.12.2016 10:00:00 acknowledge over PGA limit -36599 432.629 -11.821
14.12.2016 11:00:00 acknowledge over PGA limit -38034 403.924 -10.620
14.12.2016 12:00:00 acknowledge over PGA limit 60.091

13 rows selected.

Is this expected? Or did I misinterpret something?

Reply

Hi,

Ouch! It looks like I had pushed a broken version of the awr_wait_trend.sql to the repository recently.
I fixed it, try getting the latest version. I’ll also check the other scripts, but I’ll need more time for that.

Maris

Reply

Hi, thanks a lot for your immediate fix:
13:55:56 SQL> @awr_wait_trend2.sql “acknowledge over PGA limit” 1 1

TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
——————- ————————— ———— ————– ————–
14.12.2016 00:00:00 acknowledge over PGA limit 22216 238.645 10.742
14.12.2016 01:00:00 acknowledge over PGA limit 31672 326.561 10.311
14.12.2016 02:00:00 acknowledge over PGA limit 29673 311.349 10.493
14.12.2016 03:00:00 acknowledge over PGA limit 41665 455.961 10.943
14.12.2016 04:00:00 acknowledge over PGA limit 42159 450.834 10.694
14.12.2016 05:00:00 acknowledge over PGA limit 37251 406.595 10.915
14.12.2016 06:00:00 acknowledge over PGA limit 33331 363.242 10.898
14.12.2016 07:00:00 acknowledge over PGA limit 158311 1563.062 9.873
14.12.2016 08:00:00 acknowledge over PGA limit 40287 448.115 11.123
14.12.2016 09:00:00 acknowledge over PGA limit 35570 383.776 10.789
14.12.2016 10:00:00 acknowledge over PGA limit 36599 404.271 11.046
14.12.2016 11:00:00 acknowledge over PGA limit 38034 405.226 10.654
14.12.2016 12:00:00 acknowledge over PGA limit 35140 363.153 10.334
14.12.2016 13:00:00 acknowledge over PGA limit 16766 174.314 10.397

14 rows selected.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *