This has been discussed before by my colleague Paul Moen in his article on Oracle standby recovery rate monitoring, but I recently made a discovery that makes it easier to generate both statistics on log apply performance, and more useful stats too.
First, let me say that this discovery is based on my observations and has not been verified with Oracle Support nor by any insider. If you know one who can confirm this, please ask him or her.
The discovery is a hidden column in the x$kcclh
table, which is the underlying table for gv$log_history
(and therefore v$log_history
). It’s the only column not exposed in the fixed views. Why? I don’t know but it sure would be nice to have it exposed and clearly (officially) defined. The column is x$kcclh.lhtsm
. It stores the timestamp of when the log began to be applied. The same timestamp as the one printed in the alert log for the “Media Recovery Log…” message and the same timestamp as the one recorded in the v$dataguard_status.message
text column.
The advantage of accessing this information from the x$kcclh
table is that you get access to more historical data (the number of days will vary based on your redo log switch rate and the size of the control files and possibly on other factors).
The time difference between two consecutive logs, say 1 and 2, gives us the time spent applying the log 1. At least that’s the theory. Some factors that we can’t really take into account are the log delay set in the log_archive_dest_n
parameter on the primary db (it’s an optional setting), and any unexpected events, such as an MRP0 process getting hung or a FAL server having trouble to send a log, etc.
Nevertheless, I feel it is very useful information. Any anomaly can be easily identified as it would come up with an unusual elapse time.
Here’s the query I use to get the log apply elapse time:
set linesize 80 select b.lhseq sequence#, b.lhtsm fromtime, a.lhtsm totime, ((to_date(a.lhtsm,'MM/DD/RR HH24:MI:SS') - to_date(b.lhtsm,'MM/DD/RR HH24:MI:SS')) * 1440) minutes_to_apply from sys.x$kcclh a, sys.x$kcclh b where a.lhrid = b.lhrid + 1 order by a.lhrid ;
Sample output:
SEQUENCE# FROMTIME TOTIME MINUTES_TO_APPLY --------- ------------------- ------------------- ---------------- 97077 09/18/2008 03:28:32 09/18/2008 03:43:31 14.9833333 97078 09/18/2008 03:43:31 09/18/2008 03:58:31 15 97079 09/18/2008 03:58:31 09/18/2008 04:13:31 15 97080 09/18/2008 04:13:31 09/18/2008 04:28:31 15 97081 09/18/2008 04:28:31 09/18/2008 04:43:33 15.0333333 97082 09/18/2008 04:43:33 09/18/2008 04:58:33 15 97083 09/18/2008 04:58:33 09/18/2008 05:13:36 15.05 97084 09/18/2008 05:13:36 09/18/2008 05:28:33 14.95 97085 09/18/2008 05:28:33 09/18/2008 05:43:31 14.9666667 97086 09/18/2008 05:43:31 09/18/2008 05:58:31 15
Here’s a query to show the min, max and average elapse time:
set linesize 140 col min_log_date newline col max_log_date newline col max_log_apply_duration for a30 newline col min_log_apply_duration for a30 newline col avg_log_apply_duration for a30 newline select min(lhlot) min_log_date, max(lhlot) max_log_date, numtodsinterval(max(to_number(a_lhtsm - b_lhtsm)),'DAY') max_log_apply_duration, numtodsinterval(min(to_number(a_lhtsm - b_lhtsm)),'DAY') min_log_apply_duration, numtodsinterval(avg(to_number(a_lhtsm - b_lhtsm)),'DAY') avg_log_apply_duration from ( select a.lhlot lhlot, to_date(a.lhtsm,'MM/DD/RR HH24:MI:SS') a_lhtsm, to_date(b.lhtsm,'MM/DD/RR HH24:MI:SS') b_lhtsm, a.lhrid from sys.x$kcclh a, sys.x$kcclh b where a.lhrid = b.lhrid + 1 ) order by lhrid;
Sample output:
MIN_LOG_DATE -------------------- MAX_LOG_DATE -------------------- MAX_LOG_APPLY_DURATION ------------------------------ MIN_LOG_APPLY_DURATION ------------------------------ AVG_LOG_APPLY_DURATION ------------------------------ 09/18/2008 04:28:30 10/21/2008 15:06:56 +000000000 04:16:55.999999999 +000000000 00:00:05.999999999 +000000000 00:11:47.666911225
The following query shows logs that took over 30 minutes to apply:
select a.lhseq, b.lhtsm fromtime, a.lhtsm totime, ((to_date(a.lhtsm,'MM/DD/RR HH24:MI:SS') - to_date(b.lhtsm,'MM/DD/RR HH24:MI:SS')) * 1440) minutes_to_apply from sys.x$kcclh a, sys.x$kcclh b where to_date(a.lhlot,'MM/DD/RR HH24:MI:SS') > sysdate - 60 and a.lhrid = b.lhrid + 1 and ((to_date(a.lhtsm,'MM/DD/RR HH24:MI:SS') - to_date(b.lhtsm,'MM/DD/RR HH24:MI:SS')) * 1440) > 30 order by a.lhrid ;
Sample output:
LHSEQ FROMTIME TOTIME MINUTES_TO_APPLY ------ -------------------- -------------------- ---------------- 98276 09/28/2008 08:04:41 09/28/2008 11:04:41 180 98278 09/28/2008 11:25:24 09/28/2008 12:07:06 41.7 98280 09/28/2008 12:15:35 09/28/2008 14:23:01 127.433333 98294 09/28/2008 14:33:29 09/28/2008 15:13:33 40.0666667 98297 09/28/2008 15:51:18 09/28/2008 16:31:42 40.4 98423 09/28/2008 19:36:32 09/28/2008 20:20:56 44.4 98533 09/28/2008 22:16:22 09/28/2008 22:54:11 37.8166667 98662 09/29/2008 02:29:11 09/29/2008 03:37:19 68.1333333 98663 09/29/2008 03:37:19 09/29/2008 04:13:37 36.3 98664 09/29/2008 04:13:37 09/29/2008 04:44:14 30.6166667 98669 09/29/2008 05:54:33 09/29/2008 07:19:25 84.8666667 100220 10/13/2008 07:50:41 10/13/2008 08:50:27 59.7666667 100422 10/15/2008 06:10:58 10/15/2008 10:27:54 256.933333 100537 10/16/2008 08:40:24 10/16/2008 10:05:32 85.1333333
I hope you find these queries useful for monitoring your standby database log apply rate and efficiency.
Enjoy!
Marc.
8 Comments. Leave new
Hi Marc,
Good findings!!!
This will definitely be helpful when monitoring your standby databases. I will save these queries in my favorite one’s.
I have recently been advocating using Analytic Functions in SQL to make life easier, so thought of re-writing your queries using AF.
Query 1:
========
select sequence#, fromtime, totime,
((to_date(totime,’MM/DD/RR HH24:MI:SS’)
– to_date(fromtime,’MM/DD/RR HH24:MI:SS’)) * 1440) minutes_to_apply
from (select lhseq sequence#, lhtsm fromtime,
lead(lhtsm) over (order by lhseq) totime
from sys.x$kcclh)
order by sequence#
Query 2:
========
select min(lhlot) min_log_date, max(lhlot) max_log_date,
numtodsinterval(max(to_number(a_lhtsm – b_lhtsm)),’DAY’) max_log_apply_duration,
numtodsinterval(min(to_number(a_lhtsm – b_lhtsm)),’DAY’) min_log_apply_duration,
numtodsinterval(avg(to_number(a_lhtsm – b_lhtsm)),’DAY’) avg_log_apply_duration
from ( select lhlot lhlot,
to_date(lhtsm,’MM/DD/RR HH24:MI:SS’) a_lhtsm,
to_date(lag(lhtsm) over (order by lhrid),’MM/DD/RR HH24:MI:SS’) b_lhtsm,
lhrid
from sys.x$kcclh )
order by lhrid;
Query 3:
========
select sequence#, fromtime, totime,
((to_date(totime,’MM/DD/RR HH24:MI:SS’)
– to_date(fromtime,’MM/DD/RR HH24:MI:SS’)) * 1440) minutes_to_apply
from (select lhseq sequence#, lhtsm totime,
lag(lhtsm) over (order by lhseq) fromtime
from sys.x$kcclh)
where to_date(fromtime,’MM/DD/RR HH24:MI:SS’) > sysdate – 60
and ((to_date(totime,’MM/DD/RR HH24:MI:SS’) –
to_date(fromtime,’MM/DD/RR HH24:MI:SS’)) * 1440) > 30
order by sequence#;
Thanks again Marc for the findings.
Regards
Asif Momen
https://momendba.blogspot.com
Thanks Marc, very informative & useful. Do you have a shell script for the query “”The following query shows logs that took over 30 minutes to apply:”
which should trigger an email if the logs that took over 30 minutes
Appreciate your help.
Interesting stuff, and good spot!
Output may be of less use in real time apply.
jason.
[…] Billette brings some interesting information about how you can monitor the amount of time it takes Oracle to apply logs. Jason Arneil has some very noteworthy work showing how performing an online rebalance of ASM can […]
agreed with Jason. ouput may less use in real time apply.
but is it x$kcclh.lhtsm can be used to monitor primary instnace local archiver?
“oracle dba”, I don’t know if that column is useful on the primary db side. You’d have to do your own observations. I do see that the column is populated on the primary but it’s obviously not for standby log apply. Perhaps the column has double use, and that could explain why it is not exposed in the virtual views…
Note that I’m not clear on what you would like to know about the archivers on the primary database.
great stuff about x$kcclh.lhtsm. This was info i needed and saved me plenty of time and work parsing alert to get the same data. excellent post.
I too was frustrated by the lack of exposure of the apply rate to the standby.
I wrote this script years ago using more common views and extrapolated the necessary information.
rem Script: get_redo_apply_metrics.sql
rem Reports standby apply rate with lag
rem
set lines 120 pages 60
column SizeM format 9,999.9
column “MB/sec” format 9,999.99
select TIMESTAMP,
completion_time “ArchTime”,
SEQUENCE#,
round((blocks*block_size)/(1024*1024),1) “SizeM”,
round((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60,1) “Diff(sec)
“,
round((blocks*block_size)/1024/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),1) “KB/sec”,
round((blocks*block_size)/(1024*1024)/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),3) “MB/sec”,
round(((lead(TIMESTAMP,1,TIMESTAMP) over (order by TIMESTAMP))-completion_time)*24*60*60,1) ”
Lag(sec)”
from v$archived_log a, v$dataguard_status dgs
where a.name = replace(dgs.MESSAGE,’Media Recovery Log ‘,”)
and dgs.FACILITY = ‘Log Apply Services’
order by TIMESTAMP;