Oracle Standby Recovery Rate Monitoring

Posted in: Technical Track

So you have created your standby database using the RMAN DUPLICATE command, you have set the ARCHIVE_LAG_TARGET to maintain a minimum lag target, and you have sorted out those nasty datafile missing errors using automatic file management. You’ve even added standby redo logs to improve the Mean Time To Recovery (MTTR).

Now management are demanding to know why the standby periodically lags during the day. They have you on the phone and they are asking what the current redo apply rate is.

Luckily, you read the Pythian blog regularly, or like most people google for a solution or script to help out so you don’t reinvent the wheel.

Management’s first question was, does the standby redo apply rate lag during the day and if so, when. The following SQL script joins v$archived_log to v$dataguard_status, which contains the most recent messages from Data Guard. We join to that table to get the timestamp for when a specific archive log started to be applied.

rem Reports standby apply rate with lag
rem
select TIMESTAMP,completion_time "ArchTime",
SEQUENCE#,round((blocks*block_size)/(1024*1024),1) "SizeM",
round((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60,1) "Diff(sec)",
round((blocks*block_size)/1024/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),1) "KB/sec",
round((blocks*block_size)/(1024*1024)/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),3) "MB/sec",
round(((lead(TIMESTAMP,1,TIMESTAMP) over (order by TIMESTAMP))-completion_time)*24*60*60,1) "Lag(sec)"
from v$archived_log a, v$dataguard_status dgs
where a.name = replace(dgs.MESSAGE,'Media Recovery Log ','')
and dgs.FACILITY = 'Log Apply Services'
order by TIMESTAMP desc;

The script reports the time it took to apply the log, the size of the log, and the redo apply rate for that log.

  • Diff(sec) reports the actual time difference between redo logs applied on the standby.
  • Lag(sec) reports the time difference between when the archive log was completed on the primary and when it was applied on the standby.

Management’s second question was, how fast is the current archived redo log being applied? The shell script mrp-recovery-rate.sh answers that question.

Basically it gets the log_block_size and polls v$managed_standby twice with a 30-second delay. It then returns the redo apply rate in KB/second, MB/second, and the raw bytes/second.

Note: This is a beta version — it does not handle an application rate faster than 30-seconds per log.

This script was developed on the back of the best practices document: MAA – Data Guard Redo Apply and Media Recovery Best Practices 10gR1 (PDF). Oracle provided the formula, we are providing the script.

References

  • Metalink Note: 387343.1
  • Oracle Switchover Failover Best Practices (PDF)
  • Oracle Fast Start Failover Best Practices (PDF)

Have Fun!
Paul.

#!/bin/bash
# mrp-recovery-rate.sh
# Created: 2007-10-10
#
LOG_BLOCK_SIZE=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
SELECT LEBSZ FROM X\\$KCCLE WHERE ROWNUM=1;
exit;
END`

BLOCK_BEG=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_BEG=`date +%s`
echo $BLOCK_BEG

sleep 30

BLOCK_END=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_END=`date +%s`
echo $BLOCK_END

DIFF_TIME=`expr $TIME_END - $TIME_BEG`
DIFF_BLOCKS=`expr $BLOCK_END - $BLOCK_BEG`
DIFF_SIZE=`expr $DIFF_BLOCKS \* $LOG_BLOCK_SIZE`
DIFF_SIZE_TIME=`expr $DIFF_SIZE / $DIFF_TIME`
MB=`expr 1024 \* 1024`
RECOVERY_RATE=`expr $DIFF_SIZE_TIME / 1024`
RECOVERY_RATE_MB=`expr $DIFF_SIZE_TIME / $MB`

echo $LOG_BLOCK_SIZE
echo $DIFF_TIME
echo $RECOVERY_RATE
echo $RECOVERY_RATE_MB
email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Database Specialist based in Sydney,Australia

6 Comments. Leave new

I think this is a great way of seeing how well your standby is keeping up-to-date and how it has done historically. I use v$dataguard_stats to tell me how the standby is doing, and monitor this, which shows the apply lag and the trasnport lag.

I should point out (though you probably are aware) the sql script does not work with real time apply.

Reply

Hi Jason,

Thanks for the comment.

Unfortunately the view v$dataguard_stats shows real-time/most recent stuff and it is only available from 10G onwards. Nice way to track it though. Maybe Oracle will add a history table in the next release of 11G perhaps.

Yep the SQL also doesn’t work for manual redo log apply. For example if you are using your own scripts to apply redo logs on a standby database (anything not Enterprise Edition).

Have Fun

Paul

Reply
Determining Dataguard Standby Apply Rate « jarneil
February 27, 2008 10:59 am

[…] Paul Moen from Pythian wrote about a script he uses to keep a historical record of the apply rate. However if you want something more immediate you can maybe use a view called V$STANDBY_APPLY_SNAPSHOT. There does not appear to be much information out there on this view, and it was only recently that I stumbled across it. This is not documented in the Oracle documentation  in either the 10g or 11g docs. There is also nothing about this view on metalink as a search turned up no hits. The view gets a brief mention in a talk by Dataguard product manager, Larry Carpenter. […]

Reply

If you want the link to the Carpenter presentation. It is on Oracle’s website.

https://www.oracle.com/technology/deploy/availability/pdf/S938_Carpenter.ppt.pdf

Reply
Standby Log Apply Elapsed Time
October 24, 2008 1:19 pm

[…] has been discussed before by my colleague Paul Moen in his article on Oracle standby recovery rate monitoring, but I’ve made a recent discovery that makes it easier to generate both statistics on log […]

Reply
kalyan chakravarthy
November 21, 2008 11:36 am

Hi,

I think scrpt “Reports standby apply rate with lag” shows the completed time on standby not production , can i get a script that shows the time difference between when the archive log was completed on the primary and when it was applied on the standby

Regards
Kalyan

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *