So you have created your standby database using the
RMAN DUPLICATE command, you have set the
ARCHIVE_LAG_TARGET to maintain a minimum lag target, and you have sorted out those nasty datafile missing errors using automatic file management. You’ve even added standby redo logs to improve the Mean Time To Recovery (MTTR).
Now management are demanding to know why the standby periodically lags during the day. They have you on the phone and they are asking what the current redo apply rate is.
Luckily, you read the Pythian blog regularly, or like most people google for a solution or script to help out so you don’t reinvent the wheel.
Management’s first question was, does the standby redo apply rate lag during the day and if so, when. The following SQL script joins
v$dataguard_status, which contains the most recent messages from Data Guard. We join to that table to get the timestamp for when a specific archive log started to be applied.
rem Reports standby apply rate with lag rem select TIMESTAMP,completion_time "ArchTime", SEQUENCE#,round((blocks*block_size)/(1024*1024),1) "SizeM", round((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60,1) "Diff(sec)", round((blocks*block_size)/1024/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),0,1, (TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),1) "KB/sec", round((blocks*block_size)/(1024*1024)/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),0,1, (TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),3) "MB/sec", round(((lead(TIMESTAMP,1,TIMESTAMP) over (order by TIMESTAMP))-completion_time)*24*60*60,1) "Lag(sec)" from v$archived_log a, v$dataguard_status dgs where a.name = replace(dgs.MESSAGE,'Media Recovery Log ','') and dgs.FACILITY = 'Log Apply Services' order by TIMESTAMP desc;
The script reports the time it took to apply the log, the size of the log, and the redo apply rate for that log.
Diff(sec)reports the actual time difference between redo logs applied on the standby.
Lag(sec)reports the time difference between when the archive log was completed on the primary and when it was applied on the standby.
Management’s second question was, how fast is the current archived redo log being applied? The shell script
mrp-recovery-rate.sh answers that question.
Basically it gets the
log_block_size and polls
v$managed_standby twice with a 30-second delay. It then returns the redo apply rate in KB/second, MB/second, and the raw bytes/second.
Note: This is a beta version — it does not handle an application rate faster than 30-seconds per log.
This script was developed on the back of the best practices document: MAA – Data Guard Redo Apply and Media Recovery Best Practices 10gR1 (PDF). Oracle provided the formula, we are providing the script.
- Metalink Note: 387343.1
- Oracle Switchover Failover Best Practices (PDF)
- Oracle Fast Start Failover Best Practices (PDF)
#!/bin/bash # mrp-recovery-rate.sh # Created: 2007-10-10 # LOG_BLOCK_SIZE=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END set pages 0 trimsp on feed off timing off time off SELECT LEBSZ FROM X\\$KCCLE WHERE ROWNUM=1; exit; END` BLOCK_BEG=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END set pages 0 trimsp on feed off timing off time off select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0'; exit; END` TIME_BEG=`date +%s` echo $BLOCK_BEG sleep 30 BLOCK_END=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END set pages 0 trimsp on feed off timing off time off select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0'; exit; END` TIME_END=`date +%s` echo $BLOCK_END DIFF_TIME=`expr $TIME_END - $TIME_BEG` DIFF_BLOCKS=`expr $BLOCK_END - $BLOCK_BEG` DIFF_SIZE=`expr $DIFF_BLOCKS \* $LOG_BLOCK_SIZE` DIFF_SIZE_TIME=`expr $DIFF_SIZE / $DIFF_TIME` MB=`expr 1024 \* 1024` RECOVERY_RATE=`expr $DIFF_SIZE_TIME / 1024` RECOVERY_RATE_MB=`expr $DIFF_SIZE_TIME / $MB` echo $LOG_BLOCK_SIZE echo $DIFF_TIME echo $RECOVERY_RATE echo $RECOVERY_RATE_MB
I think this is a great way of seeing how well your standby is keeping up-to-date and how it has done historically. I use v$dataguard_stats to tell me how the standby is doing, and monitor this, which shows the apply lag and the trasnport lag.
I should point out (though you probably are aware) the sql script does not work with real time apply.
Thanks for the comment.
Unfortunately the view v$dataguard_stats shows real-time/most recent stuff and it is only available from 10G onwards. Nice way to track it though. Maybe Oracle will add a history table in the next release of 11G perhaps.
Yep the SQL also doesn’t work for manual redo log apply. For example if you are using your own scripts to apply redo logs on a standby database (anything not Enterprise Edition).
[…] Paul Moen from Pythian wrote about a script he uses to keep a historical record of the apply rate. However if you want something more immediate you can maybe use a view called V$STANDBY_APPLY_SNAPSHOT. There does not appear to be much information out there on this view, and it was only recently that I stumbled across it. This is not documented in the Oracle documentationÂ in either the 10g or 11g docs. There is also nothing about this view on metalink as a search turned up no hits. The view gets a brief mention in a talk by Dataguard product manager, Larry Carpenter. […]
If you want the link to the Carpenter presentation. It is on Oracle’s website.
[…] has been discussed before by my colleague Paul Moen in his article on Oracle standby recovery rate monitoring, but I’ve made a recent discovery that makes it easier to generate both statistics on log […]
I think scrpt “Reports standby apply rate with lag” shows the completed time on standby not production , can i get a script that shows the time difference between when the archive log was completed on the primary and when it was applied on the standby