Oracle Standby Recovery Rate Monitoring

Paul Moen

January 24, 2008

So you have created your standby database using the RMAN DUPLICATE command, you have set the ARCHIVE_LAG_TARGET to maintain a minimum lag target, and you have sorted out those nasty datafile missing errors using automatic file management. You’ve even added standby redo logs to improve the Mean Time To Recovery (MTTR).

Now management are demanding to know why the standby periodically lags during the day. They have you on the phone and they are asking what the current redo apply rate is.

Luckily, you read the Pythian blog regularly, or like most people google for a solution or script to help out so you don’t reinvent the wheel.

Management’s first question was, does the standby redo apply rate lag during the day and if so, when. The following SQL script joins v$archived_log to v$dataguard_status, which contains the most recent messages from Data Guard. We join to that table to get the timestamp for when a specific archive log started to be applied.

rem Reports standby apply rate with lag
rem
select TIMESTAMP,completion_time "ArchTime",
SEQUENCE#,round((blocks*block_size)/(1024*1024),1) "SizeM",
round((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60,1) "Diff(sec)",
round((blocks*block_size)/1024/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),1) "KB/sec",
round((blocks*block_size)/(1024*1024)/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),3) "MB/sec",
round(((lead(TIMESTAMP,1,TIMESTAMP) over (order by TIMESTAMP))-completion_time)*24*60*60,1) "Lag(sec)"
from v$archived_log a, v$dataguard_status dgs
where a.name = replace(dgs.MESSAGE,'Media Recovery Log ','')
and dgs.FACILITY = 'Log Apply Services'
order by TIMESTAMP desc;

The script reports the time it took to apply the log, the size of the log, and the redo apply rate for that log.

Diff(sec) reports the actual time difference between redo logs applied on the standby.
Lag(sec) reports the time difference between when the archive log was completed on the primary and when it was applied on the standby.

Management’s second question was, how fast is the current archived redo log being applied? The shell script mrp-recovery-rate.sh answers that question.

Basically it gets the log_block_size and polls v$managed_standby twice with a 30-second delay. It then returns the redo apply rate in KB/second, MB/second, and the raw bytes/second.

Note: This is a beta version — it does not handle an application rate faster than 30-seconds per log.

This script was developed on the back of the best practices document: MAA – Data Guard Redo Apply and Media Recovery Best Practices 10gR1 (PDF). Oracle provided the formula, we are providing the script.

References

Metalink Note: 387343.1
Oracle Switchover Failover Best Practices (PDF)
Oracle Fast Start Failover Best Practices (PDF)

Have Fun!
Paul.

#!/bin/bash
# mrp-recovery-rate.sh
# Created: 2007-10-10
#
LOG_BLOCK_SIZE=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
SELECT LEBSZ FROM X\\$KCCLE WHERE ROWNUM=1;
exit;
END`

BLOCK_BEG=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_BEG=`date +%s`
echo $BLOCK_BEG

sleep 30

BLOCK_END=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_END=`date +%s`
echo $BLOCK_END

DIFF_TIME=`expr $TIME_END - $TIME_BEG`
DIFF_BLOCKS=`expr $BLOCK_END - $BLOCK_BEG`
DIFF_SIZE=`expr $DIFF_BLOCKS \* $LOG_BLOCK_SIZE`
DIFF_SIZE_TIME=`expr $DIFF_SIZE / $DIFF_TIME`
MB=`expr 1024 \* 1024`
RECOVERY_RATE=`expr $DIFF_SIZE_TIME / 1024`
RECOVERY_RATE_MB=`expr $DIFF_SIZE_TIME / $MB`

echo $LOG_BLOCK_SIZE
echo $DIFF_TIME
echo $RECOVERY_RATE
echo $RECOVERY_RATE_MB

Insight and analysis of technology and business strategy

Oracle Standby Recovery Rate Monitoring

References

Top Categories

Tell us how we can help!