Data Guard Cheatsheet
currently being updated, this statement will be removed when I have completed this section
Terminology
Primary database | A production database |
Standby database | A database that can become the primary database, should the primary fail |
EOR | End Of Redo |
LWGR | Log Writer process |
LNS | Log Network Server |
ORL | Online Redo Log |
RFS | Remote File Server |
SRL | Standby Redo Log file |
SYNC and ASYNC | Synchronous and Asynchronous |
Log Files
DG alert Log | drc<db_unique_name>.log |
Alert Log | alert_<SID>.log |
Logfile locations | # change the instance name to reflect the one you have choosen and the path you installed oracle prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log |
identify log files | ## You can get the log locations from the below view col name for a25 col value for a65; select name, value from v$diag_info; |
Data Guard Broker
Create base configuration | # Primary Database server |
Add the standby database | # Primary Database server - if you have setup db_unique_name, tnsname and log_archive_dest_n Database "prod1dr" added |
Display configuration | DGMGRL> show configuration |
Display Database | DGMGRL> show database verbose prod1 |
Enabling the configuration | # Primary Database server DGMGRL> enable configuration Enabled. |
Edit configuration | EDIT CONFIGURATION SET PROPERTY <name>=<value> There are many options see the broker section for more information |
Troubleshooting (Monitoring commands and log files) |
|
configuration | DGMGRL> show configuration; |
database | DGMGRL> show database prod1; DGMGRL> show database prod1dr; # There are a number of specific information commands, here are the most used DGMGRL> show database prod1 statusreport; DGMGRL> show database prod1 inconsistentProperties; DGMGRL> show database prod1 inconsistentlogxptProps; DGMGRL> show database prod1 logxptstatus; DGMGRL> show database prod1 latestlog; |
Logfiles | # change the instance name to reflect the one you have choosen prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log |
There are a number of commands that you can use to change the state of the database
turn off/on the redo transport service for all standby databases | Primary |
DGMGRL> edit database prod1 set state=transport-off; DGMGRL> edit database prod1 set state=transport-on; |
turn off/on the apply state | Standby |
DGMGRL> edit database prod1dr set state=apply-off; |
put a database into a real-time query mode | Standby |
DGMGRL> edit database prod1dr set state=apply-off; sql> alter database open read only; DGMGRL> edit database prod1dr set state=apply-on; |
change the protection mode | Primary |
# Choose what level of protection you require # display the configuration |
Redo Processing
Redo Processes (Primary and Standby Databases) |
|
Processes | There are a number of Oracle background processes that play a key role, first the primary database
The standby database will also have key processes
|
Real-Time Apply |
|
Enable real-time apply | sql> alter database recover managed standby database using current logfile disconnect; |
Determine if real-time apply is enabled | sql> select recovery_mode from v$archive_dest_status where dest_id = 2; RECOVERY_MODE -------------------------- MANAGED REAL-TIME APPLY |
Tools and views to monitor redo |
|
Background processes | select process, client_process, thread#, sequence#, status from v$managed_standby; ## primary (example) |
Information on Redo Data | select * from v$dataguard_stats; Note: this indirectly shows how much redo data could be lost if the primary db crashes |
Redo apply rate | select to_char(snapshot_time, 'dd-mon-rr hh24:mi:ss') snapshot_time, Note: this command can only run when the database is open |
Recovery operations | select to_char(start_time, 'dd-mon-rr hh24:mi:ss') start_time, item, round(sofar/1024,2) "MB/Sec" from v$recovery_progress where (item='Active Apply Rate' or item='Average Apply Rate'); |
Logical Standby
schema that are not maintained by SQL apply | select owner from dba_logstdby_skip where statement_opt = 'INTERNAL SCHEMA' order by owner; Note: system and sys schema are not replicated so don't go creating tables in these schemas, the above command should return about 17 schemas (Oracle 11g) that are replicated. |
Check tables with unsupported data types | select distinct owner, table_name from dba_logstdby_unsupported; |
skip replication of tables | ## Syntax |
revoke a skipped table | stop SQL apply Note: the dblink should point to the primary database, we have to stop SQL apply as the instantiate table procedure uses Oracle's data pump network interface to lock the source table to obtain the SCN at the primary database, it then releases the lock and gets a consistent snapshot of the table from the primary database, it remembers the SCN associated with the consistent snapshot. |
display what tables are being skipped | select owner, name, use_like, esc from dba_logstdby_skip where statement_opt = 'DML'; |
setting the guard on a database | alter database guard standby; |
Inside SQL Apply |
|
List the above processes | select * from v$logstdby_process |
Increase the LCR cache size | # Set the cache size to 200MB execute dbms_logstdby.apply_set('MAX_SGA', 200); |
How much LCR cache is being used | select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'SESSION_ID'); |
setting SQL apply mode for the application | execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE); |
Determine the number of DDL statements since the last restart | select name, value from v$logstdby_stats where name = 'DDL TXNS DELIVERED'; NAME VALUE ------------------------------------------------------------------------ DDL TXNS DELIVERED 510 |
displaying the barrier | select status_code as sc, status from v$logstdby_process where type = 'BUILDER'; sc status ------------------------------------------------------------------------------------- 44604 BARRIER SYNCHRONIZATION ON DDL WITH XID 1.15.256 (WAITING ON 17 TRANSACTIONS) |
Tuning SQL Apply |
|
MAX_SERVERS | # Set the MAX_SERVERS to 8 x the number of cores execute dbms_logstdby.apply_set ('MAX_SERVERS', 64); |
MAX_SGA | # Set the MAX_SGA to 200MB execute dbms_logstdby.apply_set ('MAX_SGA', 200); |
_HASH_TABLE_SIZE | # Set the Hash table size to 10 million execute dbms_logstdby.apply_set ('_HASH_TABLE_SIZE', 10000000); |
DDL | defer DDLs to off-peak hours |
Preserve commit order | # Set the PERSERVE_COMMIT_ORDER to false execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE); |
lagging SQL Apply | # apply lag: indicates how current the replicated data at the logical standby is |
SQL Apply component bottleneck | select name, value from v$logstdby_stats where name like 'TRASNACTIONS%'; Name Value ----------------------------------------------------------------------------------------------------- TRANSACTIONS APPLIED 3764 TRANSACTIONS MINED 4985 The mined transactions should be about twice the applied transaction, if this decreases or staying at a low value you need to start looking at the mining engine. |
Make sure all preparers are busy | select count(1) as idle_preparers from v$logstdby_process where type = 'PREPARER' and STATUS_CODE = 16166; IDLE_PREPARER |
Make sure the peak size is well below the amount allocated | select used_memory_size from v$logstdby_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID'); USED_MEMORY_SIZE ---------------------------- 32522244 |
verify that the preparer does not have enough work for the applier processes | select (available_txn - pinned_txn) as pipleline_depth from v$logstdby_session where session_id (select value from v$lostdby_stats where name = 'LOGMINER SESSION ID'); PIPELINE_DEPTH |
Setting max_servers and preparers | execute dbms_logstdby.apply_set('MAX_SERVERS', 36); execute dbms_logstdby.apply_set('PREPARE_SERVERS', 3); |
display the pageout activity | ## Run this first select name, value from v$logstdby_stats where name line '%PAGE%' or name like '%UPTIME' or name like '%IDLE%'; ## Run the second time about 10 mins later select name, value from v$logstdby_stats where name line '%PAGE%' or name like '%UPTIME' or name like '%IDLE%'; Now subtract one from the other and work out the percentage rate, if pageout has increase above 5% then increase the MAX_SERVERS |
unassigned large transactions | ## By default SQL apply should be one-sixth of the number of applier processes PIPELINE_DEPTH select count(1) as idle_applier from v$logstdby_process where type = 'APPLIER' and statuscode = 16166; VALUE |
Monitoring
archive gap logs | # Use the thread# when using RAC an detect missing sequences select thread#, low_sequence#, high_sequence# from v$archive_gap; |
delays in redo transport | select max(sequence#), thread# from v$archived_log group by thread#; ## you can use the dg_archivelog_monitor.sh script, which accepts three parameters, primary, physical |
Identify the missing logs on the primary | ## On the primary run the below |
apply rate and active monitoring | select to_char(start_time, 'DD-MON-RR HH24:MI:SS') start_time, item , sofar from v$recovery_progress |
transport and apply lag | col name for a13 |
Viewing the status of the managed recovery process | col client_pid for a10; select pid, process, status, client_process, client_pid, thread#, sequence#, block#, blocks from v$managed_standby; |
Switchover, Failover and FSFO
Complete Switchover | ## Start the switcover on the original primary alter database commit to switchover to standby; ## On the new primary complete the switchover alter database commit to switchover to primary; ## Now open the database on the new primary alter database open; |
Complete Failover | ## Start the failover # Change the level of protection that you require |
Broker switchover | DGMGRL> switchover to prod1lr |
Action | Step |
Commands |
check redo has been received | 1 |
## check the syn status, it should say yes (run on the standby) ## if it says NO then lets make further checks (run on the standby) Note: if using a RAC environment make sure you check each instance |
check that redo has been applied (physical) | 2 |
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you need to |
check that redo has been applied (logical) | 3 |
## if you are using a logical standby then you need to check the following to confirm the redo has been ## applied sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress; ## if the mining scn is behind you may have a gap check this by using the following sql> select status from v$logstdby_process where type = 'READER'; |
show any running jobs or backups | 4 |
sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$session s where r.sid=s.sid; |
increase logging level (if required) | 5 |
sql> alter system set log_archive_trace=8129; ## to turn it off again sql> alter system set log_archive_trace=0; |
check for active sessions | 6 |
## Display the active sessions sql> select program, type from v$session where type='USER'; |
check the switchover status | 7 |
## make sure the status is "to standby", if you get "sessions active", then stop those sessions (see above command to identify ## sessions) sql> select switchover_status from v$database; |
tail the log alert log file | 8 |
tail alert??.log |
switchover (primary) | 9 |
## on the primary, after this command completes you will have two physical standbys sql> alter database commit to switchover to physical standby with session shutdown; Note: at this point if you want to rollback this switchover see my troubleshooting section to get ot back to normal |
check the switchover status | 10 |
sql> select switchover_status from v$database; |
complete the switchover (physical) | 11 |
sql> alter database commit to switchover to primary with session shutdown; |
open the new primary | 12 |
sql> alter database open; |
finish off the old primary | 13 |
sql> shutdown immediate; sql> startup mount; sql> alter database recover managed standby database using current logfile disconnect; |
Action | Step |
Commands |
check redo has been received | 1 |
## check the syn status, it should say yes (run on the standby) ## if it says NO then lets make further checks (run on the standby) Note: if using a RAC environment make sure you check each instance |
check that redo has been applied (physical) | 2 |
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you need to |
check that redo has been applied (logical) | 3 |
## if you are using a logical standby then you need to check the following to confirm the redo has been ## applied sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress; ## if the mining scn is behind you may have a gap check this by using the following sql> select status from v$logstdby_process where type = 'READER'; |
show any running jobs or backups | 4 |
sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$session s where r.sid=s.sid; |
increase logging level (if required) | 5 |
sql> alter system set log_archive_trace=8129; ## to turn it off again sql> alter system set log_archive_trace=0; |
check for active sessions | 6 |
## Display the active sessions sql> select program, type from v$session where type='USER'; |
check the switchover status | 7 |
## make sure the status is "to standby", if you get "sessions active", then stop those sessions (see above command to identify ## sessions) sql> select switchover_status from v$database; |
tail the log alert log file | 8 |
tail alert??.log |
Prepare the primary standby | 9 |
sql> alter database prepare to switchover to logical standby; ## confirm that the prepare has started to happen, you should now see "preparing switchover" sql> select switchover_status from v$database; |
Prepare the logical standby | 10 |
sql> alter database prepare to switchover to primary; ## confirm that the prepare has started to happen, you should see "preparing dictionary" sql> select switchover_status from v$database; ## wait a while until the dictionary is built and sent and you should see "preparing switchover" sql> select switchover_status from v$database; |
Check primary database state | 11 |
## you should now see its in the state of "to logical standby" sql> select switchover_status from v$database; |
the last chance to CANCEL the switchover (no going back after this) | 12 |
## On the primary sql> alter database prepare to switchover cancel; ## on the logical sql> alter database prepare to switchover cancel; |
switchover the primary to a logical standby | 13 |
sql> alter database commit to switchover to logical standby; |
switchover the logical standby to a primary | 14 |
## check that its ready to become the primary, you should see "to primary" sql> select switchover_status from v$database ## Complete the switchover sql> alter database commit to standby to primary; |
start the apply process | 15 |
sql> alter database start logical standby apply immediate; |
Action | Step |
Commands |
Check redo applied | 1 |
## This will tell you the lag time |
the failover process (physical standby) | 2 |
## Start by telling the apply process that this standby is going to be the new primary, and to apply all ## At this point the protection mode is lowered ## You can then raise the protection mode (if desired) |
the failover process (logical standby) | 2 |
alter database activate logical standby database finish apply; |
Action | Step |
Commands |
bring back the old primary (physical standby) | 1 |
## Since redo is applied by SCN we need he failover SCN from the new primary |
bring back the old primary (logical standby) | 2 |
## again we need to obtained the SCN ## Lastly you need tell your new logical standby to ask the primary for a new copy of the dictionary and |
Use the Broker to bring back the old Primary |
||
Use the broker to do it all for you | n/a |
DGMGRL> failover to prod1dr; DGMGRL> reinstate database prod1; |
Fast Start Failover (FSFO) |
|
Monitor a specific condition via the Broker | DGMGRL> enable fast_start failover condition "Corrupted Controlfile"; DGMGRL> enable fast_start failover condition "Datafile Offline"; |
Display conditions that are be monitored | DGMGRL> show fast_start failover; |
Select the standby to become the primary | DGMGRL> edit database prod1 set property FastStartFailoverTarget = 'prod1dr'; DGMGRL> edit database prod1dr set property FastStartFailoverTarget = 'prod1'; |
change threshold | DGMGRL> edit configuration set property FastStartFailoverTargetThreshold = 45; |
lag limit | DGMGRL> edit configuration set property FastStartFailoverLagLimit = 60; |
abort primary if in a hung state | DGMGRL>edit configuration set property FastStartFailoverPmyShutdown = true; |
reinstate primary after a failover | DGMGRL>edit configuration set property FastStartFailoverAutoReinstate = true; |
Enable FSFO | DGMGRL> enable fast_start failover; ## Display the configuration DGMGRL> show fast_start failover; |
Other sections of interest
Active Data Guard - see active data guard
Backups and Recovey - see backups and recovery
Troubleshooting - see troubleshooting
My complete setup guide - see complete setup guide