Data Guard Broker

The broker is part of Oracle's Database Enterprise Manager and an integral part of Data Guard, it is the management framework. You can of course use SQL*Plus to manage Data Guard but the broker when used with the Enterprise Manager becomes a powerful tool offering the following

Before I discuss the integration of the broker into the Enterprise Manager lets first discuss the broker itself. The broker can make changes to a Data Guard setup (physical or logical database), the type of changes are configuration, transport modes, apply setup and role change services and the protection mode. You can also monitor the health of any configuration, it is also responsible for implementing and managing the automatic failover capability also know as Fast-Start Failover (FSFO).

The broker has three parts, a set of background processes on each database, a set of configuration files and a commandline interface (CLI) called DGMGRL.

The broker has a number of processes running, the broker manages these processes automatically you have no control over them

In a broker configuration it is the Data Guard Monitor (DMON) process on the primary database that is the owner of the configuration, all orders will come from the primary even if use a DGMGRL CLI from another server, the standby database will receive all configuration changes via the primary.

When the DMON process communicates with the standby database it uses one of the NSV processes to send the work to the standby, this protects the DMON from a hang if the network should go down. Whenever the DMON needs to execute some SQL it will enlist the aid of the RSM process on the primary database, if the SQL is for the primary database it will execute it directly, however if the SQL is for a standby database the RSM process asks the NSV process to send it to the standby database again this protects the RSM from a network hang.

Each NSV process has a DRC partner process on the standby database, which will perform the work of the NSV process, the DRC process on the standby will return the results or status back to the NSV on the primary.

On startup the DMON will communicate with each standby via the NSV-DRC connection process pair to establish communication and configuration information so that the standby can start the apply services, if there are any problems check the alert.log after the NVS section to see if the are any TNS errors.

When using a RAC environment there are another set of process called the internode servers (INSV) which maintain a connection between the nodes in the cluster to ensure that the broker on each node knows the state of the cluster. A INSV process will always be started on the primary even if the primary is not in a RAC.

Lastly you may see one more process called the Fast-Start-Failover Process (FSFP), which is used only when the primary database is under the control of Data Guard automatic failover feature Fast-Start Failover. We will be discussing FSFP in another section switchover and failover.

Configuration Files

Each database has two copies of the configuration, stored in the below parameter locations

By default they are stored in $ORACLE_HOME/dbs directory with the filename of dr1<DB_UNIQUE_NAME>.dat and dr2<DB_UNIQUE_NAME>.dat, if using a RAC these should be keep on a shared location as only one set of copies can be exist for the entire RAC.

The primary has the master copies of the configuration files, all changes to these files are done by the primary, even if you are on the standby server the configuration changes will be passed to the primary via NSV-DRC processes and the DMON process will make the change which is then propagated back to the standby databases. The reason for the multiple copies of the configuration file is if the primary has a problem then each standby database has a copy, when a failover occurs the standby that becomes the primary will then become the master of those configuration files.

Broker

You have two choices to interact with Data Guard either Enterprise Manager (EM) or the broker CLI DGMGRL you can swap between the two with a few simple configuration changes. To gain full full functionality of Data Guard through Grid Control you must use the broker.

Broker CLI DGMGRL comes with Enterprise Manager, you can run the broker on any platform you wish it does not need to be the same platform as the primary or standby databases.

You can access the DGMGRL command line using the below

DGMGRL CLI

[primargydg01] > dgmgrl sys/oracle

[primargydg01] > dgmgrl sys/oracle@PROD1

When you start the DGMGRL commandline it does not connect you to a database, only the Data Guard configuration. Before you start to use the broker there are a number of things to perform first

Even if you use Enterprise Manager you should really have a understanding on how the CLI works, as mentioned above there are two configuration files, both the systems parameters must be set on the primary and all standby databases, you put set this files anywhere you like

data Guard configuration files location

# Directory
alter system set DG_BROKER_CONFIG_FILE1 = '<directory>/<file>';
alter system set DG_BROKER_CONFIG_FILE2 = '<directory>/<file>';

# ASM
alter system set DG_BROKER_CONFIG_FILE1 = '+DATA/PROD1/Broker/dr1PROD1.dat';
alter system set DG_BROKER_CONFIG_FILE1 = '+FLASH/PROD1/Broker/dr2PROD1.dat';

Note: if you are using ASM then the directories must be created already, use the asmcmd command to do this, the broker will create symlinks to the actual config files but the directories must be there

In a RAC environment they should be in a shared area and all nodes should point to the same files, only one set of files must exist in a RAC.

Once the directory has been created you are ready to startup the broker, do this on all databases (primary and standby)

Start the Broker alter system set DG_BROKER_START=TRUE SCOPE=BOTH;

When you start the broker it does not configure any files yet because you have to use the DGMGRL command, all it does is start the necessary processes we mentioned earlier. If you need to change the location of the files the steps are below

  1. stop the broker using the DG_BROKER_START parameter by setting this to false (on all nodes in the cluster)
  2. change the DG file destination parameters
  3. copy the files to the new location
  4. then re-start the broker, again set the DG_BROKER_START parameter to true (on all nodes in the cluster)

The broker uses the Oracle Net Services to make connections to the databases, setup both redo transport and archive gap resolution and perform role transitions. We need to create a special static entry in the listener.ora file for each database in the broker configuration, this entry makes it possible for the broker to connect to an idle instance using a remote SYSDBA connection and perform the necessary startup. Here is an example

Broker listener.ora static entry

## Primary and all standby databases

SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1DR_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1DR)
     )
  )

SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1)
     )
  )

Once the above has been done we are ready to create a broker configuration, there is no difference in a non-RAC or RAC environment, to connect to the configuration you use the dgmgrl command but one word of advise make sure that you use a specific user and password and not the / as the user, this has know to cause problems at a later date

DGMGRL CLI

[primargydg01] > dgmgrl sys/oracle

[primargydg01] > dgmgrl sys/oracle@PROD1

There are four main area's to DGMGRL CLI

If you used the Enterprise Manager/Grid Control to setup your standby database then the next steps would have already been performed for you, if however you used the Power User Method then you need to follow the next steps to configure the broker.

The first step is to create the base configuration

Create base configuration

# Primary Database server

DGMGRL> connect sys/password
DGMGRL> create configuration prod1 as
> primary database is prod1
> connect identifier is prod1;

Configuration "prod1" created with primary database "prod1"

The above command would have created a base configuration, you can display the configuration using the below command

Display configuration # Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration

Configuration - prod1

Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

## Here is the same screen shot after adding a logical server

DGMGRL> show configuration;

Configuration - prod1

Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
prod1lr - Logical standby database

Fast-Start Failover: DISABLE

Configuration Status:
SUCCESS

Next we add the standby database to the configuration, if you receive an error stating that the properties could not be imported from the database, you need to check DB_UNIQUE_NAME also the TNSNAME and the transport parameter (LOG_ARCHIVE_DEST_n), so what is the proper setup, you must have your redo transport parameter defined using the DB_UNIQUE_NAME method, meaning that each redo transport parameter must contain the DB_UNIQUE_NAME=<name> attribute. The broker will search all of your LOG_ARCHIVE_DEST_n parameters looking for a database with a unique name that matches the database name you entered for the command. If you have not done this then you need to use the full set of arguments to add the database

Add the standby database

# Primary Database server - if you have setup db_unique_name, tnsname and log_archive_dest_n

DGMGRL> connect sys/password
DGMGRL> add database prod1dr;

# Primary Database server - the full command set

DGMGRL> connect sys/password
DGMGRL> add database prod1dr

> as connect identifier is prod1dr
> maintained as physical;

Database "prod1dr" added

# if you want to add a logical standby you can use the following

DGMGRL> add database prod1lr
> as connect identifier is prod1lr
> maintained as logical;

Database "prod1lr" added

When you display the configuration you should see that the standby database has been added

Display configuration

# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration

Configuration - prod1

Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED

## Because we have not setup any parameters for Broker all parameters defaults will be presumed,
## some parameters are set by examining the database others are Broker default values.

DGMGRL> show database verbose prod1

Database - prod1

Role: PRIMARY
Intended State: OFFLINE
Instance(s):
PROD1

Properties:
DGConnectIdentifier = 'prod1'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'primarydg01'
SidName = 'PROD1'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=primarydg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1_DGMGRL)(INSTANCE_NAME=PROD1)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'

Database Status:
DISABLED

DGMGRL> show database verbose prod1dr

Database - prod1dr

Role: PHYSICAL STANDBY
Intended State: OFFLINE
Transport Lag: (unknown)
Apply Lag: (unknown)
Real Time Query: OFF
Instance(s):
PROD1DR

Properties:
DGConnectIdentifier = 'prod1dr'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'OPTIONAL'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'physicaldg01'
SidName = 'PROD1DR'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=physicaldg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1DR_DGMGRL)(INSTANCE_NAME=PROD1DR)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'

Database Status:
DISABLED

One parameter to watch for is the local archiving parameter on the primary and standby databases, the broker will modify this parameter and if necessary add the VALID_FOR attribute in preparation for the archival of your standby redo log files.

One area to watch is if you add an additional standby database, the broker could presume the wrong parameter values, for instance you may want one standby to use SYNC and another to use ASYNC, but if you configure the second standby database using the short method it may default to SYNC, so when you do add any additional standby databases always double check the parameters are set correctly for the type of configuration that you want, any changes should be done before enabling the additional standby database.

Once we are happy that all standby databases have been added and their parameters have been check we are ready to enable the configuration, the command will issue alter system commands on both the primary and standby databases, start the redo transport to the standby databases and the apply services, before examining the configuration give it sometime to complete as it does take a while, otherwise you see some ORA-16610 errors.

Enabling the configuration # Primary Database server

DGMGRL> connect sys/password
DGMGRL> enable configuration
Enabled.

You can tail the alert log file and see what action is taking place, hopefully after a few minutes you should see a successful configuration running, you can also check the redo transport and apply services by viewing the v$managed_standby view.

Display configuration

# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration;

Configuration - prod1

Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

# Standby database - check the LGWR - RFS connection process and the MRP0 is in the applying state

sql> select client_process, process, thread#, sequence#, status from v$managed_standby;

Broker Properties

As I mentioned above the broker has a number of properties which can be changed, to change the property we use the EDIT command in the DGMGRL CLI

Edit configuration EDIT CONFIGURATION SET PROPERTY <name>=<value>
EDIT DATABASE <db_name> SET PROPERTY <name>=<value>
EDIT INSTANCE <in_name> SET PROPERTY <name>=value>
Configuration Properties
BystandersFollowRoleChange  
FastStartFailoverAutoReinstate  
FastStartFailoverLagLimit  
FastStartFailoverPmyShutdown  
FastStartFailoverThreshold  
CommunicationTimeout the amount of time the broker will wait for a response before giving up (default 180 seconds), you can remove any timeout by setting this to 0
Database Properties
FastStartFailoverTarget  
ObserverConnectIdentifier  
ApplyInstanceTimeout defines how long the broker should wait until moving the apply process to another instance in a standby RAC, default is 0 when means immediately
PreferredApplyInstance allows you to tell the broker where you would like the apply to run when you have a RAC, by default it is empty which means it can choose any instance
ArchiveLagTarget alters database parameter: ARCHIVE_LAG_TARGET
DbFileNameConvert alters database parameter: DB_FILE_NAME_CONVERT
LogArchiveMaxProcesses alters database parameter: LOG_ARCHIVE_MAX_PROCESSES
LogArchiveMinSuccessDest alters database parameter: LOG_ARCHIVE_IN_SUCCEED_DEST
LogFileNameConvert alters database parameter: LOG_FILE_NAME_CONVERT
LogShipping
(standby role only)

enables or defers redo transport to that standby database.

alters database parameter: LOG_ARCHIVE_DEST_STATE_n

StandbyFileManagement
(standby role only)
alters database parameter: STANDBY_FILE_MANAGEMENT
Instance Properties
HostName No explaination needed here
SidName No explaination needed here
LogArchiveTrace alters database parameter: LOG_ARCHIVE_TRACE
LogArchiveFormat alters database parameter: LOG_ARCHIVE_FORMAT
StandbyArchiveLocation alters database parameter: LOG_ARCHIVE_DEST_n
AlternateLocation alters database parameter: LOG_ARCHIVE_DEST_n
LsbyMaxSga alters database parameter: MAX_SGA
LsbyMaxServers alters database parameter: MAX_SERVERS

Broker State and Monitoring

There are a number of commands that you can use to change the state of the database

turn off/on the redo transport service for all standby databases
Primary
DGMGRL> edit database prod1 set state=transport-off;

DGMGRL> edit database prod1 set state=transport-on;
turn off/on the apply state
Standby

DGMGRL> edit database prod1dr set state=apply-off;

DGMGRL> edit database prod1dr set state=apply-on;

put a database into a real-time query mode
Standby
DGMGRL> edit database prod1dr set state=apply-off;
sql> alter database open read only;
DGMGRL> edit database prod1dr set state=apply-on;
change the protection mode
Primary

# Choose what level of protection you require
sql> alter database set standby to maximize performance;
sql> alter database set standby to maximize availability;
sql> alter database set standby to maximize protection;

# display the configuration
DGMGRL> show configuration

There are a number of useful monitoring commands and log files that can help with diagnosing problems

configuration DGMGRL> show configuration;
database DGMGRL> show database prod1;
DGMGRL> show database prod1dr;

# There are a number of specific information commands, here are the most used
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
Logfiles # change the instance name to reflect the one you have choosen

prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log
prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log

prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log
prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log

Lastly there is a utility called the "DG Menu Utility", this can be downloaded from the Oracle Data Guard 11g handbook web site http://www.dataguardbook.com, I will be covering this in my monitoring section but below is a screen shot of the menu

DG Menu Utility DG Menu
========================================
# ------------------------------------------------------------------------- #
# #
# Data Guard Check List - primarydg01
# #
# 0. Review database information and status #
# 1. Check for password file #
# 2. Check for forced logging and unrecoverable activities #
# ------------------------------------------------------------------------- #
# 3. Check for archive log mode #
# 4. Check for standby redo logs #
# 5. Check current SCN on primary and standby databases #
# 6. Check archive log destinations #
# ------------------------------------------------------------------------- #
# 7. Check Data Guard Status View for errors and fatal messages #
# 8. Check Managed Recovery Process Status #
# 9. Check for missing archive logs #
# 10. Check archive log gaps on the standby database #
# 11. Check average apply rate / active apply rate #
# 12. Check transport / apply lag #
# 13. How far behind is my Data Guard in terms of time? #
# #
# ------------------------------------------------------------------------- #
# 20. Launch the Logical Standby Data Guard Submenu #
# ------------------------------------------------------------------------- #
# 21. Generate init.ora entries for primary database #
# 22. Generate init.ora entries for standby database #
# 23. Generate tnsnames.ora entries for primary and standby databases #
# 24. Generate SQL syntax to create standby redo logs #
# #
# ------------------------------------------------------------------------- #
# 30. Generate syntax to duplicate standby database from active database #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:

 

# ------------------------------------------------------------------------- #
# #
# Logical Standby Data Guard Check List - primarydg01
# #
# 1. Check Logical Progress - View Overall Progress Of SQL Apply #
# 2. Check Logical Events - History on Logical Standby Apply Activity #
# 3. Check Logical Events - Detailed View #
# 4. Check Logical Stats - Logical Standby Stats #
# 5. Check Logical Parameters - Logical Standby Parameters #
# 6. Look At What The Logical Standby Processes Are Doing #
# Coordinator, Reader, Builder, Preparer, Analyzer, Applier ... #
# 7. Look At The Status Codes For The Logical Standby Processes #
# 8. Look At Events The Applier Process Is Stuck On #
# ------------------------------------------------------------------------- #
# 10. Check the LCR - Look At Bytes Paged Out #
# 11. Generate Syntax To Skip Transactions #
# Based On MAX(EVENT_TIME) FROM DBA_LOGSTDBY_EVENTS #
# DO NOT SKIP DML STATEMENTS #
# 12. Diagnostic Script Per Metalink Note ID: 241512.1 #
# Look for output in logical_diag_[ORACLE_SID_MONDD_HHMM.out] format #
# ------------------------------------------------------------------------- #
# 20. Review What Is NOT Supported In Your Logical Standby Database #
# 21. Review Tables That Do NOT have Unique Identifiers #
# 22. Check Primary Database For Supplemental Logging #
# #
# ------------------------------------------------------------------------- #
# 30. Start Logical Standby Database #
# 40. Stop Logical Standby Database - PLEASE BE CAREFUL !!!!! #
# THIS WILL STOP THE LOGICAL STANDBY APPLY PROCESS #
# ------------------------------------------------------------------------- #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:

Removing the Broker

Finally if you wish remove the broker follow the steps below

Remove the broker
  1. Connect to the primary database using DGMGRL
  2. run in dgmgrl "remove configuration perserve destinations"
  3. Connect to the primary using SQLPLUS
  4. using SQL set the parameter "dg_broker_start=false"
  5. then run "alter system set dg_broker_start=false"
  6. repeat steps 4 and 5 for all the standby databases
  7. repeat step 2 for all the standby databases
  8. remove the two broker configuration files for all databases