Oracle Database Recovery

Database recovery is a very complex topic, by practicing recovery techinques regularly will help when the real thing happens everything should fall into place.

Oracle performs automatic recovery when an instance has crashed or been aborted (using shutdown abort). Recovery involves a two step process

roll-forward step - the database applies the committed and uncommitted data in the current online redo log files to the current online datafiles
roll-back step - the database removes the uncommitted transactions applied in the previous step using the undo data in the undo segments

The recovery begins at a point in the redo logs know as thread checkpoint redo byte address. This is the time when the last checkpoint was done before the crash. You can help in reducing the time recovery happens by decreasing the time the checkpoint occurs thus reducing the amount of data that has to be recovered. By setting the parameter fast_start_mttr_target (maximum is 3600 seconds - 1 hour) you can specify how long a recovery should take, beware that this is only a target and oracle will try its best to meet this target, also decreasing this threshold will probably impact the performance of the server. There are two other system parameters that can be used to control when checkpointing occurs log_checkpoint_timeout and fast_start_io_target.

set MTTR	alter database set fast_start_mttr_target = 600; Note: the value is in seconds
Display current MTTR	select recovery_estimated_ios, estimated_mttr, target_mttr from v$instance_recovery;
Other useful system parameters
LOG_CHECKPOINT_TIMEOUT	log buffers are written to disk within this target can have big impact on performance if to low (default is zero and maximum is 3600 secs – 1 hour) alter system set fast_start_mttr_target=60 scope=both; (no restart required)
FAST_START_IO_TARGET	maximum number of seconds that any new or modified block in the buffer cache waits until it is written to disk (overrides fast_start_mttr_target).

There are two types of recovery possible complete and incomplete recovery

complete - means a recovery with no loss of data, all changes in the redo and archived redo logs are reapplied.
incomplete - implies data loss, because you restore only part of the data, this could be due to data corruption and you wish only to restore to a point in time

Block Media Recovery

If only a few blocks within a datafile are corrupted, you should consider block media recovery, this can be performed via RMAN.

Database Recovery

I have broken down recovery techniques into different key area's

Complete recovery
Recovering non-critical files (temporary tablespaces, redo log files index tablespaces read-only tablespaces and password file)
Recovering critical files (control file)
Incomplete database recovery
Recovering from user errors
Flashback Database
Flashback (flashback query, flashback table query, flashback version query and flashback transaction query)

There is a table that can be used to identify files needing media recovery

V$RECOVERY_FILE

reports missing or corrupted files, the table will identify the file and error.

Complete Database Recovery

Recovering a database can be done in three ways RMAN, traditional user-managed recovery (Netbackup, ufsrestore) or Enterprise Manager (OEM).

RMAN complete DB	## Double check that backup sets exists and we can recover rman> crosscheck backupset 28; rman> sql "shutdown immediate"; rman> sql "startup mount"; rman> restore database; rman> recover database; rman> sql "alter database open"; Note: you must be in mount mode, the redo logs will roll forward the database during recovery
RMAN tablespace recovery	## Double check that backup exists and we can recover rman> crosscheck backupset 28; ## Now take the tablespace offline and recover the file rman> sql "alter tablespace data01 offline"; rman> restore tablespace data01; rman> recover tablespace data01; rman> sql "alter tablespace online"; ## You need to take the tablespace offline when recovering a whole tablespace
RMAN datafile recovery	## Double check that backup exists and we can recover rman> crosscheck backupset 28; ## Restore the datafile then recover it, ## once completed the database can be opened rman> restore datafile 12; rman> recover datafile 12; rman> sql "alter database datafile 'c:\oracle\data\data03.dbf online"; ## It is possible to not take the tablespace offline as long as no one uses it, the dba_data_files (online_status) states that the file needs recovering
User-Managed	1. Open in mount mode 2. restore the data files (Netbackup or ufsrestore) 3. recover the database sql> recover database; 4. Open the database sql> alter database open;
OEM	see oracle manual

Recovering non-critical files

see recovering non-critical files for more details

Recovering critical files

see recovering critical files for more details

Incomplete Database Recovery

see incomplete database recovery for more details

Recovering from User Errors

see recovering from user errors for more details

Flashback

see flashback for more details