We managed to recover most of the data on the RAID5 Disks, especially the Application. The database files were physically recovered, but I decided to rebuild the Database on the Server for the following reasons:
- The Server O/S was upgraded
- Oracle 9iR2 Software had to be reinstalled
- Data had changed in the last 3 days, as people were using the Secondary Setup.
So, I needed to make Production a clone of the Secondary Setup. The options were:
- Using an export dump of the Secondary Instance, create the Production Oracle Database.
- Using RMAN, clone the Production Oracle Database with the "Duplicate Target Database" command.
- Or, simply clone the Production Oracle Database using the Cold Backup of the Secondary Database.
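For the RMAN option, the command file could look roughly like the one generated below. This is a minimal sketch under stated assumptions, not the script used during this recovery: the auxiliary database name `PROD`, the channel name `aux1`, and the helper `build_duplicate_script` are hypothetical, and `NOFILENAMECHECK` is only appropriate when the clone lives on a different host that uses the same file paths.

```python
def build_duplicate_script(aux_db_name: str, same_paths_on_other_host: bool = False) -> str:
    """Return the text of an RMAN command file that clones the target database.

    aux_db_name is the name of the new (auxiliary) database; the value used
    in the example below is a placeholder, not taken from the original setup.
    """
    duplicate = f"DUPLICATE TARGET DATABASE TO {aux_db_name}"
    if same_paths_on_other_host:
        # Skip RMAN's check that target and duplicate file names differ.
        duplicate += " NOFILENAMECHECK"
    lines = [
        "RUN {",
        "  ALLOCATE AUXILIARY CHANNEL aux1 DEVICE TYPE DISK;",
        f"  {duplicate};",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the command file so it can be saved and passed to RMAN.
    print(build_duplicate_script("PROD", same_paths_on_other_host=True))
```

RMAN would run such a file while connected to both the source database (as TARGET) and the new instance (as AUXILIARY), with the auxiliary instance started in NOMOUNT mode.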
Application Testing was carried out to check that the Application was running smoothly and to validate the Data. Once the testing was successful, we registered the database in the Recovery Catalog and took a full database backup. To register the database and take the Full Database RMAN Backup, we had to ensure that the Oracle Services were owned by the Domain Administrator Account and not the Local Account. We also made sure that the Controlfile Autobackup and SPFILE backup were on the shared location along with the RMAN Backups.
[Note: 145843.1 How to Configure RMAN to Write to Shared Drives on Windows NT/2000]
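The registration and backup steps described above can be sketched as a generated RMAN command file. The share path `\\backupsrv\rman` and the helper name `build_backup_script` are hypothetical; the original post does not give the actual location or the exact commands that were run, so treat this as one plausible arrangement.

```python
def build_backup_script(share: str) -> str:
    """Return RMAN commands that register the database and back it up to a share.

    share is a UNC path to the shared backup location (a placeholder here).
    """
    share = share.rstrip("\\")  # avoid a doubled separator before the format mask
    lines = [
        # Requires an RMAN session connected to the recovery catalog.
        "REGISTER DATABASE;",
        "CONFIGURE CONTROLFILE AUTOBACKUP ON;",
        # %F yields the autobackup file name; %U yields unique backup piece names.
        "CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK "
        f"TO '{share}\\%F';",
        f"CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '{share}\\%U';",
        "BACKUP DATABASE PLUS ARCHIVELOG;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_backup_script(r"\\backupsrv\rman"))
```

Writing to a UNC path only works if the Oracle service account can access the share, which is why the service ownership mentioned above matters.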
Once the backup was complete, the Temporary Setup was shut down and we brought the Production Server online for all the users.
In the next 2 weeks, the following arrangements were made to address the shortcomings seen in the Disaster Situation:
- The Application is backed up to Tapes daily.
- Application files, as of 13th May 2009, have been backed up to DVDs. The Application files are backed up to DVDs every 15 days.
- A Temporary Server was arranged by the IT Administrators and has been cloned (using RMAN) from the Production Server. Scheduled Jobs that clone the Secondary database run at 3 intervals during the day. The cloning process takes more than 2 hours to complete. In case of an unforeseen disaster, we can easily switch to the Secondary Server with minimal Data Loss.
- Source Code is backed up, and its relevant Document Control completed, before any move to Production.
- After the Production Server’s Operating System Upgrade to Windows Server 2003, the RMAN backup location was changed to a SAN Storage location, which is further backed up to Tapes by the IT Administrators. The earlier RMAN backup performance issue, where the backup took 13 hours, has been resolved: the backup now completes in less than an hour. You can read about it here.
- For the next 3 months, we will carry out a planned monthly recovery of the complete Application from the Tapes and/or the DVDs. Once we are comfortable with the recovery simulation, we can carry it out every Quarter.
- Finally, the new Server Procurement Process has been started, and the server to purchase has been finalized.
The scheduled clones of the Secondary database run at the following times:
- 4 am: Clone after the Full Database RMAN Backups
- 11 am: Clone in the middle of the Working Day (after all Archivelog Backups are available up to 11 am)
- 4 pm: Clone at the end of the Working Day (after all Archivelog Backups are available up to 4 pm)
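To get a feel for what "minimal Data Loss" means with this schedule, the sketch below estimates how much work would be lost if Production failed at a given moment. It rests on assumptions that are not in the original post: each clone contains data up to its start time, a clone takes exactly 2 hours (the post only says "more than 2 hours"), and a clone already in progress can still finish from the backups after Production fails. The function names and the example date are illustrative.

```python
from datetime import datetime, time, timedelta

# Clone start times from the schedule above.
CLONE_STARTS = [time(4, 0), time(11, 0), time(16, 0)]
# Assumption: the post says "more than 2 hours"; 2 hours is used as a lower bound.
CLONE_DURATION = timedelta(hours=2)


def last_clone_start(failure: datetime) -> datetime:
    """Return the start of the most recent scheduled clone at or before `failure`."""
    # Include yesterday's runs so failures before 4 am are covered.
    candidates = [
        datetime.combine(failure.date() - timedelta(days=days_back), start)
        for days_back in (0, 1)
        for start in CLONE_STARTS
    ]
    return max(c for c in candidates if c <= failure)


def exposure(failure: datetime) -> tuple[timedelta, timedelta]:
    """Return (data lost, wait until the Secondary is usable) for a failure time."""
    start = last_clone_start(failure)
    lost = failure - start
    # If the clone is still running, users wait for it to finish.
    wait = max(start + CLONE_DURATION - failure, timedelta(0))
    return lost, wait


if __name__ == "__main__":
    for hour, minute in [(3, 59), (12, 0), (15, 30)]:
        failure = datetime(2009, 5, 20, hour, minute)  # example date only
        lost, wait = exposure(failure)
        print(f"failure at {failure:%H:%M}: lost {lost}, wait {wait}")
```

Under these assumptions, a failure at 3:30 pm falls back to the 11 am clone and loses four and a half hours of work, while a failure at noon loses one hour but requires waiting another hour for the running clone to finish.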