BPSQL: May 2013

There is an incident last week where our storage vendor login to our storage during business hour and change some configurations. You can probably guess what happened afterward. Our main ERP system database server is down. Although it is a virtual server but for some reasons we are not able to boot it up. Good thing is that we have synchronous database mirroring setup so we failover all databases to the mirror server. The failover and validation process took less than 15 minutes which meet our RTO so business is happy and since synchronous mirroring is being used so no committed data loss so audit is happy.

Key lessons you can learn from this are:

If you don't have any sort of high availability solution in place, seriously think about this. Although it is a simple solution, it will potentially save your job. I would recommend trying database mirroring. It is simple to implement.
Once you have decided the solution to be used, implement it and test it. Try to test failover and failback for not less than 20 times I would say. You will know how useful is this practice when the real disaster strikes especially if everybody is looking at your screen waiting for you to failover and bring everything back to normal.
Automate the whole process via any sort of scripting solution if possible. In my case, I have over 50 databases in one instance that need be to be failover, without automation, it will take me forever to do this via GUI. Other than this, it will also avoid potential human error in the process.
Document the procedure. If DBA is not available for whatever reason, with automation, anybody with the correct permission can follow and perform the procedure.

Below is the script that I used to failover all databases. Noted that in this case I am not exactly failover but remove mirroring session as I have no idea how long the principal server will be down and I don't want to take the risk of transaction log blow out.

SET NOCOUNT ON

DECLARE

@DBName varchar(128),

@CMD varchar(8000)

DECLARE db_cursor CURSOR

FOR

SELECT d.name

FROM sys.databases d

INNER JOIN sys.database_mirroring m

ON d.database_id=m.database_id

WHERE d.name LIKE '<databasename, sysname, database>%'

AND d.recovery_model = 1

AND d.state_desc in ('RESTORING','RECOVERING')

OPEN db_cursor

FETCH NEXT FROM db_cursor INTO @DBName

WHILE @@FETCH_STATUS = 0

BEGIN

SET @CMD = 'USE [master]; ' +

'ALTER DATABASE [' + @DBName + '] SET PARTNER OFF;'

PRINT @CMD

EXEC(@CMD)

FETCH NEXT FROM db_cursor INTO @DBName

END

CLOSE db_cursor

DEALLOCATE db_cursor

After removing the mirroring session, we have to recover the databases as they are still in no recovery state. Below is what I used.

DECLARE

@DBName varchar(128),

@CMD varchar(8000)

DECLARE db_cursor CURSOR

FOR

SELECT d.name

FROM sys.databases d

INNER JOIN sys.database_mirroring m

ON d.database_id=m.database_id

WHERE d.name LIKE '<databasename, sysname, database>%'

AND d.recovery_model = 1

AND d.state_desc in ('RESTORING','RECOVERING')

OPEN db_cursor

FETCH NEXT FROM db_cursor INTO @DBName

WHILE @@FETCH_STATUS = 0

BEGIN

SET @CMD = 'USE [master]; ' +

'RESTORE DATABASE [' + @DBName + '] WITH RECOVERY;'

PRINT @CMD

EXEC(@CMD)

FETCH NEXT FROM db_cursor INTO @DBName

END

CLOSE db_cursor

DEALLOCATE db_cursor

After this, depends on your environment, the databases are quite operational and ready to go.

Connection of application to the databases is separated story though. I try to have all applications to use cnames to connect to the database server rather than the physical server name. In the event of failover, we just have to update the DNS to point to new server. It eliminate the hassle to update connection string especially if there are heaps of them.

I hope above reasons and example is good enough to convince you to implement some sort of high availability solution on your most critical system if you don't have one. Even you have one, it doesn't means it will always work and you know what to do, so please document and practice, practice, and practice or else please kindly update your resume.

BPSQL

Thursday, May 2, 2013

When Disaster Strikes