Thursday, September 27, 2012

Replication Errors–Investigation and Fix

Replication in SQL Server is one of the features I like the most. I use Replication for many applications including Report Server, Data Warehouse and data processing.
It is very common to see that when Replication fails, generally we Re-Initialize entire replication. There are certain scenarios when Re-Initializing the replication is not favorable, for example when database/tables being replicated are in TBs and takes hours to create new snapshots or when subscribers have snapshot less subscription.
In such scenarios, DBAs have limited time to investigate the problem and to reach to the root cause and fix the root cause before subscription expires. I have also faced these issues many times myself, however, 99% times I have been able to fix the issues before subscription expires and even before anyone else notices. Here is the procedure with example:
  • For this test I have created one table called “Test_repl” in Test database, here is the script:
CREATE TABLE [dbo].[test_repl](
    [ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [Name] [varchar](50) NOT NULL,
CONSTRAINT [PK_test_repl] PRIMARY KEY CLUSTERED
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
  • Above table will be replicated to another database named “Test_Subscriber”
  • Added a row in publisher table “Test.dbo.Test_repl”, since this is the first row in table, the ID column will get value ‘1’ from auto increment.
insert into test_repl (Name) select 'XYZ'
  • Make sure that the row has been replicated from publisher to subscriber.
  • Make sure that there is no error in replication monitor for this publisher.
image
  • Now delete the row at subscriber.
delete from Test_Subscriber.dbo.test_repl
where ID = 1
  • Now modify the row from publisher and update XYZ to WXYZ. It should break the replication with critical error.
  • update test.dbo.test_repl set Name = 'WXYZ' where ID = 1
  • Replication Monitor now shows error in distribution agent:
image
  • If we assume that there are thousands of rows but don’t know which row caused the problem following steps will guide us how to find that row.
  • There are few requirements, like we should know the Database name being published and it’s publisher id. To know this run the following query. In the following query we passed database name in the where clause. Remember the ID from below query. Following query will run against Distribution database.
select id from MSpublisher_databases where publisher_db ='Test' -- for database id
  • Run the following query to get the Top 100 errors in replication, it should display all recent errors. We may also run the query with time stamp filter. Following query will run against Distribution database.
select top 100 * from msrepl_errors
where command_id > 0
order by time desc
  • From above query, copy the XACT_sqlno and command_id.
image
  • So far we have collected publisher_id, Xact_sqlno and command_id. These values will be passed against the following SP. Following query will run against Distribution database.
declare @seq nchar(22)
set @seq = '0x0000001A00000113000400000000'
exec sp_browsereplcmds @xact_seqno_start = @Seq -- this xact_seqno
, @xact_seqno_end = @Seq -- -- this xact_seqno
, @command_id = 1 -- this the command_id got from queries ran earlier
, @publisher_database_id = 1
-- Here put the publisher_id
  • Above query will return lot of information, however, the information we are looking for is command (text). Here is the list.
image
  • Try to interpret the text in command column, basically it is running sp_MSupd_dbotest_repl procedure which is trying to update primary key 1 to ‘WXYZ’ on subscriber database. So now we know that data in the table belonging to primary key 1 is missing. Manually insert this row in subscriber, start the distributor agent again, it should fix the issue. in case we are unable to interpret the SP, then we may open the SP text from subscriber database. Here is the query to insert the row in subscriber.
use Test_Subscriber
set identity_insert  Test_Subscriber.dbo.test_repl on
insert into Test_Subscriber.dbo.test_repl (ID, Name)
select ID, Name from test.dbo.test_repl where ID = 1
set identity_insert  Test_Subscriber.dbo.test_repl off
Hope this will be useful information. if you like this post, I appreciate you leaving your comments.