Around 4 months back we faced a latency issue in replication. We used tracer tokens and found that the distribution agent was lagging behind .Before I go forward, let me explain you that the distribution agent has 2 threads .Reader thread reads the value from the MSrepl_transactions table (this activity happens in parallel with the log reader agent where the rows are pumped in to the msrepl_transactions table) from distributor database and the writer thread apply those commands to the subscriber .
To find out where we are getting delayed we configured the verbose logging with level 2 in the distribution agent job (http://support.microsoft.com/kb/312292). In the output we saw that there is a good time gap after the sp_MSget_repl_commands command is fired and the next command .Now this command is fired to read the distribution database and populates the rows in the memory tables .These rows are then read by the writer thread and inserted into the subscriber database.
In the verbose log you can also check the statistics which is fired every 5 mins (starting SQL Server 2005 ):
•Cumulative Update 12 for SQL Server 2005 Service Pack 2
•Cumulative Update 5 for SQL Server 2008
•Cumulative Update 2 for SQL Server 2008 Service Pack 1
you will see data between these lines
*************STATISTICS SINCE AGENT STARTED *******************
[data]
***************************************************************************
It was clear that the reader thread was taking a lot of time in retrieving the rows as compared to the writer thread writing those commands to the subscriber database .
The next stage is to check the msrepl_transactions and msrepl_commands tables .We checked the number of rows in those tables and found that there a millions of rows which are replicated but still showing up in the in those tables . This was strange . We checked the cleanup job and found that the job was running fine .
So whats the issue ? Why the job is not removing the replicated rows ??
upon digging deep , we found that the DBA selected the "Create a snapshot immediately and keep the snapshot available to initialize subscriptions" option when he configured the replication .As a result Replication is suppose to keep all the transactions cached in the Distribution database for the entire Retention Period.You will also see all the snapshot files in the snapshot folder on the distributor .
Why it happens :
Every new subscriber that is added with in the subscription retention period needs the initial snapshot and then the data from the logreader is applied over the snapshot that is accumulated in the distribution database. But the Snapshot gets OLD too as the database image changes from time to time .Therefore every time a new snapshot is needed first. Because of this option set, the same old snapshot is applied first and then all the remaining LogReader entries from the distribution database .That is the reason all the old transactions are kept till the subscription expires.
Why is it configured ?
This command is configured in the environment where there is a need to create the subscription quite often and also if the snapshot increases in size quite considerab;y over the time .
How is it configured ?
Commandline :
via sp_addpublication <--Check BOL
GUI :
Drawback :
Due to this option the size of Msrepl_transactions and Msrepl_commands increases ( which is more than normal) which slows up the synchronizations and clogs up the system.
How to disable it ?
EXEC sp_changepublication
@publication = 'your_publication_name',
@property = 'immediate_sync',
@value = 'false'
GO
Happy Learning !!!!!!!
No comments:
Post a Comment