SQL Server 2016 SP1 introduced direct seeding for Always ON Availability Groups. This is a short guide to understand what it is and some warnings when using it.
In previous versions than SQL Server 2016 , in order to configure your “always on” replica, you had to backup your database, then take a couple of transaction logs, then copy these files to the target server, restore with no recovery option, then open up the wizard and configure your availability group. If you have lots of databases, this is a slow process that takes a lot of administrative efforts.
This new feature works like this: you configure an always on group, and every time you add a database to that group, it will immediately show up in the replicas in the form of an empty database, then SQL Server will automatically take care of filling it up in an asynchronous way. This means that manual backups/synchronization isn’t needed.
This feature was introduced by 2016 SP1 and is known to only work through TSQL, no good GUI yet. So make sure to patch your SQL Server 2016 to use direct seeding.
For using this feature you need to configure your “Always ON Group” by TSQL, last release of SSMS 17.1 (Management Studio) now supports GUI, however it is limited and won’t support changes to your existing group (at this stage I strongly recommend to stick with TSQL commands). This feature turns creating secondary replicas into a really simple task. When you create your availability group you just need to add the “SEEDING_MODE” option:
Then you just need to add a database to the group like this:
ALTER AVAILABILITY GROUP AGG01 JOIN
ALTER AVAILABILITY GROUP AGG01 GRANT CREATE ANY DATABASE;
More specifics for implementation in this link
And that’s it, it will immediately start sending info to the target database. If you have 20 databases to configure, instead of setting them up manually one by one, you just run that alter 20 times and off you go. This cuts down administrative efforts, and for people that are not very familiar with “Always ON”, it really makes things simpler.
Main Warning: If you have a big database, it will still have to transfer all the data through the network, remote replicas that are running in a different data centers could take a while or actually never finish to copy over. Compression is not enabled by default, you will need to enable a traceflag 9567 to get the copy to use compression (this will increase CPU specially for very large databases)
Second Warning : A database being “seeded” can’t have its transaction log truncated. So if the seeding for this database takes three hours, then your transaction log will grow large over three hours. If there are not too many transactions you won’t have a problem, but highly concurrent environments need to plan for extra disk space in the tlog drive.
I would recommend to use this on databases less than 500GB or when the network speed is known to be fast. Another approach is to add a database to the “Always ON ” Group in batches… e.g for 20 dbs, add 5…wait for seed to complete….then add the other 5…etc.
You can monitor seeding status with extended events (could impact a little server performance). Below are some of the events to use:
hadr_physical_seeding_backup_state_change, hadr_physical_seeding_restore_state_change, hadr_physical_seeding_forwarder_state_change, hadr_physical_seeding_failure, hadr_physical_seeding_progress, hadr_automatic_seeding_start, hadr_automatic_seeding_success and hadr_automatic_seeding_timeout
Or you can keep querying these DMVS to check status (recommended)
Direct Seeding allows you to decrease administrative effort but it comes with some warnings, so be sure to check them before implementing it. Very large databases could be better with a normal availability group and then synchronize by copying over small backup chunks. You can also resort to have 2 availability groups, one with direct seeding and the other without it, this way you can manage how you want to synchronize your databases.
I’ve been working with SQL 2017 and automatic seeding and I’ve found it extremely inconsistent in its application, even when using newly created databases for testing. I’ve tried using the script the GUI generated in SQLCMD mode and just using the wizard. Many times I run though the wizard and/or script and sometimes it works, sometimes it doesn’t…the database never gets created on the secondary, however other times I can remove the db from AG and just re-run the script and it works. I cannot find a rhyme or reason for the inconsistency.
However using the backup and restore method works 100% of the time, which is ok for small databases, but sucks when you have a 500GB database and your backups are not just .bak files used by native sql backups.
how do you troubleshoot the database not being created on the secondary node. Nothing I’ve found in any logs or extended logging seems to clearly indicate why a database wasn’t created
The GUI is unreliable with earlier versions, I would recommend upgrading your management studio to the latest version for bug fixing. But even then my recommendation would be using this feature via T-SQL only. This is a known issue. There are also known issues related to the product, patching SQL Server to the latest SP or CU is recommended. For example see this issue described as “random” in the following link
About troubleshooting I use a mix of SQL Server error log, and querying these 2 tables sys.dm_hadr_automatic_seeding and sys.dm_hadr_physical_seeding_stats. The feature doesn’t have a dashboard yet with specific user/friendly error messages.
Usual error is having “left over” databases in the secondary node. Also problems with network load, can help enabling compression for automatic seeding as well.