SQL Server 2016 SP1 introduced direct seeding for Always ON Availability Groups. This is a short guide to understand what it is and some warnings when using it.
In previous versions than SQL Server 2016 , in order to configure your “always on” replica, you had to backup your database, then take a couple of transaction logs, then copy these files to the target server, restore with no recovery option, then open up the wizard and configure your availability group. If you have lots of databases, this is a slow process that takes a lot of administrative efforts.
This new feature works like this: you configure an always on group, and every time you add a database to that group, it will immediately show up in the replicas in the form of an empty database, then SQL Server will automatically take care of filling it up in an asynchronous way. This means that manual backups/synchronization isn’t needed.
This feature was introduced by 2016 SP1 and is known to only work through TSQL, no good GUI yet. So make sure to patch your SQL Server 2016 to use direct seeding.
For using this feature you need to configure your “Always ON Group” by TSQL, last release of SSMS 17.1 (Management Studio) now supports GUI, however it is limited and won’t support changes to your existing group (at this stage I strongly recommend to stick with TSQL commands). This feature turns creating secondary replicas into a really simple task. When you create your availability group you just need to add the “SEEDING_MODE” option:
Then you just need to add a database to the group like this:
ALTER AVAILABILITY GROUP AGG01 JOIN
ALTER AVAILABILITY GROUP AGG01 GRANT CREATE ANY DATABASE;
More specifics for implementation in this link
And that’s it, it will immediately start sending info to the target database. If you have 20 databases to configure, instead of setting them up manually one by one, you just run that alter 20 times and off you go. This cuts down administrative efforts, and for people that are not very familiar with “Always ON”, it really makes things simpler.
Main Warning: If you have a big database, it will still have to transfer all the data through the network, remote replicas that are running in a different data centers could take a while or actually never finish to copy over. Compression is not enabled by default, you will need to enable a traceflag 9567 to get the copy to use compression (this will increase CPU specially for very large databases)
Second Warning : A database being “seeded” can’t have its transaction log truncated. So if the seeding for this database takes three hours, then your transaction log will grow large over three hours. If there are not too many transactions you won’t have a problem, but highly concurrent environments need to plan for extra disk space in the tlog drive.
I would recommend to use this on databases less than 500GB or when the network speed is known to be fast. Another approach is to add a database to the “Always ON ” Group in batches… e.g for 20 dbs, add 5…wait for seed to complete….then add the other 5…etc.
You can monitor seeding status with extended events (could impact a little server performance). Below are some of the events to use:
hadr_physical_seeding_backup_state_change, hadr_physical_seeding_restore_state_change, hadr_physical_seeding_forwarder_state_change, hadr_physical_seeding_failure, hadr_physical_seeding_progress, hadr_automatic_seeding_start, hadr_automatic_seeding_success and hadr_automatic_seeding_timeout
Or you can keep querying these DMVS to check status (recommended)
Direct Seeding allows you to decrease administrative effort but it comes with some warnings, so be sure to check them before implementing it. Very large databases could be better with a normal availability group and then synchronize by copying over small backup chunks. You can also resort to have 2 availability groups, one with direct seeding and the other without it, this way you can manage how you want to synchronize your databases.