When loading data into SQL Server we often run into performance issues. There are several tools and techniques for the job, but which is the fastest of them all?
The trick is to accomplish a minimally logged operation: a transaction that writes only a minimal amount of information to the transaction log, avoiding I/O overhead, which translates into faster execution.
Normal transactions follow the write-ahead logging (WAL) protocol, which governs the life cycle of a fully logged transaction: the transaction first records the sequence of operations in the transaction log, and only then are the changes written to the data file.
So, to achieve good performance, a transaction needs to be minimally logged. There are certain conditions for this to happen (a sample script follows the list):
-The database recovery model needs to be set to Bulk-Logged or Simple
-The operation needs to use the TABLOCK hint
-If the table has no indexes, the operation will be minimally logged
-If the table has no clustered index but has non-clustered indexes, the data pages will be minimally logged; the index pages will be minimally logged only if the table is empty
-If the table has a clustered index and is empty, both data and index pages will be minimally logged. On the other hand, if the table already contains data, the operation will be fully logged
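As a minimal sketch, assume a database named SalesDB and hypothetical dbo.SourceTable/dbo.TargetTable tables; a load that meets these conditions could look like this:

    -- Switch to a recovery model that permits minimal logging
    ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;

    -- Insert with a table lock; into a heap this load can be minimally logged
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT *
    FROM dbo.SourceTable;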
In other words, a populated table with a clustered index almost guarantees that you won't be able to perform a minimally logged operation. But there is a workaround for this, and it is called trace flag 610.
Trace Flag 610
Trace flag 610 can be enabled for a single session or globally, and it is intended to enable minimally logged operations against b-tree indexes.
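For example, the flag can be turned on with DBCC TRACEON, either for the current session or globally:

    -- Enable trace flag 610 for the current session only
    DBCC TRACEON (610);

    -- Enable trace flag 610 globally, for all sessions
    DBCC TRACEON (610, -1);

    -- Check whether the flag is active
    DBCC TRACESTATUS (610);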
However, there is a catch. Through normal usage a database grows, and update/delete operations leave empty pages behind. Any operation that stores rows into these existing empty pages will be fully logged; only the parts of a data load that allocate new extents get the trace flag 610 effect and have a chance of being minimally logged.
As you can see, there are a lot of variables that come into play when trying to execute a minimally logged operation; for ease of use, a summary table of these conditions can be found in the MSDN documentation.
Now that we know the prerequisites for a minimally logged, and hence performance-optimized, data load, let's check the options we have for doing the task.
SSIS Packages
You can use a data transformation task to transfer information from a source to a destination, but you need to take a closer look at how you do it. Follow this list for better performance:
-Avoid using ODBC source and destination components; using ODBC can be up to 15 times slower than OLE DB sources
-Use the Bulk Insert Task for loading data from flat files, but do note that loading from a SQL Server source can be faster
-Go for SSIS when your data load requires transformations or a more complex process; in other words, there should be a reason for using an SSIS package. If the data load is a straight flow of data from one table to another, other tools will be faster and easier to use
-Use the TABLOCK option when it is available in your SSIS component
BCP
You can use BCP to import and export data. The default command uses neither batching nor the TABLOCK hint, but you can enable both for a data load.
When using BCP, avoid small batches; the general rule of thumb is to use the largest batch size possible, as in the example below.
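As an illustration, using the same hypothetical SalesDB.dbo.TargetTable and a made-up server and file name, an import with a table lock and a large batch size could look like this:

    bcp SalesDB.dbo.TargetTable in C:\data\load.dat -S MyServer -T -c -b 1000000 -h "TABLOCK"

Here -b sets the batch size and -h "TABLOCK" requests the table lock; -T uses a trusted connection and -c loads character data.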
Bulk Insert
This option is very fast as well, and it also supports the TABLOCK hint and a batch size parameter. This option is the fastest when loading data from a file; a sketch follows.
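Again with hypothetical table and file names:

    -- Bulk load a comma-delimited file with a table lock and large batches
    BULK INSERT dbo.TargetTable
    FROM 'C:\data\load.dat'
    WITH (
        TABLOCK,             -- take a bulk update table-level lock
        BATCHSIZE = 1000000, -- rows committed per batch
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );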
But which tool is the fastest?
[Table: which data load options perform better, and when]
Conclusion
-Use trace flag 610 to accomplish minimally logged operations, but do test it first; in my tests it was sometimes slower when enabled
-Prefer SSIS or the bulk insert tools, as these are faster and provide more features for transformation and data manipulation
-Take into account disk space before starting a data load
-Take into account the recovery model of the database
-Take into account the CPU available to perform the task and the possibility of running simultaneous processes
-Remember to check whether transaction log backups are running, as these could slow down your data load (a sample check follows this list)
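For that last point, a sample check (the catalog objects are standard; the database name is hypothetical):

    -- Is a log backup running right now?
    SELECT session_id, command, percent_complete
    FROM sys.dm_exec_requests
    WHERE command LIKE 'BACKUP%';

    -- Recent transaction log backups (type = 'L') for the database
    SELECT TOP (10) database_name, backup_start_date, backup_finish_date
    FROM msdb.dbo.backupset
    WHERE type = 'L' AND database_name = 'SalesDB'
    ORDER BY backup_finish_date DESC;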