In my previous post I talked about how create and use a policy against your SQL Server Instance (don’t forget to check our part 1 in this series too). Now we will talk about the Data Collector and how configure it on your environment.
What is the Data Collector?
The Data Collector (DC) is one of the main components in the set of tools for data collection provided by SQL Server. With the DC we can define a centralized point for storage of all collected metrics through the instances of the SQL Server database in your infrastructure, and these metrics can be from multiple sources and not just related to performance metrics.
To increase the efficiency of the metrics collected, you must adjust the DC according to each existing infrastructure environment (development, approval, production). The DC stores all the collected information in a Management Data Warehouse (MDW) and allows you to set different retention periods for each metric that will be collected.
As the DC has a programming interface (API), we can customize collections for any other type of desired metric. However, in this article, we will focus only on the three collections of the DC system: Disk Usage, Query Activity and Server Activity. Figure 28 shows how the DC fits in the strategy for collecting and managing data in a SQL Server database.
The Data Collector Architecture
Before starting the implementation of the DC, it is necessary to understand which components are part of this feature. They are:
- Target: An instance of the SQL Server database that supports the process of collecting metrics by using the DC.
- Target Type: Defines the type of target which will collect metrics. For example, an instance of SQL Server database has different metrics than the metrics collected from a SQL Server database itself.
- Data provider: A data source that will provide metrics for the collector type.
- Collector Type: A delimiter for the packages in the SQL Server Integration Service (SSIS), which provides the mechanism for the collection and storage of the metrics in the MDW.
- Collection Item: Is a collection item in which are defined which the metrics will be collected, how often this gathering will be held and what is the retention time of the metric stored.
- Collector Set: A set of Collection Items.
- Collection Mode: The way that the metrics will be collected and stored in the MDW. The metrics can be collected on an ongoing basis (Cached Mode) or through a scheduling sporadic (Non-Cached Mode).
- Management Data Warehouse (MDW): The relational database used to store all the collected metrics.
Note: In SQL Server 2014 we have the following collector types: Generic T-SQL Query, Generic SQL Trace, Performance Counters and Query Activity.
The Figure 29 shows the dependencies and relationships between the components of the DC.
The data provider is an external component in DC architecture and which, by definition, has an implicit relationship with the target. A data provider is specific to a particular target and provides metrics through views, performance counters and components of Windows Management Instrumentation (WMI) are consumed by the DC.
We can visualize from Figure 29 that a collector type is associated to a particular target, and that this relationship also defines how the metrics will be collected and what the storage schema of these metrics, for the collector type also provides the location of MDW, which can be on the server that is running the collection or on a centralized server.
A collection item has a default and collection frequency that can only be created within a collector set. The collector set, in turn, is created on the instance of SQL Server that will be monitored through the DC and consists of one or more collection items. The collection of the set of metrics defined in the collector set is accomplished through Jobs executed by the SQL Server Agent service, and the metrics collected are stored in the MDW periodically through predefined schedules.
The Figure 30 shows a collector set called system Disk Usage, in which we visualize that the configuration was performed with the collection mode set to Non-Cached, using two collection items of type Generic T-SQL Query Collector Type, and that the metrics are collected every 60 seconds, with retention of these metrics in the MDW for 730 days.
It is important to note that the DC is fully integrated with the SQL Server Agent service and using Integration Services, using both intensively. After the DC configuration, the process of collecting and recording of metrics is accomplished by a set of SQL Server Agent Jobs created and started automatically.
Management Data Warehouse (MDW)
So we can use the metric collection through the DC, though you must first perform the configuration of the MDW that will be the relational database responsible for storing all the metrics collected by the collector sets.
For this we can use an existing relational database and configure it as a MDW. However, it is recommended that you set a new database, because during the configuration process of the MDW several schemas and tables relating to DC will be created. The schemas are generated automatically after the configuration of DC are the core and the snapshot. A third schema, custom_snapshots, name will be created when a collector set as customized is set by the administrator of the Bank.
The main schema of the MDW is the core, because it has the tables, stored procedures, and views that are available to all collector types that will also be used to organize and identify the metrics collected. To ensure the integrity and security of MDW, all database objects belonging to the core schema can only be changed by members of the Profiles database db_owner and mdw_admin.
The Table 2 lists all the existing tables in the core schema and their respective descriptions.
Table 2. Core schema tables.
The schema snapshot, in turn, owns the objects required for the storage of collected metrics through the system collection sets. The tables in this schema can only be changed by members belonging to the database profile mdw_admin.
The Table 3 illustrates which tables are used by collection sets of Server Activity system and Query Statistics, created after the setting of DC.
Already the custom_snapshot schema has the tables and views that were created when a custom collection set has been configured. Any custom collection set that you need a new table for to store collected metrics can create tables in this schema. The tables can be added by any member of the mdw_writer database.
Configuring the Data Collector
To exemplify the metric using the DC collection, we have the instance VITADB\SQLCMS, responsible for hosting the MDW database, and instances VITADB\SQLINSTANCE1 and VITADB\SQLINSTANCE2, which will have your metrics collected through the collector sets. That said, the first step to setting the DC is the creation of the MDW in VITADB\SQLCMSinstance, as the following steps:
1) Through the Object Explorer, select the folder Management.
2) Right-click on Data Collection -> Task -> Configure Management data warehouse.
3) In the dialog box that appears (see Figure 31), select the VITADB\SQLCMS instance and create the MDW database via the button New.
4) Select which logins have access to the MDW database ( Figure 32), and then click Finish.
Note: The members of the mdw_admin database have permission of SELECT, INSERT, UPDATE, and DELETE, in addition to being able to change any .MDW schema and perform the Jobs of maintaining DC. The members of the mdw_writer database have permission to upload the collected metrics to the MDW. Already members of the database profile mdw_reader have only SELECT permission on MDW.
After the creation and configuration of the MDW in VITADB\SQLCMS instance, you must start the process of collecting metrics on instances VITADB\SQLINSTANCE1 and VITADB\SQLINSTANCE2 by setting the collector sets of system in each of the instances and directing the metrics collected for the MDW database.
For the configuration of the collector sets of a system, we have the following steps:
1) Through the Object Explorer, select the folder Management.
2) Right-click on Data Collection -> Tasks -> Configure Data Collection.
3) In the dialog box that appears (Figure 33), connect on VITADB\SQLCMS instance, select the MDW database, the collector set that you want, and then click Finish. For this example we will use the collector set of system System Data Collection Sets, which will be automatically created after Setup.
Complete the configuration of the collection, we’re creating three collector sets of system: Disk Usage, Query Statistics and Server Activity. The collector set Disk Usage collects metrics about the growth of data files (.mdf and .NDF) and log files (.ldf) user databases and existing systems in instance monitored by DC. With this information it is possible to know what the daily growth trend is, in MB, of the files examined.
Table 4 shows the properties of the collector set of system Disk Usage.
Table 4. Collector set properties of system Disk Usage.
In turn, the collector set of Server Activity system collects server activity metrics, statistics, performance, blocking chains, General information of memory, CPU, and network.
The Table 5 shows the properties of the collector set of Server Activity system.
Table 5. Collector set properties of Server Activity system.
Finally, the collector set of Query Statistics collection system metrics for queries executed against the database monitored by the ad, as statistics, execution plans, most costly queries in relation to your use of disk, CPU, memory and queries that took more time to be finalized.
Table 6 shows the properties of the collector set of Query Statistics system.
Table 6. Properties of the collector set of Query Statistics System.
You will see the metrics collected:
The metrics collected by DC can be accessed directly by T-SQL queries. However, after setting the collector sets of system, some standardized reports become available for viewing. To access them it is necessary to right-click on Data Collection -> Reports -> Management Data Warehouse. After the configuration of the collector sets, three reports are available. We will look at each of them in the following subtopics.
Server Activity History
All available information in this report is related to the use of resources of the database server, such as CPU or memory allocated in total, of which the biggest wait types that exist in SQL Server, the value of IOPs, among others. All this information is extremely useful for troubleshooting and tuning.
The Figure 34 shows the top of the Server Activity Historyreport, extracted from the VITADB\SQLINSTANCE2 instance.
Figure 34. The top of the report Server Activity History.
At the top of the report we visualize which SQL Server instance metrics are displayed and at what date and time it was requested. Below this information you can select what time period, in which the metrics were collected, must be loaded in the report. In each of the graphics presented, there is information about the operating system (green lines) and on the SQL Server (blue lines).
Figure 35 shows Server Activity History, extracted from the VITADB\SQLINSTANCE2 instance at the bottom of the report.
These reports are also extremely useful in the process of performance analysis and troubleshooting, because they display the biggest wait types and what main types of activities that occur in the instance. From any of these reports we can visualize more details by selecting one of the lines or data bars and performing a drill-down on the desired information.
Note: Immediately after the setting of DC there will be information to be loaded in the reports, and the more metrics collected and stored in the MDW, the greater the detail achieved through the reports.
Disk Usage Summary
This report lists the size of the databases that are monitored by the ad and what the average growth is over a period of time. The metrics displayed by the report are separated by data files and the log files of monitored databases. As shown in Figure 36, each of the data files and log files has the initial size information, the current size and the average growth per day in MB.
Query Statistics History
The most common reason for performance issues found in SQL Server is writing T-SQL commands inefficiently. Therefore, the collection of performance metrics of these consultations is an essential part for the tuning process. By default, you can view the 10 queries that consume more CPU, but you can change this filter and view the queries that carried out more IO operations, how long they were running, held more physical reads or carried out more logical writings.
Figure 37 shows the report Query Statistics History, extracted from the VITADB\SQLINSTANCE2 instance.
Recommendations for Configuration of DC
To ensure that there is minimal impact during the environmental monitoring process of a SQL Server database by DC, adopt the following recommendations:
- Use a centralized server to the MDW, it allows that there is only one location for execution and visualization of reports.
- All database SQL Servers that will be monitored by the DC should be part of the same domain.
- When creating a custom collector set using the collector type Generic SQL Trace, define a set of filters so that only the really necessary metrics are collected, because in this way the MDW doesn’t store unnecessary information.
- Before you create a custom collector set using performance counters, you can be sure the collector set of Server Activity system is no longer collecting this metric.
- If any collections of metrics across multiple T-SQL queries are carried out with the same frequency, combine them in a single collector set. Doing this we will reduce the amount of memory used by the DC executable (DCCEXEC.exe) while gathering metrics. Similarly, combine multiple collection items of type Performance Counters in a single collection item whenever possible.
- Combine multiple collection items in a single collector set whenever possible. The only reason to create collector sets apart is if there are different retention periods or a different collection schedule.
- A collector set using the collection mode set to Cached should always keep a running collection process. If the metrics are collected often, this is more efficient than starting and stopping the collection process where new metrics should be collected. In contrast, the collection mode set to Non–Cached doesn’t have a running collection process most of the time, that is, a new collection process will be started according to the predefined schedule and so will be stopped again, avoiding the excessive use of server hardware resources. So, if the metric collection occurs rarely, the collection mode set to Non–Cached is more efficient than leaving the collection process on hold most of the time. As a general rule, if the metric needs to be collected every five minutes or more often than that, consider configuring a collector set using the collection mode: Cached. If the collection of metrics can be performed with a frequency greater than five minutes, it is recommended to configure a collector set using the collection mode: Non-Cached.
- The higher the frequency, the greater the overhead on the database server. In this way, choose to always configure the lowest frequency possible that meets the need of collecting.
As described earlier, beginning with SQL Server 2008 we have two tools that facilitate the management of a SQL Server database consisting of multiple instances. They are: the Policy-Based Management and the Data Collector.
Using the PBM you can create policies that assess certain conditions in existing objects on the instance of a SQL Server database. These policies can be designed manually or imported through XML files available after the installation of the database engine. Policies can be evaluated manually (OnDemand), following a predefined schedule (OnSchedule) or at the time a particular property of a SQL Server object is changed (OnChange) and can also be evaluated on multiple SQL Server instances at once, through the Central Management Server functionality.
With the DC we have a feature that collects metrics for all SQL Server instances and stores them in a centralized database, called the Management Data Warehouse. Through the configuration of DC and the Management Data Warehouse, three collector sets of the system are created. They collect metrics concerning the utilization of server hardware resources (CPU, memory, disk, and network), growth of data and log files of monitored databases in addition to the more costly T-SQL queries executed on the database server, which is accomplished through the collection of Jobs defined on a SQL Server Agent.
It is worth noting that the DC also provides a wide range of reports, so that the metrics collected by the collector sets can be evaluated during troubleshooting and tuning processes. Finally, note that the DC is a complete monitoring tool, that needs to be configured in the best possible way to avoid a high overload to database servers.
Until next time!
Discover more about our expertise in SQL Server.
Interested in working with Vitor? Schedule a tech call.