Building data tests in PowerShell

Posted in: Technical Track

While working with a client recently, we came across a problem while testing data for completeness or errors after running an ETL process to import & manipulate the data. The main issue we ran across was that the overall client process was being managed via PowerShell, and we wanted to save the success/fail messages for later processing.

Data tests in PowerShell

The solution we settled on has the following steps:

Define a PS DataTable to store results
We first create a DataTable in PowerShell for storing the results of the data tests.

$results = New-Object System.Data.DataTable
$sheetName = New-Object System.Data.DataColumn sheetName,([string]) # Name of worksheet these results will be stored in. 
$status = New-Object System.Data.DataColumn status,([string])
$output = New-Object System.Data.DataColumn output,([string])

# Define table columns

Define the test, pass/fail and status messages
Here, we define the data tests query, pass/fail threshold and what to do if the query passes or fails.

What happens within Get-ImportStatus is that the testQuery value is run and if the number of results is greater than or equal to the threshold, the data test fails. The errorQuery is then run to get the failed messages. If the test succeeds, the successQuery is run.

$testResults = Get-ImportStatus `
  -sqlConnection $sqlConnection `
  -testThreshold 0 `
  -testQuery "SELECT count(1) FROM [config].[IngestionProcessLog] WHERE [Error_Process] IS NOT NULL;" `
  -successMsg "No Errors encountered during processing:" `
  -successQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NULL;" `
  -errorMsg "The below Errors Have been recorded:" `
  -errorQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NOT NULL;"

Call a stored procedure to run the test and return results in CSV
Taking advantage of a stored procedure I found on Stack Overflow, we get the results in CSV format.

function Get-ImportStatus {
param (
    $testThreshold, # Test query number that the results are failed at. 
    $testQuery, # Should return a single number
#  This procedure runs the testQuery sent in and passes/fails the results based on the TestThreshold. 
#    Returns a hashtable of results

[hashtable]$row = @{}

$testStatus = Test-DataStatus -SQLConnection $sqlConnection -Query $testQuery # This just runs the SQL Query

if ($testStatus -gt $testThreshold) {
    $row.status = $errorMsg
    $row.output = Get-ImportData -SQLConnection $sqlConnection -Query $errorQuery
else {
    $row.status = $successMsg
    $row.output = Get-ImportData -SQLConnection $sqlConnection -Query $successQuery

return $row


Store results in the DataTable
The returned results are then stored in the DataTable for reporting or manipulating in any way we want.

$row = $results.NewRow()
$row.sheetName = "ProcessStatus"
$row.status = $testResults.status 
$row.output = $testResults.output

I hope that through this demonstration we can help others who may also encounter this problem.

Want to talk with an expert? Schedule a call with our team to get the conversation started.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *