Discussing Data Lineage – Its Definition, Use, and Value

Previously, we discussed metadata and how it has become the connective glue in modern data architectures, giving different technologies common layers of reference for process and access automation. Centralized metadata storage through common data catalogs and feature stores…

Why Achieving Quick Business Wins Should Be Built Into Your D&A

Having worked in the Data & Analytics (D&A) space for decades, I’ve found that the struggle to gain business insights through data is constant. I’ve used many approaches; some have worked, while others looked better on paper. Over time, however, the path to…

Metadata-Driven Data Governance: the How and Why


In our previous discussion, we explored the role of data stewards and their vital function in data governance programs. They’re the champions who identify data quality shortfalls and work with business partners to improve data quality. Data stewards are the…

Migrate RDB to Cloud SQL Using Google’s Dataflow

Most corporations have huge amounts of data in relational database management systems (RDBMSs). If you are considering an RDBMS data transfer and only need to migrate a subset of the data to the cloud, follow this very efficient and easy data ingestion…

Orchestrating dbt Pipelines with Google Cloud: Part 2

In part 1, we defined and deployed two data services to Cloud Run. Each service provides endpoints that perform specific tasks, such as loading a file to BigQuery or running dbt models. In this post, we’ll define and deploy some…

Orchestrating dbt Pipelines With Google Cloud: Part 1


In my previous post, I showed you how to use dbt to expedite data preparation tasks on Google BigQuery. This time, I’ll show you how to integrate those dbt pipelines into workflows that load, validate and transform data…

Python: Using Dataclasses to Model Your Data

Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in Python today. The problem: As a Python developer, you’ve almost certainly run into code that looks like the…

Caching Alternatives in Google Dataflow: Avoiding Quota Limits and Improving Performance

The problem: When building data pipelines, it’s very common to need an external API call to enrich, validate or obfuscate data. This can happen in either a streaming or a batch pipeline. The situation is the same: call external services…

Data preparation with dbt and BigQuery

Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These steps include tasks such as type casting, renaming columns, cleaning values and identifying duplicates. Writing code to perform these…

Snowflake System Function Error: Argument 0 to Function SYSTEM$PIPE_STATUS Needs to Be Constant

I recently encountered the above issue, which prompted me to write this blog post so I can easily reference the solution whenever I need it. However, I also hope it might help anyone out there who hits a similar issue…
