Snowflake Feature Applicability At a Glance

Preparing for the Snowflake SnowPro Advanced: Architect Certification can be challenging. You need a deep understanding of the core concepts as well as of specific feature capabilities. Snowflake’s excellent documentation provides detailed information. However, it is difficult to determine…

Data Streaming with Kafka and Flink on AWS – Part 2

Apache Kafka and Apache Flink are popular platforms for data streaming applications. However, provisioning and managing your own clusters can be challenging and incur operational overhead. Amazon Web Services (AWS) provides a fully managed, highly available version of these platforms…

Discussing Data Lineage: Its Definition, Use, and Value

Previously, we discussed metadata and how it has become the connective glue in modern data architectures, giving different technologies common layers of reference for process and access automation. Centralized metadata storage through common data catalogs and feature stores…

Why Achieving Quick Business Wins Should Be Built Into Your D&A

Having worked in the Data & Analytics (D&A) space for decades, I’ve found that the struggle to gain business insights through data is constant. I’ve used many approaches; some have worked, while others only looked good on paper. Over time, however, the path to…

Metadata-Driven Data Governance: The How and Why

In our previous discussion, we explored the role of data stewards and their vital function in data governance programs. They’re the champions who identify data quality shortfalls and work with business partners to improve data quality. Data stewards are the…

Migrate RDB to Cloud SQL Using Google’s Dataflow

Most corporations store large amounts of data in relational database management systems (RDBMSs). When you plan an RDBMS data transfer and only need to migrate a subset of that data to the cloud, follow this efficient and easy data ingestion…

Orchestrating dbt Pipelines with Google Cloud: Part 2

In Part 1, we defined and deployed two data services to Cloud Run. Each service provides endpoints that perform specific tasks, such as loading a file into BigQuery or running dbt models. In this post, we’ll define and deploy some…

Orchestrating dbt Pipelines with Google Cloud: Part 1

In my previous post, I showed you how to use dbt to expedite data preparation tasks on Google BigQuery. This time, I’ll show you how to integrate those dbt pipelines into workflows that load, validate, and transform data. …

Python: Using Dataclasses to Model Your Data

Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in Python today.

The problem

As a Python developer, you’ve almost certainly run into code that looks like the…

Caching Alternatives in Google Dataflow: Avoiding Quota Limits and Improving Performance

The problem

When building data pipelines, it’s very common to need an external API call to enrich, validate or obfuscate data. This can happen in streaming or batch pipelines; either way, the situation is the same: call external services…
