Preparing for the Snowflake SnowPro Advanced: Architect Certification can be challenging. You need to have a deep understanding of the core concepts as well as specific feature capabilities. Snowflake’s excellent documentation provides detailed information. However, it is difficult to determine…
Apache Kafka and Apache Flink are popular platforms for data streaming applications. However, provisioning and managing your own clusters can be challenging and incur operational overhead. Amazon Web Services (AWS) provides a fully managed, highly available version of these platforms…
Previously, we discussed metadata and how it has become the connective glue in modern data architectures that allows different technologies to have common layers of reference for process and access automation. Centralized metadata storage through common data catalogs and feature stores…
Having worked in the Data & Analytics (D&A) space for decades, I’ve found that the struggle to gain business insights through data is constant. I’ve used many approaches; some have worked, while others looked better on paper. Over time, however, the path to…
In our previous discussion, we explored the role of data stewards and their vital function in data governance programs. They’re the champions who identify data quality shortfalls and work with business partners to improve data quality. Data stewards are the…
Most corporations store huge amounts of data in relational database management systems (RDBMSs). If you are planning an RDBMS data transfer and only need to migrate a subset of the data to the cloud, follow this very efficient and easy data ingestion…
In part 1, we defined and deployed two data services to Cloud Run. Each service provides endpoints that perform specific tasks, such as loading a file to BigQuery or running dbt models. In this post, we’ll define and deploy some…
In my previous post I showed you how to use dbt to expedite data preparation tasks on Google BigQuery. This time, I’ll show you how to integrate those dbt pipelines into workflows that load, validate and transform data. …
Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in Python today. The problem: As a Python developer, you’ve almost certainly run into code that looks like the…
The problem: When building data pipelines, it’s very common to need an external API call to enrich, validate or obfuscate data. This can happen in streaming or batch pipelines; either way, the situation is the same: call external services…