Caching Alternatives in Google Dataflow: Avoiding Quota Limits and Improving Performance

The problem: When building data pipelines, it’s very common to need an external API call to enrich, validate or obfuscate data. This can happen in either a streaming or a batch pipeline. The situation is the same: call external services…


Using El Carro Operator on AWS

Introduction: Google recently released a Kubernetes operator for Oracle, El Carro. The project includes good examples of how it works on GCP and on your local computer (using minikube). Given it is a portable implementation, we wanted to give it…


Data preparation with dbt and BigQuery

Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These steps include tasks such as type casting, renaming columns, cleaning values and identifying duplicates. Writing code to perform these…


Near Real-Time Data Processing for BigQuery: Part Two

This post is part two of a series describing (near) real-time data processing for BigQuery. In this post, I will use Dataform to implement transforms as well as ASSERTS on the data, along with unit testing of BigQuery code and SQL statements. Part…


Near Real-Time Data Processing for BigQuery: Part One

This post describes (near) real-time data processing for BigQuery with unique and other check constraints, and unit testing. This is part one of two, and describes the real-time ingestion of the data. Part two will describe how to implement ASSERTS…


Google Cloud Composer Costs and Performance

Controlling Cloud Composer Costs and Performance: Managing, optimizing and balancing cloud cost vs. performance is an ongoing challenge for all cloud architects and administrators. The variety and complexity of tools available can sometimes be daunting, so much so that many…


How to Deploy Machine Learning on Google Cloud Platform

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…


Scaling ProxySQL Rapidly in Kubernetes

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on November 26, 2019. It’s not uncommon these days for…


Dipping Your Toes Into Building an Analytics Platform on Google Cloud Platform

“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…


How to Connect from Cloud Functions to the Private IP Address of Cloud SQL in Google Cloud

Cloud Functions allow you to run single-purpose functions without having to manage instances in Google Cloud. Cloud SQL is Google Cloud’s managed SQL service. For better security, it’s best practice to disable public IP in Cloud SQL. In Terraform, the…
