Google Cloud Next. My favorite annual event. Making new connections, reconnecting with those I haven’t seen for a year, and learning about the continuous innovation at Google generates so much energy and inspiration. And despite the virtual format again this year,…
The problem: When building data pipelines, it’s very common to require an external API call to enrich, validate or obfuscate data using external services. This might happen with streaming or batch pipelines. The situation is the same: call external services…
Introduction: Google recently released a Kubernetes operator for Oracle, El Carro. The project includes good examples of how it works on GCP and on your local computer (using minikube). Given it is a portable implementation, we wanted to give it…
Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These steps include tasks such as type casting, renaming columns, cleaning values and identifying duplicates. Writing code to perform these…
This post is part two of a series describing (near) real-time data processing for BigQuery. In this post, I will use Dataform to implement transforms as well as ASSERTS on the data and unit testing of BigQuery code and SQL statements. Part…
This post describes (near) real-time data processing for BigQuery with unique and other check constraints, and unit testing. This is part one of two, and describes the real-time ingestion of the data. Part two will describe how to implement ASSERTS…
Controlling Cloud Composer Costs and Performance: Managing, optimizing and balancing cloud cost vs. performance is an ongoing challenge for all cloud architects and administrators. The variety and complexity of tools available can sometimes be daunting, so much so that many…
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on November 26, 2019. It’s not uncommon these days for…
“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…