Tag: Google Cloud Platform (GCP)

Building an ETL Pipeline with Multiple External Data Sources in Cloud Data Fusion

In this post, I’ll share a quick-start guide to Google Cloud Platform’s (GCP) Cloud Data Fusion. We’ll first look at what the product offers, and then walk through a use case of building a data pipeline involving…

Online Data Migration from SQL Server to Cloud Spanner Using Striim

This post focuses on implementing a continuous migration from SQL Server to Cloud Spanner using Striim. It covers configuring an initial load and continuous data replication using change data capture (CDC). However, I don’t cover …

Orchestrating dbt Pipelines with Google Cloud: Part 2

In part 1, we defined and deployed two data services to Cloud Run. Each service provides endpoints that perform specific tasks, such as loading a file to BigQuery or running dbt models. In this post, we’ll define and deploy some…

Orchestrating dbt Pipelines With Google Cloud: Part 1

In my previous post, I showed you how to use dbt to expedite data preparation tasks on Google BigQuery. This time, I’ll show you how to integrate those dbt pipelines into workflows that load, validate and transform data. …

Top Three Use Cases for Google Cloud Spanner

Since its launch in 2017, Google Cloud Spanner has built a reputation that’s very different from traditional databases. Where old-school databases were known to be fragile, finicky and high-maintenance, Google Cloud Spanner delivered the seemingly impossible: a fully managed relational…

GCP Professional Cloud Architect Certification Guide

Wondering how to prepare for the latest version of the Google Cloud Professional Cloud Architect certification? Not sure whether you are ready to give it a shot? If so, this post might help you ace your certification. I’ll briefly summarize my…

Consuming Tweets Using Apache Beam on Dataflow

Apache Beam is an SDK (software development kit) available for Java, Python, and Go that allows for a streamlined ETL programming experience for both batch and streaming jobs. It’s the SDK that GCP Dataflow jobs use and it comes with…

Near Real-Time Data Processing for BigQuery: Part Two

This is part two of a series on (near) real-time data processing for BigQuery. In this post, I will use Dataform to implement transforms, ASSERTS on the data, and unit testing of BigQuery code and SQL statements. Part…

Near Real-Time Data Processing for BigQuery: Part One

This post describes (near) real-time data processing for BigQuery with unique and other check constraints, and unit testing. This is part one of two, and describes the real-time ingestion of the data. Part two will describe how to implement ASSERTS…

How to Deploy Machine Learning on Google Cloud Platform

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…