Caching Alternatives in Google Dataflow: Avoiding Quota Limits and Improving Performance

The problem: when building data pipelines, it’s very common to call an external API to enrich, validate or obfuscate data. This can happen in either a streaming or a batch pipeline. The situation is the same: call external services…

Data preparation with dbt and BigQuery

Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These steps include tasks such as type casting, renaming columns, cleaning values and identifying duplicates. Writing code to perform these…

Snowflake System Function Error: Argument 0 to Function SYSTEM$PIPE_STATUS Needs to Be Constant

I recently encountered the above issue, which prompted me to write this blog post so I can easily reference the solution whenever I need it. However, I also hope it might help anyone out there who hits a similar issue…

Replicating MySQL to Snowflake with Kafka and Debezium—Part Two: Data Ingestion

Here we go again. Hello, and welcome to this second part of my “Replicating MySQL to Snowflake” series. If you landed here from a web search and missed part one, you can take a look here: part one. What’s up?…

How to Deploy Machine Learning on Google Cloud Platform

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…

How to Implement Airflow Best Practices From a Data Scientist’s Perspective

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 8, 2019. This blog post is a compilation…

Dipping Your Toes Into Building an Analytics Platform on Google Cloud Platform

“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…

Global Analytics with Azure Cosmos DB and Synapse Analytics – SQL On The Edge Episode 21

A few months ago, Microsoft revealed that they were looking into adding the capability to query Cosmos DB data through Spark, and this immediately got me thinking about the new scenarios this would enable. The most ambitious is the capability…

The Shift in Top Big Data Analytics Trends for 2020

The Shift in Top Big Data Analytics Trends as We Enter 2020

In 2019, we forecasted and highlighted the top trends in big data analytics. Today we will be revisiting this topic to explore how those trends have progressed over time. As we dive deeper into the digital age, the…

How to install Vertica for test purposes

The column-oriented Vertica Analytics Database Platform was designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications. The following instructions show how to create a Vertica database…
