The problem: When building data pipelines, it’s very common to need an external API call to enrich, validate or obfuscate data. This can happen in a streaming or a batch pipeline. The situation is the same: call external services…
Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These steps include tasks such as type casting, renaming columns, cleaning values and identifying duplicates. Writing code to perform these…
I recently encountered the above issue, which prompted me to write this blog post so I can easily reference the solution whenever I need it. However, I also hope it might help anyone out there who hits a similar issue….
Here we go again. Hello, and welcome to this second part of my “Replicating MySQL to Snowflake” series. If you landed here from a web search and missed part one, you can take a look here: part one. What’s up?…
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 8, 2019. This blog post is a compilation…
“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…
A few months ago, Microsoft revealed that they were looking into adding the capability to query Cosmos DB data through Spark, and this immediately got me thinking about the new scenarios this would enable. The most ambitious is the capability…
In 2019, we forecast and highlighted the top trends in big data analytics. Today we revisit this topic to explore how those trends have progressed over time. As we dive deeper into the digital age, the…
The column-oriented Vertica Analytics Database Platform was designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications. The following instructions show how to create a Vertica database…