Global Analytics with Azure Cosmos DB and Synapse Analytics – SQL On The Edge Episode 21

A few months ago, Microsoft revealed that they were looking into adding the capability to query Cosmos DB data through Spark, and this immediately got me thinking about the new scenarios this would enable. The most ambitious is the capability…

The Shift in Top Big Data Analytics Trends for 2020

The Shift in Top Big Data Analytics Trends as We Enter 2020

In 2019, we forecasted and highlighted the top trends in big data analytics. Today we revisit this topic to explore how those trends have progressed over time. As we dive deeper into the digital age, the…

How to install Vertica for test purposes

The column-oriented Vertica Analytics Database Platform was designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications. The following instructions show how to create a Vertica database…

Examining Teradata To Google BigQuery Migration

Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to the cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from…

How to deploy machine learning on Google Cloud Platform

In this blog post, I will describe a few takeaways on how to deploy or submit Machine Learning (ML) tasks on Google Cloud Platform (GCP). If you have less experience as an ML engineer or if you are a solution…

How to Implement Airflow Best Practices from a Data Scientist’s perspective – Part 1

This blog post is a compilation of suggestions for best practices drawn from my personal experience as a data scientist building Airflow DAGs and installing and maintaining Airflow. Let’s begin by explaining what Airflow is and what it is not….

Reviewing the operation modes of Oracle GoldenGate BigQuery Handler

GoldenGate for Big Data 12.3.2.1.1 introduces a new target – Google BigQuery. The BigQuery handler can work in two audit log modes: (1) auditLogMode = true and (2) auditLogMode = false. I want to review the differences between these two operation modes…

How to schedule weekdays only on Airflow

Consider the following situation: You have a data ingestion pipeline where the data comes in real-time on weekdays and is stored in a dated folder. The day’s data needs to be ingested within four hours. An instant response may be…

Analyzing BigQuery via Excel and Google Sheets

Both MS Excel and Google Sheets offer ways to connect directly to BQ data, to run queries, to pull data back to Excel/Sheets and allow further analysis via options such as pivot tables, charts and drilling up/down. MS Excel The…

Data modeling for cloud DW

In this blog post, I would like to share some options that you can consider to model your cloud DW for better query performance. With a traditional EDW, we would come up with a star, snowflake or similar schema. These…
