Examining Teradata To Google BigQuery Migration

Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to the cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from…

How to deploy machine learning on Google Cloud Platform

In this blog post, I will describe a few takeaways on how to deploy or submit Machine Learning (ML) tasks on Google Cloud Platform (GCP). If you have less experience as an ML engineer or if you are a solution…

How to Implement Airflow Best Practices from a Data Scientist’s perspective – Part 1

This blog post is a compilation of suggestions for best practices drawn from my personal experience as a data scientist building Airflow DAGs and installing and maintaining Airflow. Let’s begin by explaining what Airflow is and what it is not….

Reviewing the operation modes of Oracle GoldenGate BigQuery Handler

GoldenGate for Big Data 12.3.2.1.1 introduces a new target – Google BigQuery. The BigQuery Handler can work in two audit log modes: (1) auditLogMode = true and (2) auditLogMode = false. I want to review the differences between these two operation modes…

How to schedule weekdays only on Airflow

Consider the following situation: You have a data ingestion pipeline where the data comes in real-time on weekdays and is stored in a dated folder. The day’s data needs to be ingested within four hours. An instant response may be…

Analyzing BigQuery via Excel and Google Sheets

Both MS Excel and Google Sheets offer ways to connect directly to BigQuery (BQ) data, run queries, pull data back into Excel/Sheets, and analyze it further via options such as pivot tables, charts and drilling up/down. MS Excel: The…

Data modeling for cloud DW

In this blog post, I would like to share some options that you can consider to model your cloud DW for better query performance. With a traditional EDW, we would come up with a star, snowflake, or similar schema. These…

Azure Data Lake basics for the SQL Server DBA / developer and… for everyone!

The basics: If you’re a Microsoft SQL Server DBA or developer who has not been introduced to the Microsoft Azure Data Lake and would like to understand what it’s all about and how to get started, this article is for YOU…

Big Data on Microsoft Azure – HDInsight

Introduction: The best definition you are going to find for data is that data is the new oil in today’s world. Starting from that, we can define a new horizon and a new way of looking at how we treat…

Scheduling Google Cloud Functions

Currently, there is no straightforward way to schedule Google Cloud Functions. It is still possible to achieve this by different means, such as (but not limited to): deploying a Compute Engine instance and setting a crontab entry; configuring HTTP/S uptime checks via…
