Tag: airflow

How to deploy machine learning on Google Cloud Platform

In this blog post, I will describe a few takeaways on how to deploy or submit Machine Learning (ML) tasks on Google Cloud Platform (GCP). If you have less experience as an ML engineer or if you are a solution…


How to Implement Airflow Best Practices from a Data Scientist’s perspective – Part 1

This blog post is a compilation of suggestions for best practices drawn from my personal experience as a data scientist building Airflow DAGs and installing and maintaining Airflow. Let’s begin by explaining what Airflow is and what it is not….


How to schedule weekdays only on Airflow

Consider the following situation: You have a data ingestion pipeline where the data arrives in real time on weekdays and is stored in a dated folder. The day’s data needs to be ingested within four hours. An instant response may be…


Airflow Scheduler: the very basics

Apache Airflow is a great tool for scheduling jobs. It has a nice UI out of the box. It allows you to create a directed acyclic graph (DAG) of tasks and their dependencies. You can easily look at how the…


Creating dynamic tasks using Apache Airflow

What’s Airflow? Apache Airflow is an open-source scheduler built on Python. It uses a topological sorting mechanism, called a DAG (Directed Acyclic Graph), to generate dynamic tasks for execution according to dependency, schedule, dependency task completion, data partition and/or…
