In this blog post, I will describe a few takeaways on how to deploy or submit Machine Learning (ML) tasks on Google Cloud Platform (GCP). If you are a less experienced ML engineer, or a solution architect, you might be in the right place.
What exactly is an ML task? Before building an ML model, you first need to specify what you are planning to accomplish with the data. Keeping this in mind helps you identify what exactly the ML tasks are for your use case. Some broad ML tasks are: data ingestion, feature transformation, supervised learning, semi-supervised learning, unsupervised learning, dimensionality reduction, active learning, reinforcement learning, and model prediction. That is too much material to cover in one blog post, so expect a follow-up post covering those tasks. In the meantime, you can find a general overview on Wikipedia.
There are many options to deploy or submit ML tasks on GCP:
1. Offline tasks that take a long time, using a pre-configured environment: Kubernetes, Dataproc, AI Platform default runtime.
In this case, there is less flexibility to change the environment to the way you want. For instance, the AI Platform default runtime supports Python only, and Dataproc requires you to deploy a Hadoop cluster environment.
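To make this concrete, here is a minimal sketch of submitting a training job to the AI Platform pre-configured (default runtime) environment with the Google API Python client. The project ID, bucket, package name, and versions are placeholders, not values from this post.

```python
# Sketch: submit a training job to the AI Platform default runtime.
# Assumes Application Default Credentials and a staged trainer package.
from googleapiclient import discovery

project_id = "my-project"  # placeholder project ID
job_spec = {
    "jobId": "census_training_01",
    "trainingInput": {
        "scaleTier": "BASIC",                                   # pre-configured machines
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],   # staged Python package
        "pythonModule": "trainer.task",                         # entry point in the package
        "region": "us-central1",
        "runtimeVersion": "1.14",                               # pre-built Python environment
        "pythonVersion": "3.5",
    },
}

ml = discovery.build("ml", "v1")
response = ml.projects().jobs().create(
    parent=f"projects/{project_id}", body=job_spec
).execute()
print(response)
```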
2. Offline tasks that take a long time, using a flexible environment: AI Platform custom training and Cloud Build.
In this approach, you can use the isolation capability of Docker containers to set the environment requirements. Both approaches involve building a container image and pushing it to a cloud container registry.
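As a sketch of what this looks like, the same job-submission call as above can point AI Platform at a custom Docker image instead of a pre-built runtime. The image URI below is a placeholder for an image you would have built and pushed (for example, with Cloud Build) beforehand.

```python
# Sketch: AI Platform training with a custom container image.
from googleapiclient import discovery

project_id = "my-project"  # placeholder project ID
job_spec = {
    "jobId": "custom_container_training_01",
    "trainingInput": {
        "scaleTier": "CUSTOM",
        "masterType": "n1-standard-4",
        "region": "us-central1",
        "masterConfig": {
            # Any environment you can put in a container: Python, Java, Go, ...
            "imageUri": "gcr.io/my-project/my-trainer:latest",
        },
    },
}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(parent=f"projects/{project_id}", body=job_spec).execute()
```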
3. Online tasks that take a few minutes to run using a flexible environment: Cloud Run (Knative), App Engine.
These two services are serverless, so you only need to worry about your task code. They support several programming languages, and they are elastic, scaling from zero; while the endpoint is not being consumed, there are no incurred costs. However, you may run into limitations around scalability, performance, and security.
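Below is a minimal Flask app of the kind you could package and deploy to Cloud Run or App Engine. The model file and its `predict()` interface are placeholders for your own artifacts.

```python
# Sketch: a tiny prediction service suitable for Cloud Run / App Engine.
import os
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder model artifact baked into the container image.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    instances = request.get_json()["instances"]
    predictions = model.predict(instances).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    # Cloud Run injects the port through the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```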
4. Online tasks that take seconds to run, using a less flexible environment: Cloud Functions.
This is a serverless approach for small, short-lived functions, similar to AWS Lambda.
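For reference, an HTTP-triggered Cloud Function in Python is just a function that receives a Flask request object. The scoring logic below is a placeholder; in practice you would load a small, fast model at module import time so it is reused across invocations.

```python
# Sketch: an HTTP-triggered Cloud Function entry point.
import json

def score(request):
    """`request` is a Flask request object passed in by Cloud Functions."""
    payload = request.get_json(silent=True) or {}
    features = payload.get("features", [])
    # Placeholder "model": replace with a real, lightweight predictor.
    prediction = sum(features)
    return json.dumps({"prediction": prediction}), 200, {"Content-Type": "application/json"}
```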
5. Offline tasks that require a schedule to run taking any amount of time: Cloud Composer (Airflow), Cloud Scheduler.
In this case, we need a scheduling system to run our ML task. Cloud Composer relies on Airflow (with a Celery executor) for that, and Cloud Scheduler uses cron jobs.
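A minimal Airflow DAG, of the kind you would upload to a Cloud Composer environment, looks like the sketch below (Airflow 1.x style). The DAG name, schedule, and task body are placeholders.

```python
# Sketch: a scheduled ML task as an Airflow DAG for Cloud Composer.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def run_ml_task(**kwargs):
    # Placeholder: call your training or batch prediction code here.
    print("running the ML task")

with DAG(
    dag_id="nightly_ml_task",
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 2 * * *",   # cron expression: every day at 02:00
    catchup=False,
) as dag:
    ml_task = PythonOperator(
        task_id="run_ml_task",
        python_callable=run_ml_task,
        provide_context=True,
    )
```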
6. Online tasks triggered by events, taking any amount of time: Cloud Build (through triggers), Cloud Tasks, Airflow (REST API call to trigger a DAG).
Cloud Build is not designed for ML tasks, but it can be used to run any kind of event-based automated task. Cloud Tasks implements an asynchronous queue of tasks; a related pattern is publishing messages to a Pub/Sub topic, then consuming those messages to run different kinds of tasks.
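Here is a sketch of the queueing pattern just described: publish a message that describes the task, then have a worker (a Cloud Function, a Cloud Run service, and so on) consume it and run the actual job. The project, topic, and message contents are placeholders.

```python
# Sketch: publish an "ML task" message to a Pub/Sub topic.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "ml-tasks")  # placeholder names

message = {"task": "batch_prediction", "input": "gs://my-bucket/input.csv"}
future = publisher.publish(topic_path, data=json.dumps(message).encode("utf-8"))
print(f"Published message {future.result()}")
```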
7. Online tasks triggered by events, taking a short time, invoked through a REST API: Cloud Run (Knative), a REST API app on Kubernetes, AI Platform custom "Model Serving", Cloud Functions, App Engine.
In this situation, you have plenty of options to implement the ML task. Choosing between them will depend on other requirements such as security, performance, and cost.
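As one example of consuming such an endpoint, here is a sketch of calling an AI Platform online prediction ("Model Serving") REST API from Python. The project, model name, and instance format are placeholders that depend on your deployed model.

```python
# Sketch: online prediction against a deployed AI Platform model.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")
name = "projects/my-project/models/my_model"   # optionally append /versions/v1

response = ml.projects().predict(
    name=name,
    body={"instances": [{"feature_a": 1.0, "feature_b": 2.0}]},
).execute()

print(response["predictions"])
```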
8. Orchestrated tasks, manually triggered, taking any amount of time: Cloud Build (through triggers), Cloud Composer (Airflow), Kubeflow.
Here, you are implementing an orchestrated workflow that performs tasks in sequence, depending on conditions. Each task may take a different amount of time to finish. A modern ML workflow / pipeline usually involves running containers in a managed container orchestration environment; on GCP, that can be Kubernetes (GKE) or AI Platform.
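To illustrate, the sketch below defines an orchestrated workflow with the Kubeflow Pipelines (kfp) v1 SDK: each step runs its own container image, and the steps are chained so they execute in sequence. The image names are placeholders.

```python
# Sketch: a two-step Kubeflow pipeline where each step is a container.
import kfp
from kfp import dsl

@dsl.pipeline(name="ml-workflow", description="Preprocess, then train.")
def ml_pipeline():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="gcr.io/my-project/preprocess:latest",
    )
    train = dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/train:latest",
    )
    train.after(preprocess)   # run training only after preprocessing finishes

if __name__ == "__main__":
    # Compile to a workflow package that Kubeflow Pipelines can run.
    kfp.compiler.Compiler().compile(ml_pipeline, "ml_pipeline.yaml")
```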
Note that we now have AI Platform custom containers for training and serving, which is the only option where you don't need to worry about scalable cluster management or the environment (Java, Python, Go, …).
Last point: Cloud Run (Knative) and AI Platform custom containers are great additions, since you use Docker containers and don't need to worry about the environment (serverless).
Carlos is a Machine Learning Engineer at Pythian. He holds the Google Professional Cloud Architect and Data Engineer certifications, and is an AI/ML consultant to startups in Ottawa. He has previously architected, implemented, and tuned many relational transaction systems for large manufacturing companies such as FCA and Philips. Nowadays, he builds and deploys ML systems in production on Google Cloud Platform for large retailers and gaming companies in the Americas. He actively organizes GDG Cloud Ottawa meetups and has mentored at events like Cloud Relay, Cloud Spring, and Cloud Study Jam.