Backstory: Why we dropped Kubeflow
You can read more about why we were less than impressed with Kubeflow in our previous post. In short, we’re building a reference architecture for machine learning projects. When it’s done, it’ll be a collection of our favorite machine learning tools, plus documentation and Terraform scripts to help you set up your full machine learning project quickly and easily.
Our initial plan was to build this partly on top of Kubeflow, but that tool had certain shortcomings, so we decided to drop it. Now we’re using Prefect.io instead, and we’re loving it.
Why we chose Prefect
We’ve previously compared several other workflow orchestration tools, and we’ve used Luigi for many of our existing projects. Unfortunately, Luigi doesn’t play well with Kubernetes – and although we’d patched this gap with some custom code in our older projects, we wanted Kubernetes to be a centerpiece in our new reference architecture. So we needed a tool with native Kubernetes support.
We’re also big Dask users. Prefect is built on top of Dask, and they share some core contributors, so we were confident in Prefect from the start. We knew these Dask foundations would lead to a stable core and a strong community – neither of which we found with Kubeflow.
Finally, we were attracted to Prefect because it’s familiar to Python engineers. It addresses many of the pain points common to more complicated tools like Airflow. Specifically, Prefect lets you turn any Python function into a task using a simple Python decorator.
By contrast, platforms like Airflow use more verbose, tightly constrained tasks. If you want to build tasks in Airflow, you have to learn “the Airflow way.” To build tasks in Prefect, you can simply write Python.
Prefect’s own “Why Prefect” article provides a few more compelling reasons to use Prefect:
- Tasks are functions – We didn’t need to learn another way of doing things because any Python function can be a task.
- Details matter – Prefect focuses on high-quality software, with impressive unit tests and documentation.
- Modularity allows for adaptation – Each Prefect component is very well defined, so we were confident we could swap them out for other tools if we ever wanted to.
Where does Prefect fit in?
For any machine learning solution, the algorithm and the code for training the model are only a small part. Managing workflows and dataflows is an essential – but less hyped – component of any production ML solution. That’s where Prefect comes in.
We use Prefect to pull data from the source, transform it as necessary (Prefect’s ETL flows are very neat and intuitive to build), and monitor any jobs that need to be run.
With standard machine learning architecture, Prefect will take over all the dataflow automation requirements. It will also integrate with libraries like Dask to handle distributed computing seamlessly.
A detailed look at our Prefect set-up
Prefect is made up of several components, and there are lots of ways to use it. We’re using it only as open source software. This means setting up our own Prefect server on AWS and not depending on Prefect Cloud.
Our entire reference architecture has to be fully free (as in freedom and beer), which means our set-up won’t even require users to have a Prefect.io account.
More specifically, we’re using Helm to set up Prefect in our Kubernetes cluster. Helm creates several of the components Prefect depends on, including:
- Prefect Agent, which polls the server for scheduled flow runs and launches them as jobs in the cluster.
- Prefect UI, which provides a nice user interface and a dashboard overview.
- Apollo, GraphQL, and Hasura services, which together serve Prefect’s API.
- Postgres, which Prefect uses to store metadata on flows.
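The install itself is a couple of Helm commands along these lines (the chart repo URL and release name reflect what PrefectHQ published at the time – check the current docs before relying on them):

```shell
# Add the PrefectHQ chart repository and refresh the local index.
helm repo add prefecthq https://prefecthq.github.io/server/
helm repo update

# Install the server components (Apollo, GraphQL, Hasura, Postgres, UI)
# into their own namespace; defaults can be overridden with a values.yaml.
helm install prefect-server prefecthq/prefect-server \
  --namespace prefect --create-namespace
```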
The Helm charts for the Prefect server are still in the experimental phase, but once again the Prefect community went beyond the call of duty to help us with a few teething problems.
Zooming out a bit, this means Prefect lives in our Kubernetes cluster and handles all of our workflow and dataflow management.
Our Kubernetes cluster also houses other machine learning services, notably:
- JupyterHub, which we use as our experimentation hub to do rapid prototyping.
- Dask, which we use for distributed computing.
This set-up lets our engineers focus on solving difficult problems without worrying about infrastructure. They can set up an experiment as a Prefect task in a Jupyter Notebook and submit it to Dask for processing – all within a familiar notebook interface.
Prefect does a great job of tracking what’s running where and presenting helpful logs and visualizations when things go wrong.
Our experience so far
We’ve already set up Prefect in our Kubernetes cluster and tested it on some of our pipelines. It’s all been a great experience so far. The only hurdles we encountered were related to certain Prefect Core components which assume you’re using Prefect Cloud (the proprietary, enterprise part of Prefect).
Because we’re running our own Prefect Server (instead of Prefect Cloud), we have zero dependencies on Prefect as a third-party service. The Prefect community was exceptionally helpful with this: we got responses to our questions on their Slack group within minutes. They also resolved and deployed a fix for a bug we reported within hours.
Within a couple of days, Prefect also merged and deployed our improvements to their documentation on deploying Prefect with Helm.
Now our Jupyter Notebooks are integrated with Prefect. We love how easy it was to set this up, and the clean UI and dashboards make it a pleasure to use.
Need help creating a scalable machine learning pipeline?
We love using new ML tools – and we’ve tried most of them. If you need help setting up your own scalable machine learning pipeline, reach out for a chat. You should also take a look at our Open MLOps repository – it automatically sets up Prefect and the other open source tools we use in a production-ready environment.