Choosing a Simple and Free Orchestrator for Kedro Pipelines on AWS

Hello Team!
So it's been a few months since we started using Kedro, and it's time to deploy some of the pipelines we have created.
We need to choose an orchestrator, but this is not our field of expertise, so I wanted to ask for some help. We would like something simple to set up and use collaboratively. Also, my company requires that it be free (at least for now), our cloud provider is AWS, and we already use MLflow. Here are the alternatives we found:

  • Prefect (open-source, seems nice to use, Kedro support, but the free tier imposes limitations)
  • Flyte (free?, open-source, seems nice to use, no Kedro support)
  • MLRun (free and open-source, no Kedro support?, seems nice to use but is a bit more than an orchestrator, requires Python 3.9)
  • Kubeflow Pipelines (free and open-source, Kedro plugin, and others seem to think it is complex to set up and maintain)
  • Airflow (free and open-source, Kedro plugin)
  • SageMaker (Amazon, Kedro plugin; I personally dislike its UI and how other AWS services are organized around it)

What would you recommend? What should we consider to make such a decision?

Thanks for your help :)

2 comments

hello !

disclaimer: I'm not an expert in any of these tools (alternative way of saying "it depends" 🙂)

when you talk about the limitations of the various free tiers, I guess those apply to the corresponding Cloud/Hosted option, right? taking the example of Prefect, to the best of my knowledge Prefect OSS doesn't have any limitations. the free tier of Prefect Cloud does, though (max of 5 000 runs/day). I guess something like that applies to Flyte vs Union, or Airflow vs Amazon MWAA (yes, AWS offers a managed Airflow service).

if you intend to <i>operate</i> the orchestrator yourself, then you're free to choose from the different OSS options. what do you want out of an orchestrator? given that your business logic (hence CPU-bound tasks) will live in the Kedro pipelines themselves, probably you'll want to pick a simple orchestrator that dispatches tasks, centralizes logs, displays execution status (and in my personal opinion, you don't need Kubernetes for that).

tl;dr: think carefully whether you want to operate your orchestrator yourself, or use some managed service.

then, you need to think <i>where</i> your pipelines will run. considering that your cloud provider is AWS, you'll be looking at Amazon EC2, Amazon ECS, etc. definitely <i>not</i> the same hardware where your orchestrator lives, otherwise you risk taking it down accidentally!

again taking Prefect as an example, looks like prefect-aws allows you to deploy your flows on ECS.

the most tried and tested orchestrator out there is Airflow, and there's an official Kedro plugin for it. but building Kedro translators isn't really a terribly difficult task, just see the code snippet in our docs that translates pipelines to Prefect for example.
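To give a feel for what such a translator has to do, here is a minimal sketch using only the Python standard library. It is not the snippet from the Kedro docs and it does not call Kedro or Prefect: the `NODES` dictionary and its dataset names are hypothetical stand-ins for what you would read from a real pipeline's nodes (`node.name`, `node.inputs`, `node.outputs`). The core of a translation is turning "who produces which dataset" into task dependencies that an orchestrator can schedule.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for Kedro nodes: name -> (input datasets, output datasets).
# With a real pipeline these would be read from node.name, node.inputs, node.outputs.
NODES = {
    "split_data": (["model_input_table"], ["X_train", "y_train"]),
    "preprocess": (["raw_companies"], ["model_input_table"]),
    "train_model": (["X_train", "y_train"], ["regressor"]),
}


def build_dependencies(nodes):
    """Map each node name to the set of node names that produce its inputs."""
    producers = {}
    for name, (_, outputs) in nodes.items():
        for dataset in outputs:
            producers[dataset] = name
    # Inputs with no producer (e.g. raw data) create no dependency.
    return {
        name: {producers[d] for d in inputs if d in producers}
        for name, (inputs, _) in nodes.items()
    }


def execution_order(nodes):
    """Return node names in an order that respects dataset dependencies."""
    return list(TopologicalSorter(build_dependencies(nodes)).static_order())


deps = build_dependencies(NODES)
order = execution_order(NODES)
print(order)
```

A real translator would then register one orchestrator task per node and wire each task to the tasks in `deps[name]`, instead of just printing the order.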

I'll let others comment on their specific experiences 👂

Hi and thanks for your reply!

> to the best of my knowledge Prefect OSS doesn't have any limitations. the free tier of Prefect Cloud does, though (max of 5 000 runs/day)

I missed that, thanks for pointing it out!

> if you intend to <i>operate</i> the orchestrator yourself, then you're free to choose from the different OSS options. what do you want out of an orchestrator? given that your business logic (hence CPU-bound tasks) will live in the Kedro pipelines themselves, probably you'll want to pick a simple orchestrator that dispatches tasks, centralizes logs, displays execution status (and in my personal opinion, you don't need Kubernetes for that).

Indeed, I think this is what I need. Also, I wonder if it could allow me to:
  • Choose whether I run a set of nodes on a single machine or each node on a different machine (and if I am asking myself this question, does it mean I separated my pipelines/nodes wrongly?)
  • Choose the target machine for each node (e.g. run some tasks on a small or big EC2 instance and others on a Dask cluster)
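
One way to picture the second point, as a rough sketch rather than a recommendation: tag the Kedro nodes by the kind of machine they need, then have the orchestrator launch one `kedro run` per tag on the matching target. Everything named below is hypothetical (the tags, the target labels, the `plan_runs` helper), and the `--tags` spelling is assumed from recent Kedro releases, so check `kedro run --help` for your version.

```python
import shlex

# Hypothetical mapping from a Kedro node tag to the machine that should run it.
TARGET_BY_TAG = {
    "light": "small-ec2",
    "heavy": "big-ec2",
    "distributed": "dask-cluster",
}


def plan_runs(target_by_tag, pipeline_name="__default__"):
    """Build one `kedro run` command per target, selecting nodes by tag."""
    plan = []
    for tag, target in target_by_tag.items():
        # Flag names are an assumption based on recent Kedro versions.
        command = ["kedro", "run", f"--pipeline={pipeline_name}", f"--tags={tag}"]
        plan.append({"target": target, "command": shlex.join(command)})
    return plan


plan = plan_runs(TARGET_BY_TAG)
for step in plan:
    print(f"{step['target']}: {step['command']}")
```

Two caveats with this kind of split: the orchestrator still has to run the groups in dependency order, and any dataset passed between groups on different machines must be persisted somewhere both can reach (for example S3) rather than kept in memory.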

For now, I'll have a deeper look at the Prefect and Airflow documentation, thanks! :)
