

Kedro orchestration service for production environments

Hey Everyone

Interested to know which orchestration service you prefer for running Kedro in production environments, and how your experience has been so far.

Recently I have been trying to run Kedro on Kubeflow and have been facing multiple issues.

11 comments

Hi, what issues are you facing?

Don't use Kubeflow 😄

Are you using any specific cloud provider?

I am curious. Is Kubeflow especially a bad idea or? 😅

computers as a whole were a bad idea

depends on your use case, but for A LOT of use cases it's total overkill. It's especially difficult to install and maintain. If you treat it as an MLOps platform and you plan to use as many of its features as possible (Katib/KServe/Notebooks/Pipelines etc.), maybe you will benefit from it (provided you have the manpower to maintain it). For orchestration only, there are better options.

Thanks for responding to my post. Sorry for the late response from my end.

A couple of issues we are facing while trying to deploy our Kedro pipeline on Kubeflow:

  1. Our pipeline needs access to file systems like EFS to perform I/O. It is proving really difficult to mount and use an EFS volume within the pods running Kedro nodes. The kedro-kubeflow plugin does provide ways to mount volumes, but we are having a hard time with that. I have reached out to our Kubernetes team, as I felt I was struggling with the Kubeflow configuration generated by the plugin. I will keep you posted once we resolve the issue. and has been a great help to me on this.
  2. Accessing services like S3 on AWS requires some form of authentication from the application. During local development I have been storing the credentials in .env and using resolvers to make use of them in the catalog. Now that the pipeline runs on the Kubeflow service, which launches pods on an EKS cluster, we are trying to find efficient ways of accessing AWS services. As far as I could research there are many ways to deal with that; a few that I am aware of:
     a. Deploy a Secret on EKS holding the AWS credentials and reference those secret variables in our Kedro pods. But the plugin configuration doesn't allow us to use any such Kubernetes component.
     b. IRSA (IAM Roles for Service Accounts), mentioned by . This looks like the most promising way; I am in touch with our DevOps team to work it out. A quick question though: do we need to mention this service account somewhere in the plugin configuration? I see the compiled Argo Workflow does mention a serviceaccount, but I am not sure where it comes from.
  3. We also wanted to use other command-line arguments supported by kedro, like --async, but there is no option to configure these in the plugin configuration (kubeflow.yaml). Once we compile our pipeline with the plugin's compile command, an Argo Workflow (pipeline.yml) is generated, which is what actually gets uploaded to Kubeflow. The steps in this file do contain the kedro run command, so we should be able to manage. But then we would have to change the Argo Workflow directly and maintain it ourselves, and at that point I feel we might not even need the kedro-kubeflow plugin.
  4. We have built some pipelines natively using the Kubeflow SDK and found it a lot easier to plug and play with pipelines that way. For some reason, using a plugin in between makes us feel we are unable to utilise many of the good things Kubeflow provides out of the box.

For your question about a specific cloud provider: yes, we use AWS. We are running EKS with a self-managed Kubeflow service in the cluster.
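On the credentials point above, one pattern that works the same locally and on EKS is to keep secrets out of the catalog and resolve them from environment variables at load time: locally the variables come from .env, while in a pod they can be injected from a Kubernetes Secret. Below is a minimal stdlib sketch of the `${oc.env:VAR,default}` interpolation style that Kedro's OmegaConf-based config loader supports (illustrative only; the variable names and values here are stand-ins, and the real resolution is done by OmegaConf, not this helper):

```python
import os
import re

# Minimal stand-in for an ``oc.env``-style resolver.
# Pattern handled: ${oc.env:VAR_NAME} or ${oc.env:VAR_NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace ${oc.env:VAR,default} placeholders with environment values."""
    def _sub(match: re.Match) -> str:
        var, default = match.group(1), match.group(2)
        if var in os.environ:
            return os.environ[var]
        if default is not None:
            return default
        raise KeyError(f"Environment variable {var!r} is not set")
    return _ENV_PATTERN.sub(_sub, value)

# Stand-in values: locally these would come from .env; in a pod, from a
# Kubernetes Secret exposed via env/envFrom in the pod spec.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA_EXAMPLE"
os.environ.pop("AWS_REGION", None)  # ensure the default branch is exercised

creds = {
    "key": resolve_env("${oc.env:AWS_ACCESS_KEY_ID}"),
    "region": resolve_env("${oc.env:AWS_REGION,eu-west-1}"),
}
print(creds["key"])     # AKIA_EXAMPLE
print(creds["region"])  # eu-west-1 (default, since AWS_REGION is unset)
```

The nice property is that the catalog and credentials config stay identical across environments; only where the environment variables come from changes.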

1. Are your pipelines super I/O heavy? Was EFS introduced for efficiency or just for the ability to use ReadWriteMany in K8s?
2.b: I haven't worked with argo/kfp for a while, I don't remember exactly where it's configured, but it's definitely doable.
3. Sounds like something you could contribute to kedro-kubeflow then.
4. The Kedro KFP plugin is not the most up-to-date right now; there are definitely some missing bits. The biggest gain from using kedro-kubeflow was to decouple the work done by DS teams (they stick to Kedro and can run things locally / iterate fast) from the work done by MLE/MLOps, who provide the infra; Kedro is a common pipelining language to marry those two worlds together.
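On the IRSA point (2.b), the usual wiring is a ServiceAccount annotated with the IAM role ARN, which the workflow pods then reference via `serviceAccountName`. A sketch, with placeholder names and a made-up account/role ARN; the namespace and how the generated Argo Workflow picks up the service account depend on your cluster and plugin setup:

```yaml
# ServiceAccount annotated for IRSA (names and ARN are placeholders)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kedro-pipeline-runner
  namespace: kubeflow
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/kedro-s3-access
---
# The (generated) Argo Workflow spec would then reference it, e.g.:
# spec:
#   serviceAccountName: kedro-pipeline-runner
```

With this in place, pods running under that service account get temporary AWS credentials via the EKS OIDC provider, so no long-lived keys need to be mounted or stored in Secrets.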

I'm coming back to the Kubeflow question. AFAIK, pipelines in GCP Vertex AI are authored using the Kubeflow Pipelines SDK, right? So there isn't really a choice.

kfp had 6.5M downloads last month so it definitely looks like a popular pipeline framework

Hi, Vertex AI uses kfp as its default framework, that's true, but regarding the integration with Kedro there's a separate plugin for that, kedro-vertexai, which is more up to date and uses one of the recent kfp versions.

Apart from the "pipeline translation" logic, which works very similarly to the kedro-kubeflow one, there are differences in authentication, scheduling and parameter handling between Vertex and standalone Kubeflow clusters.
