Hello Kedro experts. We’re trying to evaluate how Kedro might fit into our data engineering processes as we deploy ML models for our customers. The nature of our work is such that we expect to deploy similar solutions across different customers who will have different environments. As such there are certain python scripts/packages that we’re expecting to want to port across different environments, as well as aspects of every deployment that we’ll expect to be custom. That probably means we want to have “nodes” in our data engineering pipelines that potentially run with a different set of package requirements as some of the ported code may have conflicting requirements. However, I believe a kedro pipeline typically requires the same requirements.txt to be used throughout. Is that right?
So dependencies are simplest at a repo / project level, but people treat a pipeline like a package with its own dependencies and build tooling around that concept
you can also maintain "pipeline specific dependencies" with this pattern
https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#providing-pipeline-specific-dependencies
it's not super fleshed out, but it's in there
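Roughly, the pattern there is that each modular pipeline folder carries its own requirements.txt next to its code, something like this (package and pipeline names are placeholders):

```
src/<your_package>/pipelines/data_engineering/
├── __init__.py
├── nodes.py
├── pipeline.py
└── requirements.txt   # dependencies specific to this pipeline
```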
it's unclear from the deprecation notice whether this part is deprecated or just the sharing part, do you know?
hi, sadly micropackaging is deprecated. I've been collecting some thoughts on how we could break the "1 Kedro pipeline = 1 set of dependencies" assumption: https://github.com/kedro-org/kedro/discussions/4147. Still early stages, but any ideas are welcome
for now, your best bet is to have several Kedro projects
Ah, thanks for the information regardless. I’m honestly not sure what I would recommend for Kedro's design to solve this problem.
Internally our conversations are turning to making use of containers for whatever elements we want to be portable. Hypothetically, we would then orchestrate these containers with an orchestrator that’s set up for that (Argo, for example, but perhaps something Kedro documents a deployment path for, like Prefect).
If we wanted to, the customizations we need could still happen in Kedro; it would just all have to be architected such that the kedro pipeline would itself be a node within whatever orchestrator we choose? Or is there a way to “link up” the DAG created from a Kedro pipeline with steps of a DAG defined outside Kedro, as long as it’s the same overlying orchestrator tool?
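For concreteness, the kind of thing I’m imagining is roughly this (a very rough sketch assuming Prefect 2.x; the pipeline name and the plain CLI call are just placeholders for however we’d actually package things):

```python
# Very rough sketch: one packaged Kedro pipeline running as a single Prefect task.
# Assumes Prefect 2.x and that the Kedro project (with its own dependency set)
# is installed in the image this task runs in; all names are placeholders.
import subprocess

from prefect import flow, task


@task(retries=1)
def run_kedro_pipeline(pipeline_name: str, env: str = "base") -> None:
    # Shell out to the Kedro CLI so the pipeline keeps its own dependencies
    # isolated inside its own container image.
    subprocess.run(
        ["kedro", "run", "--pipeline", pipeline_name, "--env", env],
        check=True,
    )


@task
def customer_specific_step() -> None:
    # Customer-specific work that lives outside Kedro.
    ...


@flow
def deployment_flow() -> None:
    run_kedro_pipeline("data_engineering")  # hypothetical pipeline name
    customer_specific_step()


if __name__ == "__main__":
    deployment_flow()
```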
FTR there's an unmaintained kedro-argo plugin: https://github.com/nraw/kedro-argo/
> it would just all have to be architected such that the kedro pipeline would itself be a node within whatever orchestrator we choose?
several plugins adopt this kind of grouping (collapsing a set of Kedro nodes into a single task in the external DAG), for example kedro-airflow https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow#can-i-group-nodes-together
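If it helps to picture what "grouping" means in practice: in plain Kedro the usual unit is a namespaced modular pipeline, which a plugin can then map onto a single task in the target DAG. A rough sketch, with all node and dataset names made up:

```python
# Sketch of grouping nodes under a namespace in plain Kedro; plugins such as
# kedro-airflow can then map a group of nodes to a single task in the external
# DAG. All node/dataset names here are placeholders.
from kedro.pipeline import node, pipeline


def clean(raw):
    return raw  # placeholder transformation


def featurize(clean_data):
    return clean_data  # placeholder transformation


base = pipeline(
    [
        node(clean, inputs="raw_data", outputs="clean_data", name="clean"),
        node(featurize, inputs="clean_data", outputs="features", name="featurize"),
    ]
)

# Wrapping the same nodes under a namespace keeps them together as one logical
# unit when the pipeline is exported to another orchestrator.
data_engineering = pipeline(
    base,
    namespace="data_engineering",
    inputs={"raw_data"},
    outputs={"features"},
)
```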
Gotcha, thanks for the info. I always appreciate the info and responsiveness on this channel 🙂.