Updated 2 weeks ago

How Do People Use Kedro at Scale?

Hey, how do people use Kedro at scale? I've read a few tutorials on how to use Kedro for single projects but none on how to use it at scale. To me there would be an inherent benefit in creating modules with the pipeline step logic (so like a shared nodes.py) and using those for common tasks rather than writing them in the pipeline-specific nodes.py. Does anybody do this?

I am keen to learn how people make the most out of Kedro.

4 comments

Hey @Luis Chaves Rodriguez, are you talking about scaling in terms of computation or reusing code (I think it's the latter)?

In that case, the high-level idea is to create a common module (or a Python package) for anything that is used across multiple pipelines.

Generally you want to reuse either a pipeline or the underlying function. It's less common to import a node, since a node's context (its input and output dataset names) depends on the catalog and pipeline it belongs to.
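To make the function-versus-node distinction concrete, here is a minimal sketch. The module path, the function `drop_incomplete`, the dataset names (`raw_sales`, `clean_sales`), and the parameter name are all hypothetical and only for illustration. The shared function knows nothing about Kedro, and each pipeline wraps it in its own node with its own catalog names.

```python
# Hypothetical shared module, e.g. src/my_project/shared/cleaning.py
from typing import Any


def drop_incomplete(
    rows: list[dict[str, Any]], required: list[str]
) -> list[dict[str, Any]]:
    """Keep only rows where every required field is present and not None."""
    return [
        row for row in rows
        if all(row.get(key) is not None for key in required)
    ]


# Hypothetical pipeline module, e.g. src/my_project/pipelines/sales/pipeline.py
def create_sales_pipeline():
    # Imported lazily (an assumption of this sketch) so the shared function
    # above stays usable and testable without Kedro installed.
    from kedro.pipeline import node, pipeline

    return pipeline(
        [
            node(
                func=drop_incomplete,  # the reused, catalog-agnostic function
                inputs=["raw_sales", "params:sales_required_fields"],
                outputs="clean_sales",
                name="clean_sales_node",
            )
        ]
    )
```

Another pipeline could wrap `drop_incomplete` the same way with different dataset names, which is why sharing the function tends to be easier than sharing the node.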

Yes, it was the latter! And I see that your answer is aligned with what I was thinking. As a standard practice, do you tend to develop these packages alongside the Kedro pipelines (same repo), separately, or does it depend?

Usually it starts as a module alongside the pipelines. If it continues to grow and proves useful enough, you could consider breaking it out as a separate package.

The considerations are similar to those for non-pipeline libraries: a separate package adds maintenance overhead, but it could be necessary if the common modules are shared across different teams and need to be versioned properly.
