Would Kedro users be opposed to defining nodes with decorators? I have written a simple implementation, but as I've only recently started using Kedro, I wonder if I'm missing anything:
The syntax would be:
```python
from kedro.pipeline import Pipeline, node, pipeline


@node(inputs=1, outputs="first_sum")
def step1(number):
    return number + 1


@node(inputs="first_sum", outputs="second_sum")
def step2(number):
    return number + 1


@node(inputs="second_sum", outputs="final_result")
def step3(number):
    return number + 2


pipeline = pipeline(
    [
        step1,
        step2,
        step3,
    ]
)
```
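The proposed syntax can be sketched without Kedro installed. This is a minimal, hypothetical stand-in: `Step`, `step` and `run` are illustrative names, not Kedro APIs. The literal input `1` from the proposal is replaced by a dataset name, `"number"`, on the assumption that inputs stay dataset names as they are in Kedro. With Kedro available, the decorator body could instead return `kedro.pipeline.node(func, inputs, outputs)`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Tuple, Union


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a Kedro node: a function plus dataset names."""
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    outputs: str


def step(inputs: Union[str, Iterable[str]], outputs: str) -> Callable[[Callable[..., Any]], Step]:
    """Decorator factory: @step(inputs=..., outputs=...) turns a function into a Step."""
    names = (inputs,) if isinstance(inputs, str) else tuple(inputs)

    def wrap(func: Callable[..., Any]) -> Step:
        # With Kedro installed, this could return kedro.pipeline.node(func, inputs, outputs)
        return Step(func, names, outputs)

    return wrap


def run(steps: Iterable[Step], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in list order against a plain dict acting as the catalog.

    (Kedro's Pipeline orders nodes by their dependencies instead of list order.)
    """
    data = dict(catalog)
    for s in steps:
        data[s.outputs] = s.func(*(data[name] for name in s.inputs))
    return data


@step(inputs="number", outputs="first_sum")
def step1(number):
    return number + 1


@step(inputs="first_sum", outputs="second_sum")
def step2(number):
    return number + 1


@step(inputs="second_sum", outputs="final_result")
def step3(number):
    return number + 2


result = run([step1, step2, step3], {"number": 1})
print(result["final_result"])  # 5
```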
The functions could even be decorated in the project's nodes.py, and then the pipeline definition would just become:
```python
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import step1, step2, step3

pipeline = pipeline(
    [
        step1,
        step2,
        step3,
    ]
)
```
Personally I'm not strongly against it, but to me it's mostly syntactic sugar. It will make simple things simpler but the complex cases more difficult. That's the main tradeoff.
What complex cases would this not cover, though?
You get into funny situations where different decorators conflict; for example, combining this with Pandera might be painful.
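To make the decorator-conflict concern concrete, here is a hypothetical sketch. `check_non_negative` is an invented validation decorator standing in for something like Pandera (it is not Pandera's API), and `Step`/`step` are illustrative stand-ins for a node decorator. Stacking only works in one order: the validator has to sit below the node decorator so that it wraps the plain function.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a node: a function plus dataset names."""
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    outputs: str


def step(inputs: str, outputs: str):
    """Hypothetical node decorator: replaces the function with a Step."""
    def wrap(func):
        return Step(func, (inputs,), outputs)
    return wrap


def check_non_negative(func):
    """Hypothetical validation decorator that expects to wrap a plain callable."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        if result < 0:
            raise ValueError(f"negative output: {result}")
        return result
    return wrapper


# Works: the validator wraps the plain function, then the result becomes a Step.
@step(inputs="number", outputs="first_sum")
@check_non_negative
def validated_first(number):
    return number + 1


# Breaks: the validator wraps the Step object, so the name ends up as a plain
# function whose call tries to invoke the (non-callable) Step.
@check_non_negative
@step(inputs="number", outputs="first_sum")
def validated_wrong_order(number):
    return number + 1


print(validated_first.func(1))  # 2
try:
    validated_wrong_order(1)
except TypeError as exc:
    print("wrong order:", exc)
```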
Say you want to reuse the function: now you can't, because it's a node with pipeline-specific details.
It's a similar case when you want to unit test: you will most likely want to test the node function rather than the node.
It's easy to extract the function from the node for a test, but that arguably adds more complexity.
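A hypothetical sketch of the testing and reuse point: once a decorator replaces the function with a node-like object, tests and reuse have to reach through to the wrapped function. `Step` and `step` are illustrative stand-ins; Kedro's own `Node` does expose its wrapped function as `.func`.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a node: a function plus dataset names."""
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    outputs: str


def step(inputs: str, outputs: str):
    """Hypothetical node decorator: replaces the function with a Step."""
    def wrap(func):
        return Step(func, (inputs,), outputs)
    return wrap


@step(inputs="number", outputs="first_sum")
def step1(number):
    return number + 1


def test_step1_adds_one():
    # step1 is a Step, not a function, so the test unwraps it via .func
    assert step1.func(1) == 2


# Reusing the same logic under different dataset names also goes through
# the Step (dataclasses.replace copies it with new field values).
step1_other = replace(step1, inputs=("other_number",), outputs="other_sum")

test_step1_adds_one()
```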
I don't think reusability of functions is an issue; I replied to this concern in the GitHub thread: https://github.com/kedro-org/kedro/issues/2471#issuecomment-2598338855
But I hear the concern about having multiple ways of doing one thing and that confusing users.
And I also hear the concern about clashes when stacking decorators, @datajoely; I'm not sure how easy it is to circumvent that.
I agree with this:
It will make simple things simpler but the complex cases more difficult. That's the main tradeoff
I personally like to move all of my functions out of my Kedro project into their own well-tested Python package, which is available from a private PyPI-like repository. This also means my flow logic isn't coupled to my business logic, so if I needed to swap Kedro for something else, it wouldn't be a pain.
Fair enough! I just wanted to see what the community thought of it. Thanks for the insights, @Nok @datajoely
So, in summary, I think our current pattern enables high cohesion but low coupling between Kedro's framework and your business logic.
The approach you have taken is slightly different: you keep the function but have a separate thin node wrapper for what you call a "step":
```python
@node(inputs=["a", "b"], outputs="sum")
def pipeline_step(a, b):
    return reusable_fn(a, b)
```
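The thin-wrapper pattern above can be sketched like this, with a hypothetical `step` decorator standing in for the proposed `@node` (the body of `reusable_fn` is made up for illustration). The business logic stays a plain, importable function; only the wrapper knows about dataset names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a node: a function plus dataset names."""
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    outputs: str


def step(inputs: List[str], outputs: str):
    """Hypothetical node decorator factory (expects a list of input names)."""
    def wrap(func):
        return Step(func, tuple(inputs), outputs)
    return wrap


# Business logic: a plain function that could live in a separate, tested package.
def reusable_fn(a, b):
    return a + b


# Thin wrapper: the only place that mentions pipeline dataset names.
@step(inputs=["a", "b"], outputs="sum")
def pipeline_step(a, b):
    return reusable_fn(a, b)


def run_one(s: Step, catalog: Dict[str, Any]) -> Any:
    """Run a single Step against a dict acting as the catalog."""
    return s.func(*(catalog[name] for name in s.inputs))


print(reusable_fn(2, 3))                         # 5, called directly
print(run_one(pipeline_step, {"a": 2, "b": 3}))  # 5, via the wrapper
```

For comparison, plain Kedro needs no wrapper for the same wiring: node(reusable_fn, inputs=["a", "b"], outputs="sum").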
Curious, where do you store your private PyPI repos? At my old work we had Artifactory, but I'm not sure what folks use out there.
^Don't be discouraged if you find this works for you; fundamentally there is nothing wrong with it, IMO. We aim to serve a broad audience, so we try to keep this simple.
Is this simpler than node(reusable_fn, ["a", "b"], outputs="sum")? It's a few more keystrokes, though maybe slightly clearer since the arguments are highlighted at the top.
@Nok Yeah, I'm not really sure which one is simpler at that point, hence agreeing with your point that it complicates the "advanced" use cases.
Bit late to the party, but I would suggest not binding input and output names to the node. Instead, I think you can get something halfway, like:
```python
@node
def step1(number):
    return number + 1


@node
def step2(number):
    return number + 1


@node
def step3(number):
    return number + 2


@pipeline
def my_pipe(my_input):
    first_sum = step1(my_input)
    second_sum = step2(first_sum)
    final_result = step3(second_sum)
    return final_result


my_pipe(1)
```
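One way the "don't bind names to the node" idea could work is tracing: while the pipeline function runs, decorated calls return symbolic handles instead of values, and each call is recorded as a step. This is a hypothetical sketch, not Kedro code. `traced_node`, `traced_pipeline`, `Ref` and `run` are invented names, intermediate dataset names are generated automatically, and the pipeline is built from an input dataset name rather than the literal `1`.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

StepSpec = Tuple[Callable[..., Any], List[str], str]  # (function, input names, output name)
_ids = itertools.count()
_recording: List[List[StepSpec]] = []  # stack of traces currently being recorded


@dataclass(frozen=True)
class Ref:
    """Symbolic handle for a dataset that will exist when the pipeline runs."""
    name: str


def traced_node(func: Callable[..., Any]) -> Callable[..., Any]:
    """Outside tracing, behave like func; inside, record a step and return a Ref."""
    def call(*args):
        if not _recording:
            return func(*args)
        out = Ref(f"{func.__name__}_{next(_ids)}")
        _recording[-1].append((func, [arg.name for arg in args], out.name))
        return out
    call.func = func
    return call


def traced_pipeline(builder: Callable[..., Ref]):
    """Call builder with Refs for its inputs; return (recorded steps, output name)."""
    def build(*input_names: str):
        steps: List[StepSpec] = []
        _recording.append(steps)
        try:
            result = builder(*(Ref(name) for name in input_names))
        finally:
            _recording.pop()
        return steps, result.name
    return build


def run(steps: List[StepSpec], output: str, catalog: Dict[str, Any]) -> Any:
    """Execute recorded steps in order against a dict acting as the catalog."""
    data = dict(catalog)
    for func, inputs, out in steps:
        data[out] = func(*(data[name] for name in inputs))
    return data[output]


@traced_node
def step1(number):
    return number + 1


@traced_node
def step2(number):
    return number + 1


@traced_node
def step3(number):
    return number + 2


@traced_pipeline
def my_pipe(my_input):
    first_sum = step1(my_input)
    second_sum = step2(first_sum)
    final_result = step3(second_sum)
    return final_result


steps, output = my_pipe("my_input")
print(run(steps, output, {"my_input": 1}))  # 5
print(step1(1))  # 2: outside a pipeline, the function is still directly callable
```

A side effect of this design is that the decorated functions stay plain callables outside a pipeline, which sidesteps part of the reuse and testing concern raised earlier, at the cost of a global tracing stack.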
Also, on a separate note, I think the desire for this alternative syntax does come up, and it would be interesting to see a community-driven package that enables this syntax, along with realistic examples and an understanding of the caveats. It's hard to evaluate how well something like this could work without actually doing it, but it's a pretty big risk to take in core Kedro.
I implemented something like this at work a while ago, but never really used it. I think a better solution to the issue I was trying to solve would be a Kedro VS Code plugin that can better show all the pipelines, nodes, and catalog entries.