Would kedro users be opposed to defining nodes with decorators? I have written a simple implementation, but as I've only recently started using kedro, I wonder if I'm missing anything:
The syntax would be:
```python
from kedro.pipeline import Pipeline, node, pipeline


@node(inputs=1, outputs="first_sum")
def step1(number):
    return number + 1


@node(inputs="first_sum", outputs="second_sum")
def step2(number):
    return number + 1


@node(inputs="second_sum", outputs="final_result")
def step3(number):
    return number + 2


pipeline = pipeline(
    [
        step1,
        step2,
        step3,
    ]
)
```
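For reference, a minimal sketch of how such a decorator could be implemented by wrapping kedro.pipeline.node (illustrative only; note that kedro's real node() only accepts dataset names or None as inputs, so the literal inputs=1 above would need extra handling):

```python
from kedro.pipeline import node as _make_node


def node(inputs, outputs, **kwargs):
    """Hypothetical decorator that turns a plain function into a kedro Node."""

    def decorator(func):
        # Build the Node up front and return it in place of the function,
        # so the decorated name can be dropped straight into pipeline([...]).
        return _make_node(func, inputs=inputs, outputs=outputs, **kwargs)

    return decorator
```

One design question with this approach: returning the Node shadows the original function, so step1(3) can no longer be called directly in tests; an alternative is to return the function unchanged and attach the Node to it as an attribute.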
How do you avoid over-DRY ("Don't Repeat Yourself") code when using Kedro? Given the fairly opinionated syntax and project structure that it proposes, I find it easy to DRY out bits of code that would be better left duplicated (e.g. preprocessing code). I wonder if anyone else has had similar thoughts.
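To make the concern concrete, here's a toy sketch (hypothetical names) of the coupling I mean: once two pipelines share one preprocessing helper, a change needed by one silently changes the other:

```python
# src/common/preprocessing.py (hypothetical): one shared helper, fully DRY
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # A tweak needed by one pipeline (say, a new fillna rule) silently
    # changes every other pipeline that imports this function.
    return df.dropna().reset_index(drop=True)


# pipelines/sales/nodes.py and pipelines/forecast/nodes.py would both do:
#   from common.preprocessing import preprocess
```

Duplicating those few lines per pipeline would keep the pipelines independently evolvable, at the cost of some repetition.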
Hi kedro community!! I have encountered an issue when working with kedro within a marimo notebook (I think the issue would be just the same in a jupyter notebook). Basically, I was initially working on my notebook by launching it from the command line at the kedro project root folder, something like:

```
marimo edit notebooks/nb.py
```
where my folder structure is something like:
```
├── README.md
├── conf
│   ├── base
│   ├── local
├── data ...
├── notebooks
│   ├── nb.py
├── pyproject.toml
├── requirements.txt
├── src ...
└── tests ...
```

Within nb.py I have a cell that runs:

```python
from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from kedro.io import DataCatalog

conf_loader = OmegaConfigLoader(
    conf_source=Path(__file__).parent / settings.CONF_SOURCE,
    default_run_env="base",
)
catalog = DataCatalog.from_config(
    conf_loader["catalog"], credentials=conf_loader["credentials"]
)
```
and another cell that loads a table through the catalog:

```python
import polars as pl

weekly_sales = pl.from_pandas(catalog.load("mytable"))
```
In the catalog, all the filepaths are relative and assume that the catalog is being used from the Kedro project root. The conf_source argument passed to the OmegaConfigLoader instance is an absolute path, but the catalog entries themselves (e.g. conf/base/sql/somequery.sql or data/mydataset.csv) are relative. So if I run my notebook from the root of my kedro project all is fine, but if I were to run:

```
cd notebooks; marimo edit nb.py
```

then catalog.load will attempt to load the query or dataset from notebooks/conf/base/sql/somequery.sql.
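One workaround I've been considering is to resolve the project root from the notebook's own location and chdir there before building the catalog, so that the relative catalog entries resolve against the root no matter where marimo was launched from. A sketch, assuming nb.py always lives exactly one level below the project root:

```python
import os
from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from kedro.io import DataCatalog

# Assumption: nb.py sits in <project_root>/notebooks, so the root is one level up.
project_root = Path(__file__).resolve().parent.parent
os.chdir(project_root)  # relative catalog filepaths now resolve against the root

conf_loader = OmegaConfigLoader(
    conf_source=str(project_root / settings.CONF_SOURCE),
    default_run_env="base",
)
catalog = DataCatalog.from_config(
    conf_loader["catalog"], credentials=conf_loader["credentials"]
)
```

Is there a more idiomatic way to do this, e.g. via a KedroSession?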
Hey, how do people use kedro at scale? I've read a few tutorials on how to use kedro for single projects, but none on how to use it at scale. To me there would be an inherent benefit in creating modules with shared pipeline-step logic (a shared nodes.py) and using those for common tasks rather than rewriting them in each pipeline-specific nodes.py. Does anybody do this?
I am keen to learn how people make the most out of kedro.
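For concreteness, here's the kind of layout I have in mind (hypothetical module and dataset names): common node functions live in one shared module, and each pipeline imports them instead of redefining them in its own nodes.py:

```python
# In practice drop_null_rows would live in a shared module, e.g.
# src/my_project/common/nodes.py (hypothetical path), and each
# pipeline's pipeline.py would import it rather than redefine it:
#   from my_project.common.nodes import drop_null_rows
import pandas as pd

from kedro.pipeline import Pipeline, node, pipeline


def drop_null_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Generic cleaning step reused across several pipelines."""
    return df.dropna()


def create_pipeline(**kwargs) -> Pipeline:
    # e.g. src/my_project/pipelines/sales/pipeline.py
    return pipeline(
        [
            node(drop_null_rows, inputs="raw_sales", outputs="clean_sales"),
        ]
    )
```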