Updated 3 days ago

Accessing Kedro Configuration Within Marimo Notebook

Hi Kedro community!! I have encountered an issue when working with Kedro within a marimo notebook (I think the issue would be just the same in a Jupyter notebook). Initially I was working on my notebook by launching it from the command line in the Kedro project root folder, something like: marimo edit notebooks/nb.py, where my folder structure is something like:

├── README.md
├── conf
│   ├── base
│   ├── local
├── data ...
├── notebooks
│   ├── nb.py
├── pyproject.toml
├── requirements.txt
├── src ...
└── tests ...
Within nb.py I have a cell that runs:
from kedro.io import DataCatalog
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from pathlib import Path
conf_loader = OmegaConfigLoader(
    conf_source=Path(__file__).parent / settings.CONF_SOURCE,
    default_run_env="base",
)

catalog = DataCatalog.from_config(conf_loader["catalog"], credentials=conf_loader["credentials"])

and later...
weekly_sales = pl.from_pandas(
    catalog.load("mytable")
)

The issue is that all the filepaths within the catalog are relative (e.g. conf/base/sql/somequery.sql or data/mydataset.csv) and assume that wherever the catalog is being used from is the Kedro project root, whereas the conf_source argument in the OmegaConfigLoader instance is an absolute path. So if I run my notebook from the root of my Kedro project, all is fine, but if I were to run: cd notebooks; marimo edit nb.py then catalog.load would attempt to load the query or dataset from notebooks/conf/base/sql/somequery.sql

Is it clear?

PS: please don't ask me why there is SQL code within the conf folder 😅, it's moving soon

10 comments

hi @Luis Chaves Rodriguez! I think your message is incomplete? or otherwise could you clarify what the issue is?

Yes sorry, I pressed Enter by mistake as I was writing it. It's complete now, let me know if it's unclear @juanlu. The main issue, I believe, is how the catalog defines the paths to the files that the catalog entries are based on

I see that the problem is solved in jupyter notebooks by using magic, but I wonder if there's a magic-free solution?

hi, this is a known issue, and it looks like the solution for now was to improve our error messaging - https://github.com/kedro-org/kedro/issues/3248. Maybe you can raise this issue on GitHub, and we can revisit.

but isn't this a solved issue in Jupyter? It should be possible to reproduce in other environments no? Couldn't we get the project root/session/context programmatically just like it happens with the magic?

the story of relative filepaths in the catalog is a bit tricky unfortunately. indeed, using %load_ext kedro.ipython works, but there's not a good magic-free solution.

@Luis Chaves Rodriguez one thing you can try is to use runtime parameters. in your dataset:

ds:
  filepath: ${runtime_params:project_root}/data/01_raw/thing.csv

and then you can specify it as follows:

config_loader = OmegaConfigLoader(..., runtime_params={"project_root": Path(...).as_posix()})

the missing bit then is how to find the Path(...) to the project root.

https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-override-configuration-with-[…]rameters-with-the-omegaconfigloader

does this make sense?
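
For the missing bit, here is one possible sketch (not an official Kedro API): walk up from the notebook's location until a folder containing pyproject.toml is found. The helper name find_project_root and the choice of pyproject.toml as the marker are assumptions based on the folder structure shown in the question; a nested pyproject.toml closer to the notebook would be picked up first.

```python
from pathlib import Path


def find_project_root(start, marker: str = "pyproject.toml") -> Path:
    """Return the closest folder at or above `start` that contains `marker`.

    Hypothetical helper: `marker` defaults to pyproject.toml because the
    project layout in this thread keeps that file at the project root.
    """
    start = Path(start).resolve()
    # If `start` is a file (e.g. the notebook itself), begin at its folder.
    first = start if start.is_dir() else start.parent
    for candidate in (first, *first.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

In the notebook this could be called as find_project_root(Path(__file__)), and the result passed through as_posix() into runtime_params, so the answer no longer depends on the directory marimo was started from.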

that makes sense, so every file, based on its location in the project would need to have a different Path(...) correct? Would the catalog.load respect that?

In my example, would it be the following?

conf_loader = OmegaConfigLoader(
    ...,
    default_run_env = "base",
    runtime_params = {"project_root": Path(__file__).parent }
)

If you had to start from scratch how would you fix this? How do other similar projects approach this?

catalog.load will respect it because it will know nothing about it. you'll instantiate the catalog from the config loader. the translation happens at that step.

so it's a matter of properly prefixing your file paths in the catalog and then instantiating the config loader with the right runtime_params. you can probably wrap that in a function if you're using it more than once
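
A minimal sketch of such a wrapper, assuming the catalog entries are prefixed with ${runtime_params:project_root} as suggested earlier in the thread. The names project_runtime_params and load_catalog are made up for illustration, the loader arguments mirror the ones from the original post, and the exact behaviour may differ between Kedro versions.

```python
from pathlib import Path


def project_runtime_params(project_root) -> dict:
    """Build the runtime_params dict with an absolute, POSIX-style root."""
    return {"project_root": Path(project_root).resolve().as_posix()}


def load_catalog(project_root):
    """Create a DataCatalog whose file paths are anchored at `project_root`."""
    # Kedro is imported here so the path helper above works without it.
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import settings
    from kedro.io import DataCatalog

    root = Path(project_root).resolve()
    conf_loader = OmegaConfigLoader(
        conf_source=str(root / settings.CONF_SOURCE),
        default_run_env="base",  # same value as in the original post
        runtime_params=project_runtime_params(root),
    )
    # The ${runtime_params:project_root} prefixes are resolved when the
    # config is read, so the catalog only ever sees absolute paths.
    return DataCatalog.from_config(
        conf_loader["catalog"], credentials=conf_loader["credentials"]
    )
```

With the folder structure from the question, nb.py sits one level below the project root, so the call from the notebook would be load_catalog(Path(__file__).parent.parent) rather than Path(__file__).parent.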

What about this?

If you had to start from scratch how would you fix this? How do other similar projects approach this?
