Accessing Kedro Configuration Within Marimo Notebook

Hi kedro community!! I have encountered an issue when working with kedro within a marimo notebook (I think the issue would be just the same in a jupyter notebook). Basically, I was initially working on my notebook by launching it from the command line in the kedro project root folder, with something like: marimo edit notebooks/nb.py, where my folder structure is something like:

├── README.md
├── conf
│   ├── base
│   ├── local
├── data ...
├── notebooks
│   ├── nb.py
├── pyproject.toml
├── requirements.txt
├── src ... 
└── tests ...

Within nb.py I have a cell that runs:

from kedro.io import DataCatalog
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from pathlib import Path
conf_loader = OmegaConfigLoader(
    conf_source=Path(__file__).parent / settings.CONF_SOURCE,
    default_run_env="base",
)

catalog = DataCatalog.from_config(conf_loader["catalog"], credentials=conf_loader["credentials"])

and later...

import polars as pl

# catalog.load returns a pandas DataFrame here, which is converted to polars
weekly_sales = pl.from_pandas(
    catalog.load("mytable")
)

The issue is that all the filepaths within the catalog are relative (e.g. conf/base/sql/somequery.sql or data/mydataset.csv) and assume that wherever the catalog is being used from is the Kedro project root. The conf_source argument in the OmegaConfigLoader instance is an absolute path, so the configuration itself is always found, but the paths inside it are resolved against the current working directory. So if I run my notebook from the root of my kedro project, all is fine, but if I were to run: cd notebooks; marimo edit nb.py then catalog.load will attempt to load the query or dataset from notebooks/conf/base/sql/somequery.sql

Is it clear?

PS: please don't ask me why there is SQL code within the conf folder 😅, it's moving soon

13 comments

Juan Luis Cano Rodríguez

hi @Luis Chaves Rodriguez! I think your message is incomplete? or otherwise could you clarify what the issue is? (edit: solved)

Luis Chaves Rodriguez

Yes sorry, I pressed Enter by mistake as I was writing it. It's complete now, let me know if it's unclear @juanlu. I believe the main issue is how the catalog defines the paths to the files that the catalog entries are based on

Luis Chaves Rodriguez

I see that the problem is solved in jupyter notebooks by using magic, but I wonder if there's a magic-free solution?

Luis Chaves Rodriguez

could this be relevant? https://docs.kedro.org/en/stable/_modules/kedro/ipython.html#magic_reload_kedro

Rashida Kanchwala

hi, this is a known issue, and it looks like the solution for now was to improve our error messaging - https://github.com/kedro-org/kedro/issues/3248. Maybe you can raise this issue on github, and we can revisit.

Luis Chaves Rodriguez

but isn't this a solved issue in Jupyter? It should be possible to reproduce in other environments no? Couldn't we get the project root/session/context programmatically just like it happens with the magic?

Juan Luis Cano Rodríguez

the story of relative filepaths in the catalog is a bit tricky unfortunately. indeed, using %load_ext kedro.ipython works, but there's not a good magic-free solution.

@Luis Chaves Rodriguez one thing you can try is to use runtime parameters. in your dataset:

ds:
  filepath: ${runtime_params:project_root}/data/01_raw/thing.csv

and then you can specify it as follows:

config_loader = OmegaConfigLoader(..., runtime_params={"project_root": Path(...).as_posix()})

the missing bit then is how to find the Path(...) to the project root.

https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-override-configuration-with-[…]rameters-with-the-omegaconfigloader

does this make sense?
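
For the missing bit, a minimal sketch of one way to find the project root without relying on the working directory: walk up from a starting folder until a marker file shows up. Both the helper name find_project_root and the choice of pyproject.toml as the marker are assumptions for illustration (they match the folder structure in the question, they are not a Kedro API).

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Return the closest directory at or above `start` that contains `marker`.

    Hypothetical helper: the marker file is an assumption, not a Kedro rule.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

Inside notebooks/nb.py this could be called as find_project_root(Path(__file__).parent), and the result passed through .as_posix() as the project_root runtime parameter, so the notebook resolves the same root whether it is started from the project root or from notebooks/.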

Luis Chaves Rodriguez

that makes sense, so every file, based on its location in the project would need to have a different Path(...) correct? Would the catalog.load respect that?

In my example, would it be the following?

conf_loader = OmegaConfigLoader(
    ...,
    default_run_env = "base",
    runtime_params = {"project_root": Path(__file__).parent }
)

If you had to start from scratch how would you fix this? How do other similar projects approach this?

Juan Luis Cano Rodríguez

catalog.load will respect it because it will know nothing about it. you’ll instantiate the catalog from the config loader. the translation happens at that step.

so it’s a matter of properly prefixing your file paths in the catalog and then instantiating the config loader with the right runtime_params. you can probably wrap that in a function if you’re using it more than once
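
A rough sketch of what such a wrapper function might look like, assuming the catalog entries are prefixed with ${runtime_params:project_root} as described above. The names runtime_params_for and build_catalog are hypothetical, and the conf/ folder with base and local environments is assumed from the folder structure in the question. The credentials lookup is guarded because a project may not have any credentials files.

```python
from pathlib import Path


def runtime_params_for(project_root: Path) -> dict[str, str]:
    """Return the runtime parameters that the catalog entries interpolate."""
    # as_posix() keeps forward slashes, so the interpolated paths look the same on every OS.
    return {"project_root": Path(project_root).as_posix()}


def build_catalog(project_root: Path):
    """Build a DataCatalog whose file paths are anchored at `project_root`."""
    # Kedro is imported here so that runtime_params_for stays usable without it.
    from kedro.config import MissingConfigException, OmegaConfigLoader
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(
        conf_source=str(Path(project_root) / "conf"),  # assumption: default conf/ folder
        base_env="base",
        default_run_env="local",
        runtime_params=runtime_params_for(project_root),
    )
    try:
        credentials = conf_loader["credentials"]
    except MissingConfigException:
        credentials = None  # no credentials files found under conf/
    # The ${runtime_params:project_root} prefixes are already resolved at this point,
    # so the catalog only ever sees absolute paths.
    return DataCatalog.from_config(conf_loader["catalog"], credentials=credentials)
```

With this, the notebook cell would reduce to something like catalog = build_catalog(project_root), where project_root is however the notebook locates the project folder.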

Luis Chaves Rodriguez

What about this?

If you had to start from scratch how would you fix this? How do other similar projects approach this?

Luis Chaves Rodriguez

Hey @juanlu why not use the _find_kedro_project function for this? https://github.com/kedro-org/kedro/blob/46259b9f5b89a226d47e2119afb40ad7b4fa5e63/kedro/utils.py#L66
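
A tentative sketch of that idea: use _find_kedro_project to locate the root, then create a session and take the catalog from its context, which is roughly what the Jupyter magic appears to do. Caveats: _find_kedro_project is a private function (note the leading underscore) and may change or move between Kedro versions, the wrapper name load_project_catalog is hypothetical, and bootstrap_project imports the project package, so it presumably only works if the project is importable as a Python package.

```python
from pathlib import Path


def load_project_catalog(start: Path):
    """Locate the enclosing Kedro project from `start` and return its catalog."""
    # Imported lazily; _find_kedro_project is private and not a stable API.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from kedro.utils import _find_kedro_project

    project_root = _find_kedro_project(Path(start).resolve())
    if project_root is None:
        raise RuntimeError(f"No Kedro project found at or above {start}")

    # Registers the project's settings and pipelines so a session can be created.
    bootstrap_project(project_root)
    session = KedroSession.create(project_path=project_root)
    # The context builds the catalog using the project path, not the working directory.
    return session.load_context().catalog
```

In nb.py this might be called as catalog = load_project_catalog(Path(__file__).parent), after which catalog.load("mytable") should not depend on where marimo was started from.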

Juan Luis Cano Rodríguez

maybe! @Luis Chaves Rodriguez have you tried it?

btw, I just read https://github.com/kedro-org/kedro/issues/4440, thanks for opening it 💯

Luis Chaves Rodriguez

I tried it briefly on Friday but the project I’m working on is not properly set up as a Python package, so I got some errors at import. I need to clean up some of how the repo was initially set up by the people that came before me, I’ll report back on this next week
