Hi kedro community!! I have encountered an issue when working with kedro within a marimo notebook (I think the issue would be just the same in a jupyter notebook). Basically, I initially was working on my notebook by calling it from the command line from the kedro project root folder, something like: marimo edit notebooks/nb.py
where my folder structure is something like:
├── README.md
├── conf
│   ├── base
│   └── local
├── data ...
├── notebooks
│   └── nb.py
├── pyproject.toml
├── requirements.txt
├── src ...
└── tests ...

Within nb.py I have a cell that runs:

    from kedro.io import DataCatalog
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import settings
    from pathlib import Path

    conf_loader = OmegaConfigLoader(
        conf_source=Path(__file__).parent / settings.CONF_SOURCE,
        default_run_env="base",
    )
    catalog = DataCatalog.from_config(conf_loader["catalog"], credentials=conf_loader["credentials"])

    weekly_sales = pl.from_pandas(catalog.load("mytable"))
In the catalog, all the filepaths are relative (e.g. conf/base/sql/somequery.sql or data/mydataset.csv) and assume that wherever the catalog is being used from is the Kedro project root. The conf_source argument in the OmegaConfigLoader instance is an absolute path, but the catalog filepaths are not. So if I run my notebook from the root of my kedro project, all is fine, but if I were to run: cd notebooks; marimo edit nb.py
then catalog.load will attempt to load the query or dataset from notebooks/conf/base/sql/somequery.sql
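To make the failure mode concrete, here is a minimal sketch of how a relative catalog path ends up pointing at different files depending on the working directory. The `/project` root and the `resolved_from` helper are made up for illustration; only the relative path comes from the message above.

```python
from pathlib import PurePosixPath

# Relative filepath as it might appear in a catalog entry (taken from the message above).
relative = PurePosixPath("conf/base/sql/somequery.sql")


def resolved_from(cwd, rel):
    """Return the path `rel` would point to if the process were started in `cwd`."""
    return (PurePosixPath(cwd) / rel).as_posix()


# Started from the (hypothetical) project root: the path is the intended one.
print(resolved_from("/project", relative))
# Started from the notebooks folder: the same entry now points inside notebooks/.
print(resolved_from("/project/notebooks", relative))
```

The catalog entry is identical in both cases; only the directory the process was started from differs.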
hi @Luis Chaves Rodriguez! I think your message is incomplete? or otherwise could you clarify what the issue is?
Yes sorry, I pressed Enter by mistake as I was writing it, it's complete now, let me know if it's unclear @juanlu, the main issue is how the catalog defines the paths to the files that the catalog items are based on I believe
I see that the problem is solved in jupyter notebooks by using magic, but I wonder if there's a magic-free solution?
could this be relevant? https://docs.kedro.org/en/stable/_modules/kedro/ipython.html#magic_reload_kedro
hi, this is a known issue, and it looks like the solution for now was to improve our error messaging: https://github.com/kedro-org/kedro/issues/3248. Maybe you can raise this issue on GitHub, and we can revisit.
but isn't this a solved issue in Jupyter? It should be possible to reproduce in other environments no? Couldn't we get the project root/session/context programmatically just like it happens with the magic?
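One magic-free way to get the project root programmatically is to walk up from the notebook file until a marker file is found. The sketch below is plain pathlib and is not Kedro's own implementation; the `find_project_root` name and the choice of `pyproject.toml` as the marker are assumptions based on the folder structure shown earlier in the thread.

```python
from pathlib import Path


def find_project_root(start, marker="pyproject.toml"):
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    # If `start` is a file (e.g. the notebook itself), begin the search in its parents.
    candidates = [start, *start.parents] if start.is_dir() else list(start.parents)
    for directory in candidates:
        if (directory / marker).is_file():
            return directory
    raise FileNotFoundError(f"No {marker} found above {start}")
```

Inside notebooks/nb.py this could be called as `find_project_root(__file__)`, which would give the same answer whether marimo was started from the project root or from the notebooks folder.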
the story of relative filepaths in the catalog is a bit tricky unfortunately. indeed, using the %load_ext kedro.ipython magic works, but there's not a good magic-free solution.
@Luis Chaves Rodriguez one thing you can try is to use runtime parameters. in your dataset:
    ds:
      filepath: ${runtime_params:project_root}/data/01_raw/thing.csv

and then, when you create the config loader:

    config_loader = OmegaConfigLoader(..., runtime_params={"project_root": Path(...).as_posix()})

pointing Path(...) to the project root.
that makes sense, so every file, based on its location in the project, would need to have a different Path(...), correct? Would catalog.load respect that?
In my example, would it be the following?
conf_loader = OmegaConfigLoader( ..., default_run_env = "base", runtime_params = {"project_root": Path(__file__).parent } )
catalog.load will respect it because it will know nothing about it. you'll instantiate the catalog from the config loader. the translation happens at that step.
so it's a matter of properly prefixing your file paths in the catalog and then instantiating the config loader with the right runtime_params. you can probably wrap that in a function if you're using it more than once
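A rough sketch of what such a wrapper function could look like, assuming the catalog entries are prefixed with `${runtime_params:project_root}` as suggested above. The names `catalog_runtime_params` and `build_catalog` are hypothetical, the `conf` folder name is assumed from the folder structure in the thread, and the loader arguments simply mirror the ones used earlier.

```python
from pathlib import Path


def catalog_runtime_params(project_root):
    """Build the runtime_params dict that ${runtime_params:project_root} reads from."""
    root = Path(project_root).resolve()
    # as_posix() keeps forward slashes so the value can be spliced into catalog filepaths.
    return {"project_root": root.as_posix()}


def build_catalog(project_root):
    """Hypothetical helper: create a DataCatalog whose filepaths are anchored at project_root."""
    # Imported lazily so the path helper above can be used without Kedro installed.
    from kedro.config import OmegaConfigLoader
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(
        conf_source=str(Path(project_root).resolve() / "conf"),  # assumes the conf/ folder shown above
        default_run_env="base",  # same setting as in the thread
        runtime_params=catalog_runtime_params(project_root),
    )
    return DataCatalog.from_config(
        conf_loader["catalog"], credentials=conf_loader["credentials"]
    )
```

For a notebook stored at notebooks/nb.py, note that `Path(__file__).parent` is the notebooks folder, so the project root would be one level further up, e.g. `build_catalog(Path(__file__).resolve().parent.parent)`.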
What about this?
If you had to start from scratch how would you fix this? How do other similar projects approach this?