Hi kedro community!! I have encountered an issue when working with kedro within a marimo notebook (I think the issue would be just the same in a jupyter notebook). Basically, I was initially working on my notebook by calling it from the command line from the kedro project root folder, something like:
    marimo edit notebooks/nb.py
where my folder structure is something like:
    ├── README.md
    ├── conf
    │   ├── base
    │   └── local
    ├── data ...
    ├── notebooks
    │   └── nb.py
    ├── pyproject.toml
    ├── requirements.txt
    ├── src ...
    └── tests ...
Within nb.py I have a cell that runs:
    from pathlib import Path

    import polars as pl
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import settings
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(
        conf_source=Path(__file__).parent / settings.CONF_SOURCE,
        default_run_env="base",
    )
    catalog = DataCatalog.from_config(conf_loader["catalog"], credentials=conf_loader["credentials"])
    weekly_sales = pl.from_pandas(catalog.load("mytable"))
In the catalog, all the filepaths are relative (e.g. conf/base/sql/somequery.sql or data/mydataset.csv) and assume that wherever the catalog is being used from is the Kedro project root. The conf_source argument in the OmegaConfigLoader instance is an absolute path, but that does not change how those filepaths are resolved. So if I run my notebook from the root of my kedro project, all is fine, but if I were to run:
    cd notebooks; marimo edit nb.py
then catalog.load will attempt to load the query or dataset from notebooks/conf/base/sql/somequery.sql
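To make the failure mode above concrete, here is a minimal sketch using only the standard library (no Kedro involved). It assumes the catalog entries are plain relative paths, which the operating system resolves against the current working directory. The helper name resolve_from is hypothetical and exists only for this illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(cwd: Path, relative_path: str) -> Path:
    """Resolve relative_path as if the process had been started in cwd."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        return Path(relative_path).resolve()
    finally:
        # Always restore the original working directory
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the Kedro project root and its notebooks/ folder
    project_root = Path(tmp).resolve()
    notebooks = project_root / "notebooks"
    notebooks.mkdir()

    from_root = resolve_from(project_root, "data/mydataset.csv")
    from_notebooks = resolve_from(notebooks, "data/mydataset.csv")

    print(from_root.relative_to(project_root).as_posix())       # data/mydataset.csv
    print(from_notebooks.relative_to(project_root).as_posix())  # notebooks/data/mydataset.csv
```

The same relative string ends up pointing at two different files depending on where the notebook was launched, which matches the notebooks/conf/base/sql/somequery.sql path reported above.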
hi @Luis Chaves Rodriguez! I think your message is incomplete? or otherwise could you clarify what the issue is?
Yes, sorry, I pressed Enter by mistake as I was writing it. It's complete now, let me know if it's unclear @juanlu. I believe the main issue is how the catalog defines the paths to the files that the catalog items are based on.
I see that the problem is solved in jupyter notebooks by using magic, but I wonder if there's a magic-free solution?
could this be relevant? https://docs.kedro.org/en/stable/_modules/kedro/ipython.html#magic_reload_kedro
hi, this is a known issue, and it looks like the solution for now was to improve our error messaging - https://github.com/kedro-org/kedro/issues/3248. Maybe you can raise this issue on github, and we can revisit.
but isn't this a solved issue in Jupyter? It should be possible to reproduce in other environments no? Couldn't we get the project root/session/context programmatically just like it happens with the magic?
the story of relative filepaths in the catalog is a bit tricky unfortunately. indeed, using %load_ext kedro.ipython works, but there's not a good magic-free solution.
@Luis Chaves Rodriguez one thing you can try is to use runtime parameters. in your dataset:
    ds:
      filepath: ${runtime_params:project_root}/data/01_raw/thing.csv
and then in your code:
    config_loader = OmegaConfigLoader(..., runtime_params={"project_root": Path(...).as_posix()})
where Path(...) points to the project root.
that makes sense, so every file, based on its location in the project, would need to have a different Path(...), correct? Would catalog.load respect that?
In my example, would it be the following?
    conf_loader = OmegaConfigLoader(
        ...,
        default_run_env="base",
        runtime_params={"project_root": Path(__file__).parent},
    )
catalog.load will respect it because it will know nothing about it. you'll instantiate the catalog from the config loader. the translation happens at that step.
so it's a matter of properly prefixing your file paths in the catalog and then instantiating the config loader with the right runtime_params. you can probably wrap that in a function if you're using it more than once
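The wrapping suggested above could look roughly like the sketch below. The function names project_runtime_params and load_catalog, and the conf_dir parameter, are hypothetical. The loader arguments are copied from the original cell rather than verified against a real project, and the catalog entries are assumed to be prefixed with ${runtime_params:project_root} as in the dataset example.

```python
from pathlib import Path


def project_runtime_params(project_root: Path) -> dict:
    """Build the runtime_params dict read by ${runtime_params:project_root}."""
    # POSIX-style absolute string so the interpolated filepath is OS-independent
    return {"project_root": Path(project_root).resolve().as_posix()}


def load_catalog(project_root: Path, conf_dir: str = "conf"):
    """Instantiate a DataCatalog whose filepaths are anchored at project_root."""
    # Kedro is imported lazily so the helper above stays usable without it
    from kedro.config import OmegaConfigLoader
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(
        conf_source=str(Path(project_root) / conf_dir),
        default_run_env="base",  # same value as in the original cell
        runtime_params=project_runtime_params(project_root),
    )
    # Same credentials lookup as the original cell; assumes credentials config exists
    return DataCatalog.from_config(
        conf_loader["catalog"], credentials=conf_loader["credentials"]
    )
```

From notebooks/nb.py in the folder structure shown earlier, the call would presumably be load_catalog(Path(__file__).parent.parent), since the project root sits one level above the notebooks folder.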
What about this?
If you had to start from scratch how would you fix this? How do other similar projects approach this?
Hey @juanlu why not use the _find_kedro_project function for this? https://github.com/kedro-org/kedro/blob/46259b9f5b89a226d47e2119afb40ad7b4fa5e63/kedro/utils.py#L66
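For anyone who would rather not depend on a private Kedro function, the idea of locating the project root programmatically can be sketched with the standard library alone. This is a hand-rolled stand-in, not Kedro's implementation: the name find_project_root and the choice of pyproject.toml as the only marker are assumptions, and Kedro's own helper may apply stricter checks.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Return the nearest directory at or above start that contains marker, else None."""
    start = Path(start).resolve()
    # Check the starting directory first, then walk up through its parents
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None
```

In nb.py this would be called as find_project_root(Path(__file__).parent), and the result could be passed as the project_root runtime parameter regardless of which directory marimo was launched from.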
maybe! @Luis Chaves Rodriguez have you tried it?
btw, I just read https://github.com/kedro-org/kedro/issues/4440, thanks for opening it 💯
I tried it briefly on Friday but the project I'm working on is not properly set up as a Python package, so I got some errors at import. I need to clean up some of how the repo was initially set up by the people that came before me, I'll report back on this next week