Hi all! I have worked with Kedro many times on different operating systems and I have never had issues with catalog path entries. It has always been fine to define catalog entries such as

    catalog_entry:
      type: AnyDataset
      filepath: data/01_raw/file.extension

whether on Windows or Mac. Now I'm having an issue with it for the first time. It turns out that the following catalog entry

    problematic_catalog_entry:
      type: MyCustomDataSet
      mainfolderpath: data/01_raw/file.extension

raises a

    winerror 3 the system cannot find the path specified

when loaded from a Kedro Jupyter Notebook, but

    problematic_catalog_entry_2:
      type: MyCustomDataSet
      mainfolderpath: C:\same\path\but\absolute\data\01_raw\file.extension

doesn't.
MyCustomDataSet is a custom AbstractDataset, but I don't have this problem with my other custom AbstractDataset classes. I will attach my _load method because the problem might be there:

    def _load(self):
        # Needs `import os` at module level.
        # Immediate subfolders of the main folder.
        subfolder_names = [
            subfolder_name
            for subfolder_name in os.listdir(self._mainfolderpath)
            if os.path.isdir(os.path.join(self._mainfolderpath, subfolder_name))
        ]

        # Map each subfolder to {wav file name without extension: wav file path}.
        wav_paths_dict = {}
        for subfolder_name in subfolder_names:
            subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
            wav_files = []
            for root, dirs, files in os.walk(subfolder_path):
                for file in files:
                    if file.lower().endswith('.wav'):
                        wav_file_path = os.path.join(root, file)
                        wav_file_name = os.path.split(wav_file_path)[-1].replace('.wav', '').replace('.WAV', '')
                        wav_files.append((wav_file_name, wav_file_path))
            wav_paths_dict[subfolder_name] = dict(wav_files)

        # Load every wav file, keeping the same nested structure.
        partitioned_dataset_dict = {}
        for subfolder_name, sub_dict in wav_paths_dict.items():
            partitioned_dataset = [
                (wav_file_name, SoundDataset(wav_file_path).load())
                for wav_file_name, wav_file_path in sub_dict.items()
            ]
            partitioned_dataset_dict[subfolder_name] = dict(partitioned_dataset)

        return partitioned_dataset_dict

In __init__ I'm initializing self._mainfolderpath this way:

    self._mainfolderpath = PurePosixPath(mainfolderpath)

Thank you very much for your help again.

Is it possible to create a minimal example? https://stackoverflow.com/help/minimal-reproducible-example
Is the problem that it handles the relative path but fails to process the Windows path?
> raises a winerror 3 the system cannot find the path specified when loaded from a Kedro Jupyter Notebook but
Which lines of code give you this error? You should be able to tell from the stack trace, or simply print out the path.

It seems that the problem is only in Jupyter. The line of code that raises the error is catalog.load('problematic_catalog_entry') in a Kedro Jupyter notebook (this is the catalog entry with the relative path). Meanwhile, the line catalog.load('problematic_catalog_entry_2') does not raise an error.
I just ran kedro run --node test_node from my terminal, where test_node has problematic_catalog_entry as input, and it does not raise an error. This is the same catalog entry that raises an error in Jupyter.
How did you create the catalog? Or are you using the default one that comes with the extension?
For example, you can run %load_ext kedro.ipython, which should load a global catalog for you.
The problem here is that when you use a relative path in Python, it is always relative to your working directory. That means when you run a notebook, your working directory is project/notebooks (I guess that is where your notebooks are), so data/01_raw/file.extension is presumably being looked up under project/notebooks, where it does not exist.
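To illustrate the point about working directories, here is a small sketch that resolves the same relative path from two different working directories. The folder names mirror the ones in this thread, but the temporary project layout and the helper resolve_from are made up for the demonstration; nothing here is Kedro API.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(cwd: Path, relative: str) -> Path:
    """Return what `relative` points to when the process runs from `cwd`."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        # resolve() anchors a relative path at the current working directory.
        return Path(relative).resolve()
    finally:
        # Always restore the original working directory.
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project layout: <project>/data/01_raw and <project>/notebooks.
    project = Path(tmp).resolve()
    (project / "data" / "01_raw").mkdir(parents=True)
    (project / "notebooks").mkdir()

    from_root = resolve_from(project, "data/01_raw")
    from_notebooks = resolve_from(project / "notebooks", "data/01_raw")

    # Record the results before the temporary directory is cleaned up.
    found_from_root = from_root.exists()
    found_from_notebooks = from_notebooks.exists()

print(found_from_root)       # True: the folder exists under the project root
print(found_from_notebooks)  # False: notebooks/data/01_raw was never created
```

The same string gives two different absolute paths, which would match what you see: kedro run starts from the project root, while the notebook kernel starts from the notebooks folder.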
For catalog entries, we try to detect some keywords and convert relative paths to absolute ones automatically:

    conf_keys_with_filepath = ("filename", "filepath", "path")

You can find the logic here: https://github.com/kedro-org/kedro/blob/f1d37513097471fa868e0b1e0d917c1ba7c35894/kedro/framework/context/context.py#L59
Unfortunately, there is no easy way for you to extend that keyword list, and mainfolderpath is not one of those keys, so this has to go into your dataset implementation.
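One possible way to do that inside the dataset is sketched below. It is only an illustration under assumptions: find_project_root and to_absolute are hypothetical helper names, not Kedro functions, and the sketch assumes the project root is the nearest parent directory that contains a pyproject.toml file.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upwards from `start` until a directory containing `marker` is found.

    Assumption: the project root is identified by a `pyproject.toml` file.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def to_absolute(path: str, start: Optional[Path] = None) -> Path:
    """Anchor a relative path at the project root; leave absolute paths alone."""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    root = find_project_root(start or Path.cwd())
    if root is None:
        # No marker found: fall back to the usual working-directory behaviour.
        return candidate.resolve()
    return (root / candidate).resolve()
```

In __init__ you could then write self._mainfolderpath = to_absolute(mainfolderpath) instead of wrapping the raw string in PurePosixPath, so the path no longer depends on where the notebook was started. The result is a concrete Path, which os.listdir and os.path.join accept directly.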