Kedro catalog path entries not working on new operating system

Hi all! I have worked with Kedro many times on different operating systems and have never had issues with catalog path entries. It has always been fine to define catalog entries such as

catalog_entry:
  type: AnyDataset
  filepath: data/01_raw/file.extension
whether on Windows or Mac. Now I'm having an issue with it for the first time. It turns out that the following catalog entry
problematic_catalog_entry:
  type: MyCustomDataSet
  mainfolderpath: data/01_raw/file.extension
raises "WinError 3: The system cannot find the path specified" when loaded from a Kedro Jupyter notebook, but
problematic_catalog_entry_2:
  type: MyCustomDataSet
  mainfolderpath: C:\same\path\but\absolute\data\01_raw\file.extension
doesn't.

This is absolutely my fault, because the dataset type I'm using is a custom AbstractDataset, but I don't have this problem with my other custom AbstractDataset classes. I'm attaching my _load method because the problem might be there:

import os

def _load(self):
    # Immediate subfolders of the main folder; each one becomes a partition group.
    subfolder_names = [
        subfolder_name
        for subfolder_name in os.listdir(self._mainfolderpath)
        if os.path.isdir(os.path.join(self._mainfolderpath, subfolder_name))
    ]

    # Map each subfolder to {wav file name without extension: full file path}.
    wav_paths_dict = {}
    for subfolder_name in subfolder_names:
        subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
        wav_files = {}
        for root, _dirs, files in os.walk(subfolder_path):
            for file in files:
                # splitext handles any capitalisation of the extension.
                wav_file_name, extension = os.path.splitext(file)
                if extension.lower() == '.wav':
                    wav_files[wav_file_name] = os.path.join(root, file)
        wav_paths_dict[subfolder_name] = wav_files

    # Load every file through SoundDataset, keeping the same nesting.
    partitioned_dataset_dict = {}
    for subfolder_name, sub_dict in wav_paths_dict.items():
        partitioned_dataset_dict[subfolder_name] = {
            wav_file_name: SoundDataset(wav_file_path).load()
            for wav_file_name, wav_file_path in sub_dict.items()
        }

    return partitioned_dataset_dict
In __init__ I'm initializing self._mainfolderpath this way: self._mainfolderpath = PurePosixPath(mainfolderpath). Thank you very much for your help again.

12 comments

Is the problem that it handles the relative path but fails to process the Windows path?

"raises a WinError 3 the system cannot find the path specified when loaded from a Kedro Jupyter Notebook"
Which line of code gives you this error? You should be able to tell from the stack trace, or simply print out the path.

It seems that the problem only occurs in Jupyter. The line of code that raises the error is catalog.load('problematic_catalog_entry') in a Kedro Jupyter notebook (this is the catalog entry with the relative path). Meanwhile, the line catalog.load('problematic_catalog_entry_2') does not raise an error.

I just ran kedro run --node test_node from my terminal, where test_node has problematic_catalog_entry as input, and it does not raise an error. This is the same catalog entry that raises an error in Jupyter.

How did you create the catalog? Or are you using the default one that comes with the extension?

For example, you can run %load_ext kedro.ipython, which should load a global catalog for you.

I'm using the default one that comes with the Jupyter extension.

Running os.chdir("/path/to/kedro/project") fixed the problem.

Kedro makes a best effort to do these path conversions.

The problem here is that when you use a relative path in Python, it's always relative to your working directory. That means that when you run a notebook, your working directory is project/notebooks (I'm guessing that's where your notebooks are).
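To make the working-directory point concrete, here is a minimal sketch using only the standard library. The folder names project and notebooks and the helper resolve_from are made up for illustration; they are not part of Kedro.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(cwd: Path, relative: str) -> Path:
    """Return the absolute path `relative` points to when `cwd` is the working directory."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        # resolve() anchors a relative path at the current working directory.
        return Path(relative).resolve()
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/project with a notebooks subfolder.
    project = Path(tmp).resolve() / "project"
    notebooks = project / "notebooks"
    notebooks.mkdir(parents=True)

    from_root = resolve_from(project, "data/01_raw")
    from_notebooks = resolve_from(notebooks, "data/01_raw")

    print(from_root)       # ends with project/data/01_raw
    print(from_notebooks)  # ends with project/notebooks/data/01_raw
```

The same relative string ends up in two different places, which is why calling os.chdir to the project root made the relative catalog entry work.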

We try to detect some keywords to do the conversion automatically:

conf_keys_with_filepath = ("filename", "filepath", "path")

But in your case the conversion didn't happen, because mainfolderpath is not one of those keys. So you will likely have to handle that yourself.
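As a rough illustration of why one entry gets converted and the other doesn't, here is a simplified sketch of a keyword-based conversion. It is not Kedro's actual code: convert_relative_paths and looks_absolute are hypothetical names, the absolute-path check is deliberately naive, and the project path is an invented example.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# The keys quoted above; only exact matches trigger the conversion.
CONF_KEYS_WITH_FILEPATH = ("filename", "filepath", "path")


def looks_absolute(value: str) -> bool:
    """Naive check (assumption): URL-like, POSIX-absolute, or Windows-absolute."""
    return (
        "://" in value
        or PurePosixPath(value).is_absolute()
        or PureWindowsPath(value).is_absolute()
    )


def convert_relative_paths(project_path: PurePath, conf: dict) -> dict:
    """Hypothetical stand-in: prefix relative paths under known keys with the project path."""
    converted = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            converted[key] = convert_relative_paths(project_path, value)
        elif (
            key in CONF_KEYS_WITH_FILEPATH
            and isinstance(value, str)
            and not looks_absolute(value)
        ):
            converted[key] = (project_path / value).as_posix()
        else:
            # Unknown keys such as mainfolderpath pass through untouched.
            converted[key] = value
    return converted


project = PurePosixPath("/home/user/project")  # invented project location
catalog = {
    "catalog_entry": {"type": "AnyDataset", "filepath": "data/01_raw/file.extension"},
    "problematic_catalog_entry": {
        "type": "MyCustomDataSet",
        "mainfolderpath": "data/01_raw/file.extension",
    },
}
converted = convert_relative_paths(project, catalog)
print(converted["catalog_entry"]["filepath"])
print(converted["problematic_catalog_entry"]["mainfolderpath"])
```

In this sketch the filepath value becomes absolute, while the mainfolderpath value stays relative and is therefore resolved against whatever the working directory happens to be.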

You can find the logic here: https://github.com/kedro-org/kedro/blob/f1d37513097471fa868e0b1e0d917c1ba7c35894/kedro/framework/context/context.py#L59

Unfortunately, there is no easy way for you to extend that keyword list, so this has to go into your dataset implementation.

This helped me a lot. Thank you very much, you are always so nice :)
