Kedro catalog path entries not working on new operating system

Nicolas Betancourt Cardona

Hi all! I have worked with Kedro many times on different operating systems and have never had issues with catalog path entries. It has always been fine to define catalog entries such as

catalog_entry:
  type: AnyDataset
  filepath: data/01_raw/file.extension

whether on Windows or Mac. Now I'm having an issue with it for the first time. It turns out that the following catalog entry

problematic_catalog_entry:
  type: MyCustomDataSet
  mainfolderpath: data/01_raw/file.extension

raises a WinError 3 (the system cannot find the path specified) when loaded from a Kedro Jupyter Notebook, but

problematic_catalog_entry_2:
  type: MyCustomDataSet
  mainfolderpath: C:\same\path\but\absolute\data\01_raw\file.extension

doesn't.

This is absolutely my fault because the dataset type I'm using is a custom AbstractDataset, but I don't have this problem with my other custom AbstractDataset classes. I will attach my _load method because the problem might be there:

def _load(self):
    # Collect the immediate subfolders of the main folder.
    subfolder_names = [
        subfolder_name
        for subfolder_name in os.listdir(self._mainfolderpath)
        if os.path.isdir(os.path.join(self._mainfolderpath, subfolder_name))
    ]

    # Map each subfolder to {wav file name without extension: wav file path},
    # searching the subfolder recursively.
    wav_paths_dict = {}
    for subfolder_name in subfolder_names:
        subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
        wav_files = {}
        for root, _dirs, files in os.walk(subfolder_path):
            for file in files:
                if file.lower().endswith('.wav'):
                    # splitext drops the extension whatever its letter case
                    wav_file_name = os.path.splitext(file)[0]
                    wav_files[wav_file_name] = os.path.join(root, file)
        wav_paths_dict[subfolder_name] = wav_files

    # Load every wav file through SoundDataset, keeping the same nesting.
    partitioned_dataset_dict = {}
    for subfolder_name, sub_dict in wav_paths_dict.items():
        partitioned_dataset_dict[subfolder_name] = {
            wav_file_name: SoundDataset(wav_file_path).load()
            for wav_file_name, wav_file_path in sub_dict.items()
        }

    return partitioned_dataset_dict

In __init__ I'm initializing self._mainfolderpath this way: self._mainfolderpath = PurePosixPath(mainfolderpath). Thank you very much for your help again.

12 comments

Nok Lam Chan

Is it possible to create a minimal example? https://stackoverflow.com/help/minimal-reproducible-example

Nok Lam Chan

Is the problem that it handles the relative path but fails to process the Windows path?

"raises a WinError 3 (the system cannot find the path specified) when loaded from a Kedro Jupyter Notebook but"

Which line of code gives you this error? You should be able to tell from the stack trace, or simply print out the path.

Nicolas Betancourt Cardona

It seems that the problem is only in Jupyter. The line of code that raises the error is catalog.load('problematic_catalog_entry') in a Kedro Jupyter notebook (this is the catalog entry with the relative path). Meanwhile, the line catalog.load('problematic_catalog_entry_2') does not raise an error.

I just ran kedro run --node test_node from my terminal, where test_node has problematic_catalog_entry as input, and it does not raise an error. This is the same catalog entry that raises an error in Jupyter.

Nok Lam Chan

How did you create the catalog? Or are you using the default one that comes with the extension?

For example, you can run %load_ext kedro.ipython, which should load up a global catalog for you.

Nicolas Betancourt Cardona

I'm using the default one that comes with the Jupyter extension.

Nok Lam Chan

Thanks, that rings a bell. https://github.com/kedro-org/kedro/issues/2942

Nicolas Betancourt Cardona

Running os.chdir("/path/to/kedro/project") fixed the problem.

Nok Lam Chan

Kedro makes a best effort to do these path conversions.

Nok Lam Chan

The problem here is that when you use a relative path in Python, it is always relative to your working directory. That means when you run a notebook, your working directory is project/notebooks (I guess that is where your notebooks are).
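To make the working-directory point concrete, here is a minimal sketch using only the standard library. The folder layout (project/data/01_raw and project/notebooks) is built in a temporary directory purely for illustration, and exists_from is a hypothetical helper, not part of Kedro.

```python
import os
import tempfile
from pathlib import Path


def exists_from(working_dir: Path, relative_path: str) -> bool:
    """Check whether relative_path exists when working_dir is the current directory."""
    original_cwd = os.getcwd()
    try:
        os.chdir(working_dir)
        return os.path.exists(relative_path)
    finally:
        # Always restore the previous working directory.
        os.chdir(original_cwd)


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative layout only: a project root with data/01_raw and notebooks.
    project = Path(tmp) / "project"
    (project / "data" / "01_raw").mkdir(parents=True)
    (project / "notebooks").mkdir()

    found_from_root = exists_from(project, "data/01_raw")
    found_from_notebooks = exists_from(project / "notebooks", "data/01_raw")

print(found_from_root, found_from_notebooks)
```

The same relative string is found from the project root but not from the notebooks folder, which would be consistent with kedro run working from the terminal while the notebook fails.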

Nok Lam Chan

We try to detect some keywords to do the conversion automatically:

conf_keys_with_filepath = ("filename", "filepath", "path")

But in your case the key is mainfolderpath, which is not one of those keywords, so the conversion didn't happen. You will likely have to handle that yourself.
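A simplified sketch of what a keyword-based conversion like this could look like is below. It is not Kedro's actual code (the real logic is more careful about what counts as a relative path); convert_paths and the /project root are hypothetical, and only the conf_keys_with_filepath tuple is taken from the thread.

```python
from pathlib import PurePosixPath

# Key names quoted in the thread.
conf_keys_with_filepath = ("filename", "filepath", "path")


def convert_paths(project_path: PurePosixPath, conf: dict) -> dict:
    """Illustrative only: make relative values under known keys absolute."""
    converted = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            # Recurse into nested entries such as each dataset definition.
            converted[key] = convert_paths(project_path, value)
        elif (
            key in conf_keys_with_filepath
            and isinstance(value, str)
            and not PurePosixPath(value).is_absolute()
        ):
            converted[key] = (project_path / value).as_posix()
        else:
            # Any other key, including "mainfolderpath", is left untouched.
            converted[key] = value
    return converted


catalog = {
    "catalog_entry": {
        "type": "AnyDataset",
        "filepath": "data/01_raw/file.extension",
    },
    "problematic_catalog_entry": {
        "type": "MyCustomDataSet",
        "mainfolderpath": "data/01_raw/file.extension",
    },
}

converted = convert_paths(PurePosixPath("/project"), catalog)
print(converted["catalog_entry"]["filepath"])
print(converted["problematic_catalog_entry"]["mainfolderpath"])
```

In this sketch the filepath value becomes absolute while the mainfolderpath value stays relative, so it would still be resolved against whatever the working directory happens to be.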

Nok Lam Chan

You can find the logic here: https://github.com/kedro-org/kedro/blob/f1d37513097471fa868e0b1e0d917c1ba7c35894/kedro/framework/context/context.py#L59

Unfortunately there is no easy way for you to extend that keyword list, so this has to go into your dataset implementation.
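One possible way to handle this inside the dataset is sketched below, assuming the project root can be recognised by a pyproject.toml file in one of the parent folders of the working directory. Both find_project_root and resolve_mainfolderpath are hypothetical helpers written for this example, not Kedro APIs.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk up from start until a folder containing the marker file is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def resolve_mainfolderpath(mainfolderpath: str, start: Optional[Path] = None) -> Path:
    """Return mainfolderpath as an absolute path anchored at the project root."""
    path = Path(mainfolderpath)
    if path.is_absolute():
        return path
    root = find_project_root(Path.cwd() if start is None else start)
    # Assumption: with no marker found, fall back to the working directory.
    return root / path if root is not None else path.resolve()


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative layout only: project root with a notebooks folder.
    project = Path(tmp).resolve() / "project"
    (project / "notebooks").mkdir(parents=True)
    (project / "pyproject.toml").write_text("")

    resolved = resolve_mainfolderpath("data/01_raw", start=project / "notebooks")
    expected = project / "data" / "01_raw"
    absolute_input = str(project / "elsewhere")
    absolute_result = resolve_mainfolderpath(absolute_input)

print(resolved == expected)
```

In the custom dataset, __init__ could then store self._mainfolderpath = resolve_mainfolderpath(mainfolderpath) instead of wrapping the raw string, so that _load behaves the same from the terminal and from a notebook. Passing the project root explicitly as a dataset argument would be another option if walking up the folder tree feels too implicit.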

Nicolas Betancourt Cardona

This helped me a lot. Thank you very much, you are always so nice :)
