Automatically labeling and transforming audio files in ...

Hi everyone,

I'm working in a kedro project where I want to automatically label thousands of audio files, apply transformations to them and then store them in a folder of folders, each subfolder corresponding to one label. I want that folder of folders to be a catalog entry on my yml file

I followed this Kedro tutorial and created my own custom dataset for saving/loading .wav files in kedro catalog. I also am able to create PartitionedDataset catalog entries in catalog.yml such as

audio_folder:
  type: partitions.PartitionedDataset
  dataset: my_kedro_project.datasets.audio_dataset.SoundDataset
  path: data/output/audios/
  filename_suffix: ".WAV"

The next level of abstraction I would require is to be able to create a catalog entry corresponding to a folder containig folders such as the audio_folder above. Here is my try to do so but I'm having an issue with the _save method

class AudioFolderDataset(PartitionedDataset):
    

    def __init__(self, main_folder_path: str):
        """Creates a new instance of SoundDataset to load / save audio data for given filepath.

        Args:
            filepath: The location of the audio file to load / save data.
        """
        protocol, mainfolderpath = get_protocol_and_path(main_folder_path)
        self._protocol = protocol
        self._mainfolderpath = PurePosixPath(mainfolderpath)
        self._fs = fsspec.filesystem(self._protocol)

    def _load(self,subfolders_dictionary):
        # loading code 
        .
    def _save(self, subfolders_dictionary):
        os.path.normpath(self._mainfolderpath)
        for subfolder_name in subfolders_dictionary.keys():
            subfolder_path=os.path.join(self._mainfolderpath, subfolder_name) 
            
            partitioned_dataset = PartitionedDataset(
            path=subfolder_path,
            dataset=SoundDataset,
            filename_suffix=".WAV",
            )
            
            partitioned_dataset.save(subfolders_dictionary[subfolder_name])
    
    
    partitioned_dataset.save(subfolders_dictionary[subfolder_name])
    
    def _describe(self):
        # describe code

The problem is I'm working on windows but it seems that PartitionedDataset assumes that my system separator is / instead of \ . When I print the path in _save method in SoundDataset class I get folder\\subfolder/file.WAV which off course os leading to an error.
Is there a way in which I can change this default behaviour?

Join the Kedro community

Automatically labeling and transforming audio files in Kedro project