I currently have an image classification use case with 7 classes, and I save the images for each class separately (one folder per class). Now I have set up catalog.yml like this:
```yaml
"{class_name}_data":
  type: partitions.PartitionedDataset
  filepath: ../data/01_raw/B4CD/{class_name}
  dataset:
    type: pillow.ImageDataset
```
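For reference, loading a `PartitionedDataset` returns a dict that maps each partition id (by default the file's path relative to the dataset folder) to a function that loads that partition only when it is called. The stdlib-only sketch below imitates that with fake loaders and turns each id into a one-hot label, in the spirit of Keras `to_categorical`. It assumes, as a hypothetical convention, that every file name starts with a numeric class index followed by `_` (e.g. `3_img001.png`); `labels_from_partitions` and `one_hot` are illustrative names, not Kedro or Keras APIs.

```python
from typing import Callable, Dict, List, Tuple


def one_hot(label: int, num_classes: int) -> List[int]:
    """Same idea as Keras to_categorical for a single label: a 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside range(0, {num_classes})")
    return [1 if i == label else 0 for i in range(num_classes)]


def labels_from_partitions(
    partitions: Dict[str, Callable[[], object]], num_classes: int = 7
) -> Tuple[List[object], List[List[int]]]:
    """Load every partition and derive a one-hot label from its id prefix."""
    images, labels = [], []
    for partition_id in sorted(partitions):
        images.append(partitions[partition_id]())  # calling the loader reads the file
        # Hypothetical convention: ids look like "<class index>_<original name>"
        class_index = int(partition_id.split("_", 1)[0])
        labels.append(one_hot(class_index, num_classes))
    return images, labels


# Fake partitions standing in for what a PartitionedDataset load returns
fake_partitions = {
    "6_img002.png": lambda: "pixels of img002",
    "3_img001.png": lambda: "pixels of img001",
}
images, labels = labels_from_partitions(fake_partitions)
print(labels)  # [[0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1]]
```

Reading the whole prefix before `_`, rather than only the first character, keeps this working for class indexes with more than one digit. Labels also have to lie in `range(num_classes)`: if the folders were numbered 1 to 7 instead of 0 to 6, index 7 with `num_classes=7` would be out of range, which is what the `ValueError` guards against here.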
But when I call `catalog.load('XXXX')`, what should I write for `'XXXX'`? `{class_name}_data`, or do I have to load each folder/class separately?
Hey, when you're loading the dataset, you have to refer to the full name with `class_name` filled in.
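To make "the full name with `class_name` filled in" concrete: the quoted key `"{class_name}_data"` works as a pattern, and a name such as `7_RADC1700-Crack_data` (the input used in pipeline.py further down) matches it with `class_name = "7_RADC1700-Crack"`, which is then substituted into `filepath`. The stdlib sketch below only imitates that matching with `re` to show the idea; `pattern_to_regex` and `resolve_factory` are hypothetical helpers, not Kedro's API, and Kedro's real matching has extra rules (for example, how several patterns are ranked against each other).

```python
import re
from typing import Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn "{class_name}_data" into (?P<class_name>.+?)_data."""
    # With a capture group, re.split alternates literal text and placeholder names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def resolve_factory(pattern: str, config: dict, dataset_name: str) -> Optional[dict]:
    """Hypothetical helper: build a concrete entry from a factory pattern and a full name."""
    match = pattern_to_regex(pattern).fullmatch(dataset_name)
    if match is None:
        return None  # the name does not fit the pattern, so no dataset is created
    values = match.groupdict()

    def fill(value):
        # Substitute placeholders such as {class_name} in every string of the entry
        if isinstance(value, str):
            return value.format(**values)
        if isinstance(value, dict):
            return {key: fill(item) for key, item in value.items()}
        return value

    return fill(config)


FACTORY_ENTRY = {
    "type": "partitions.PartitionedDataset",
    "filepath": "data/01_raw/B4CD/{class_name}",
    "dataset": {"type": "pillow.ImageDataset"},
}

resolved = resolve_factory("{class_name}_data", FACTORY_ENTRY, "7_RADC1700-Crack_data")
print(resolved["filepath"])  # data/01_raw/B4CD/7_RADC1700-Crack
print(resolve_factory("{class_name}_data", FACTORY_ENTRY, "class_mapping"))  # None
```

So `catalog.load("{class_name}_data")` is not the intended call; each class folder is loaded through its own concrete name.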
In catalog.yml:

```yaml
"{class_name}_data":
  type: partitions.PartitionedDataset
  filepath: data/01_raw/B4CD/{class_name}
  dataset:
    type: pillow.ImageDataset

class_mapping:
  type: pickle.PickleDataset
  filepath: data/02_intermediate/class_mapping.pkl
```

In pipeline.py:
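```python
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import convert_to_np, rename_files_in_directory


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=rename_files_in_directory,
                inputs=["params:basepath"],
                outputs=["class_mapping"],
                name="rename_files_in_directory",
            ),
            node(
                func=convert_to_np,
                # Only one class dataset is referenced here
                inputs=["7_RADC1700-Crack_data", "params:num_classes"],
                outputs=["images", "labels"],
                name="convert_to_np",
            ),
        ]
    )
```

In nodes.py: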
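```python
import os

import numpy as np
from tensorflow.keras.utils import to_categorical


def rename_files_in_directory(basepath, class_index=None):
    class_mapping = {}
    for entry in os.listdir(basepath):
        entry_path = os.path.join(basepath, entry)
        if os.path.isdir(entry_path):
            # Class folders are named "<index>_<name>", e.g. "7_RADC1700-Crack"
            folder_index = entry.split("_", 1)[0]
            class_mapping[folder_index] = entry
            # Pass the folder's index down so its images get that prefix
            rename_files_in_directory(entry_path, class_index=folder_index)
        elif class_index is not None:
            # Prefix each image with its class index: "img001.png" -> "7_img001.png"
            file_name, file_extension = os.path.splitext(entry)
            new_file_name = f"{class_index}_{file_name}{file_extension}"
            os.rename(entry_path, os.path.join(basepath, new_file_name))
    return class_mapping


def convert_to_np(part, num_classes=7):
    images = []
    labels = []
    # `part` maps each partition id (file name) to a function that loads that image
    for file, load_image in part.items():
        images.append(load_image())
        # The class index is everything before the first "_" in the file name
        labels.append(file.split("_", 1)[0])
    images = np.array(images, dtype=np.int64)
    labels = np.array(labels, dtype=np.int64)
    labels = to_categorical(labels, num_classes=num_classes)
    return images, labels
```

But after I run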
```
kedro run --to-nodes='rename_files_in_directory'
kedro catalog list
```

I still don't see all of the datasets in the list (I have 7 folders).
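In the pipeline.py above, only `7_RADC1700-Crack_data` is used as a node input. As far as I understand Kedro's dataset factories, a pattern such as `{class_name}_data` only produces a dataset for names that the pipeline actually references, which would explain why `kedro catalog list` shows one class dataset rather than seven. One option is to build the input names from the class folders themselves. The sketch below assumes the class folders sit directly under one base directory and that each folder name is exactly the `class_name` the pattern expects; `class_dataset_names` is a hypothetical helper, and the folder names in the usage lines are made up.

```python
import os
import tempfile
from typing import List


def class_dataset_names(basepath: str, suffix: str = "_data") -> List[str]:
    """Hypothetical helper: one catalog name per class folder, e.g. "7_RADC1700-Crack_data"."""
    return [
        f"{entry}{suffix}"
        for entry in sorted(os.listdir(basepath))
        if os.path.isdir(os.path.join(basepath, entry))  # skip stray files
    ]


# Usage with a throwaway directory and made-up folder names
with tempfile.TemporaryDirectory() as base:
    for folder in ["7_RADC1700-Crack", "0_example-class"]:
        os.mkdir(os.path.join(base, folder))
    open(os.path.join(base, "notes.txt"), "w").close()  # a file, not a class folder
    print(class_dataset_names(base))  # ['0_example-class_data', '7_RADC1700-Crack_data']
```

The resulting list could then be passed as `inputs` to a node whose function accepts `*partitions`. Keep in mind that `create_pipeline` runs when the project is loaded, so the class folders must already exist at that point; since `rename_files_in_directory` renames files but not folders, the names stay stable between runs.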