Image Classification Use Case

Question

Hello!!!
I currently have an image classification use case. I have 7 classes and save the images for each class separately (one folder per class). Now I have set up catalog.yml like this:

"{class_name}_data":
  type: partitions.PartitionedDataset
  filepath: ../data/01_raw/B4CD/{class_name}
  dataset:
    type: pillow.ImageDataset

But when I call catalog.load('XXXX'), what should I write for 'XXXX'? Should it be {class_name}_data, or do I have to load each folder/class separately?

Ankita Katiyar · Answer

Hey, when you’re loading the dataset you have to refer to the full name with the class_name filled in.
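
As a rough illustration of what "filled in" means here: a name such as 7_RADC1700-Crack_data (the one used later in this thread) matches the "{class_name}_data" pattern, and the matched value is then substituted into filepath. The sketch below mimics that matching with a plain regular expression. resolve_pattern is a hypothetical helper written for this example, not Kedro's own implementation.

```python
import re
from typing import Dict, Optional


def resolve_pattern(pattern: str, dataset_name: str) -> Optional[Dict[str, str]]:
    """Return the placeholder values if dataset_name matches pattern, else None."""
    # Split the pattern into literal text and {placeholder} tokens
    parts = re.split(r"(\{\w+\})", pattern)
    regex = ""
    for part in parts:
        if re.fullmatch(r"\{\w+\}", part):
            # A placeholder becomes a named capture group
            regex += f"(?P<{part[1:-1]}>.+)"
        else:
            # Literal text must match exactly
            regex += re.escape(part)
    match = re.fullmatch(regex, dataset_name)
    return match.groupdict() if match else None


values = resolve_pattern("{class_name}_data", "7_RADC1700-Crack_data")
print(values)  # {'class_name': '7_RADC1700-Crack'}

# The same value fills the placeholder in the catalog entry's filepath
filepath = "data/01_raw/B4CD/{class_name}".format(**values)
print(filepath)  # data/01_raw/B4CD/7_RADC1700-Crack
```

With a real Kedro catalog, the equivalent call would be catalog.load("7_RADC1700-Crack_data"), which for a PartitionedDataset returns a dictionary mapping partition IDs to load functions.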

Shu-Chun Wu · Answer

In this case, how should I set this up in the pipeline?

Shu-Chun Wu · Answer

In catalog.yml:

"{class_name}_data":
  type: partitions.PartitionedDataset
  filepath: data/01_raw/B4CD/{class_name}
  dataset:
    type: pillow.ImageDataset

class_mapping:
  type: pickle.PickleDataset
  filepath: data/02_intermediate/class_mapping.pkl

In pipeline.py:

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=rename_files_in_directory,
                inputs=["params:basepath"],
                outputs=["class_mapping"],
                name="rename_files_in_directory",
            ),
            node(
                func=convert_to_np,
                inputs=["7_RADC1700-Crack_data", "params:num_classes"],
                outputs=["images", "labels"],
                name="convert_to_np",
            )
        ])

In nodes.py:

def rename_files_in_directory(basepath):
    class_mapping = {}
    for folder_name in os.listdir(basepath):
        folder_path = os.path.join(basepath, folder_name)
        class_index = "".join(folder_name.split("_", 2)[0])
        class_mapping[class_index] = folder_name
        if os.path.isdir(folder_path):
            rename_files_in_directory(folder_path)
        else:
            file_name, file_extension = os.path.splitext(folder_name)
            new_file_name = class_index + '_' + file_name + file_extension
            os.rename(folder_path, os.path.join(basepath, new_file_name))
        return class_mapping

def convert_to_np(part, num_classes=7):
    images = []
    labels = []
    for file, func in part.items():
        image = func()
        images.append(image)
        labels.append(file[:1])
    images = np.array(images, dtype=np.int64)
    labels = np.array(labels, dtype=np.int64)
    labels = to_categorical(labels, num_classes=num_classes)
    return images, labels

But after I run kedro run --to-nodes='rename_files_in_directory', I get this error: ValueError: Pipeline does not contain nodes named ["'rename_files_in_directory'"]. And when I run kedro catalog list, I don't get all the datasets. (I have 7 folders.)
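
One possible reading of the error, assuming the command was typed into a shell that does not treat single quotes as quoting characters (Windows cmd.exe behaves this way): the quotes reach Kedro as part of the node name, which would explain why the message shows 'rename_files_in_directory' with the quotes inside the name. The sketch below only uses Python's shlex module to contrast the two tokenization behaviours. It is an illustration, not how Kedro parses its arguments.

```python
import shlex

command = "kedro run --to-nodes='rename_files_in_directory'"

# POSIX-style tokenization (bash, zsh): the single quotes are removed
posix_args = shlex.split(command)
print(posix_args[-1])  # --to-nodes=rename_files_in_directory

# Non-POSIX tokenization keeps the quotes inside the argument,
# similar to what a shell without single-quote handling passes on
literal_args = shlex.split(command, posix=False)
print(literal_args[-1])  # --to-nodes='rename_files_in_directory'
```

If that is the situation, running kedro run --to-nodes=rename_files_in_directory without quotes (or with double quotes) should let the node name match.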
