Hey folks, I am looking for a way to mount an AWS EFS volume into my Kedro pipeline, which will be executed by Kubeflow. I am using the kedro-kubeflow plugin.
The config has the two options below for volumes, and I am not sure which one is for what purpose.
Option 1, the `volume` section:

```yaml
volume:
  # Storage class - use null (or no value) to use the default storage
  # class deployed on the Kubernetes cluster
  storageclass: # default
  # The size of the volume that is created. Applicable for some storage
  # classes
  size: 1Gi
  # Access mode of the volume used to exchange data. ReadWriteMany is
  # preferred, but it is not supported on some environments (like GKE)
  # Default value: ReadWriteOnce
  #access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows to specify user executing pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
```
Option 2, the `extra_volumes` section:

```yaml
# Optional section to allow mounting additional volumes (such as EmptyDir)
# to specific nodes
extra_volumes:
  tensorflow_step:
  - mount_path: /dev/shm
    volume:
      name: shared_memory
      empty_dir:
        cls: V1EmptyDirVolumeSource
        params:
          medium: Memory
```
From the comments I understand that the first option creates the volume mounted at `/home/kedro/data` for inter-node data exchange, while `extra_volumes` is for mounting additional volumes to specific nodes, such as the `/dev/shm` EmptyDir used for distributed training in PyTorch (Kubernetes has problems with that). Can I conclude that the first volume is the one that needs to be configured if I want to use the EFS file system?
Also, the storage class is something I need to check with the Kubernetes cluster manager for the EFS file system I want to mount.
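For reference, if the cluster uses the aws-efs-csi-driver with dynamic provisioning, I imagine the StorageClass behind `efs-sc` looks something like the sketch below; the `provisioningMode` and `directoryPerms` values are assumptions on my part, and the actual definition is whatever the cluster manager created:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com          # aws-efs-csi-driver
parameters:
  provisioningMode: efs-ap            # dynamic provisioning via EFS access points
  fileSystemId: fs-02d6475f7552a3c13  # same file system as our PV below
  directoryPerms: "700"
```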
This is how our EFS file system is currently exposed as a persistent volume in our Kubernetes cluster:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-kubeflow
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Gi
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "fs-02d6475f7552a3c13:/data"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  volumeMode: Filesystem
```
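For completeness, a claim that binds to this PV would look roughly like the following; the claim name here is hypothetical, and the access mode and storage class have to match the PV:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc-kubeflow   # hypothetical name
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 100Gi
```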
The `storageClassName: efs-sc` from that manifest is what we need to use as the storage class, right? I defined it in the plugin config as shown below:
```yaml
# Optional volume specification
volume:
  storageclass: efs-sc
  access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows to specify user executing pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
```
This is the log output when the pipeline runs; my mount check inside the node only shows `['01_raw']`:

```
INFO     Loading data from companies (CSVDataset)...   data_catalog.py:539
INFO     Running node: preprocess_companies_node:      node.py:364
         preprocess_companies([companies]) ->
         [preprocessed_companies]
DEBUG    Inside Preprocess Companies                   nodes.py:32
DEBUG    Checking EFS Mount now                        nodes.py:33
DEBUG    ['01_raw']                                    nodes.py:34
```
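In case the `volume.storageclass` route doesn't expose the existing EFS data, I was also wondering whether `extra_volumes` could mount the PVC directly on a node. An untested sketch, assuming the section accepts arbitrary Kubernetes volume sources the same way the EmptyDir example does (the node name, mount path, and claim name are placeholders):

```yaml
extra_volumes:
  preprocess_companies_node:
  - mount_path: /mnt/efs   # placeholder mount path
    volume:
      name: efs_data
      persistent_volume_claim:
        cls: V1PersistentVolumeClaimVolumeSource   # kubernetes client model
        params:
          claim_name: data-pvc-kubeflow            # placeholder claim name
```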