Leveraging Azure ML's Command Mode for Efficient Data Handling with Kedro AzureML

Question

Hey there!
When using kedro-azureml with the AzureMLDataset type, it seems to use fsspec (as described in the documentation). Is there a way to use the "mode" parameter of AzureML's command so that the files are mounted (mode rw_mount) instead of being downloaded individually through fsspec?

Alexandre Ouellet · Answer

More specifically, I'm talking about https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/generator.py#L247C17-L247C18

It doesn't pass a "mode", so my best guess is that it relies on https://learn.microsoft.com/en-ca/python/api/overview/azure/ml/?view=azure-ml-py, which, as far as I can tell, treats each file as a remote file to be downloaded.

Alexandre Ouellet · Answer

As for our use case: our dataset is about 50 GB of binary data (think audio data), which is used in about 20 different nodes (and no, we can't reduce it further; it is already reduced at that point).

Alexandre Ouellet · Answer

Downloading 50 GB of data 20 times does not seem very efficient, so we'd like to mount it instead.

Ankita Katiyar · Answer

cc @marrrcin

marrrcin · Answer

PRs are welcome 🙂
