Using Azure ML's Input Mount Mode for Efficient Data Handling with Kedro AzureML

Hey there!
When using kedro-azureml with the AzureMLDataset type, it seems to use fsspec (as described in the documentation). Is there a way to use the "mode" parameter of AzureML's command, so that each file doesn't have to be downloaded individually (through fsspec), but is mounted in rw_mount mode instead?


More specifically, I'm referring to this line: https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/generator.py#L247C17-L247C18

It doesn't pass a "mode", so my best guess is that it relies on the default behaviour of the Azure ML Python SDK (https://learn.microsoft.com/en-ca/python/api/overview/azure/ml/?view=azure-ml-py), which, as far as I can tell, treats each file as a remote file to be downloaded.

For context, in our use case the dataset is about 50 GB of binary data (think audio), and it is used by about 20 different nodes. (And no, we can't reduce it further; it has already been reduced at that point.)

Downloading 50 GB of data 20 times (roughly 1 TB of transfer in total) does not seem efficient, so we'd like to mount it instead.
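For illustration, here is a minimal sketch of what an input entry with an explicit mode could look like, written as the dict form of an Azure ML command job's `inputs` section (the same `type` / `path` / `mode` keys used in CLI v2 job YAML). The helper `build_job_input`, the input name `audio_data` and the asset reference `azureml:audio_data:1` are hypothetical and are not part of kedro-azureml. In the Azure ML Python SDK the equivalent would be `azure.ai.ml.Input(type="uri_folder", path=..., mode="ro_mount")`, optionally using `azure.ai.ml.constants.InputOutputModes`. Note that Azure ML's documentation lists `ro_mount` as the mount mode for job inputs and `rw_mount` for outputs, so check the supported-modes table for your asset type.

```python
from typing import Dict, Optional

# Assumption: read-only mounting is what a node that only reads the data needs.
# Azure ML documents "ro_mount" for inputs; "rw_mount" is listed for outputs.
DEFAULT_INPUT_MODE = "ro_mount"


def build_job_input(
    path: str,
    asset_type: str = "uri_folder",
    mode: Optional[str] = DEFAULT_INPUT_MODE,
) -> Dict[str, str]:
    """Hypothetical helper: build one job input entry with an explicit mode.

    Passing mode=None omits the key, so the service default applies
    (which mirrors the current behaviour of not passing a mode at all).
    """
    spec = {"type": asset_type, "path": path}
    if mode is not None:
        spec["mode"] = mode
    return spec


# Hypothetical data asset reference in the azureml:<name>:<version> form.
inputs = {"audio_data": build_job_input("azureml:audio_data:1")}
print(inputs)
```

Keeping `mode` optional means a plugin change along these lines could stay backwards compatible: datasets without a configured mode would produce exactly the same input entry as today.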

PRs are welcome 🙂
