Kedro-azureml: Issues with using AzureMLAssetDataset with dataset factories and dataset patterns

Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories.
After a lot of headaches and debugging, it seems impossible to use both. Credentials are passed to AzureMLAssetDataset through a hook (after_catalog_created), but if you use dataset patterns (that is, declare your dataset as "{name}.csv" or something similar), the hook is called before the patterned dataset is instantiated.
After that, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter hasn't been called yet (it is only called in the after_catalog_created hook). At first glance, this looks like a kedro-azureml issue, since AzureMLAssetDataset._load() can be called without the setter having been called when the dataset comes from a dataset factory. But it might also be a Kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't see one in the docs on hooks either.

7 comments

Trying to make it slightly clearer:
Regular AzureMLAssetDataset: it is instantiated, then after_catalog_created is called and the setter for the AzureML credentials is called, and eventually _load() is called.
When used with a dataset factory: after_catalog_created is called, then the dataset is instantiated, then _load() is called, and I can't find a good hook in between to set up credentials.
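To make the ordering concrete, here is a minimal sketch in plain Python. It does not use Kedro or kedro-azureml; `FakeAssetDataset`, `FakeCatalog`, and the suffix-based pattern matching are all hypothetical stand-ins, and the only thing it tries to mirror is the sequence described above: explicit entries exist when the hook runs, patterned entries are created later.

```python
class FakeAssetDataset:
    """Hypothetical stand-in for a dataset whose config is injected by a hook."""

    def __init__(self, name):
        self.name = name
        self.azure_config = None  # expected to be set by a hook before load()

    def load(self):
        if self.azure_config is None:
            raise RuntimeError(f"{self.name}: azure_config was never set")
        return f"loaded {self.name}"


class FakeCatalog:
    """Toy catalog: explicit entries exist up front, patterned ones are lazy."""

    def __init__(self, explicit, pattern_suffixes):
        self.datasets = {name: FakeAssetDataset(name) for name in explicit}
        self.pattern_suffixes = pattern_suffixes  # simplification of "{name}.csv"

    def get(self, name):
        if name not in self.datasets:
            if not any(name.endswith(s) for s in self.pattern_suffixes):
                raise KeyError(name)
            # Patterned dataset is only instantiated on first access.
            self.datasets[name] = FakeAssetDataset(name)
        return self.datasets[name]


def after_catalog_created(catalog, config):
    """Mimics a hook that only sees datasets that already exist."""
    for dataset in catalog.datasets.values():
        dataset.azure_config = config


catalog = FakeCatalog(explicit=["explicit.csv"], pattern_suffixes=[".parquet"])
after_catalog_created(catalog, {"workspace": "hypothetical"})

# The explicit dataset was configured by the hook and loads fine.
explicit_result = catalog.get("explicit.csv").load()

# The patterned dataset is created after the hook ran, so load() fails.
try:
    catalog.get("from_pattern.parquet").load()
    pattern_error = None
except RuntimeError as err:
    pattern_error = str(err)
```

In this toy version the explicit dataset loads, while the patterned one raises because nothing ran between its creation and `load()`.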

I did not see anything like that reported on kedro-azureml's GitHub. Is that something you are aware of, or does it need to be reported as an issue?

Hi, sorry you had a bumpy experience. Looks like this might be an issue with dataset factories in general. Maybe an alternative to after_catalog_created for passing credentials would work? cc

Where have you read the recommendation to use the hook for the credentials?

I can see a use case mentioned for regular datasets - https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials

But for dataset factories, as mentioned, the execution flow is different (dataset resolution happens later). I think there was some work around this by . I am not sure whether the resolution happens the same way as for regular datasets.

Hi, credentials are resolved when the catalog is instantiated, regardless of whether you use dataset patterns. So by the time after_catalog_created runs, credentials are already resolved, and you cannot pass them to the catalog.

The right way to do that is to follow the example shared above and use the after_context_created hook: https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials
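For reference, a rough sketch of what such a hook could look like, modelled on the pattern in the linked docs page. `fetch_external_credentials` and the credential names are hypothetical placeholders, and the `ImportError` fallback is only there so the snippet runs without Kedro installed. Whether this also covers the `azure_config` setter on AzureMLAssetDataset is not confirmed anywhere in this thread, so treat it as a starting point rather than a fix.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can run without Kedro installed.
    def hook_impl(func):
        return func


def fetch_external_credentials():
    """Hypothetical placeholder: swap in a real lookup (key vault, env vars, ...)."""
    return {
        "my_azure_creds": {
            "account_name": "example",
            "account_key": "not-a-real-key",
        }
    }


class ExternalCredentialsHook:
    @hook_impl
    def after_context_created(self, context) -> None:
        external = fetch_external_credentials()
        # Merge into the credentials the config loader already holds, so they
        # are available when the catalog is later instantiated.
        context.config_loader["credentials"] = {
            **context.config_loader["credentials"],
            **external,
        }
```

The hook would then be registered in the project's settings.py, for example `HOOKS = (ExternalCredentialsHook(),)`, and catalog entries (patterned or not) would refer to `my_azure_creds` by name under their `credentials` key.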
