Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories.
After a lot of headache and debugging, it seems impossible to use both. Credentials are passed to the AzureMLAssetDataset through a hook (after_catalog_created), but if you use dataset patterns (as in, declare your dataset as "{name}.csv" or something similar), the hook is called before the patterned dataset has been instantiated.
After all that, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter hasn't been called yet (it is only called in the after_catalog_created hook). At first glance, it looks like a kedro-azureml issue, as AzureMLAssetDataset._load() can be called without the setter having been called when used with a dataset factory. But it might also be a kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't quite see it in the docs on hooks either.
Trying to make it slightly clearer:
Regular AzureMLAssetDataset: it is instantiated, then after_catalog_created is called and the setter for the AzureML credentials is invoked, and eventually _load() is called.
When used with a dataset factory: after_catalog_created is called, then the dataset is instantiated, then _load() is called, and I can't find a good hook in between to set up credentials.
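To make the ordering problem concrete, here is a minimal toy sketch in plain Python. It does not use kedro or kedro-azureml at all: ToyAssetDataset, ToyCatalog and the config values are hypothetical stand-ins, and the only assumption is the behaviour described above (pattern datasets are created lazily, after the hook has already run).

```python
class ToyAssetDataset:
    """Hypothetical stand-in for a dataset whose config is injected by a hook."""

    def __init__(self):
        self.azure_config = None  # expected to be set later by a hook

    def load(self):
        if self.azure_config is None:
            raise RuntimeError("azure_config was never set")
        return "data"


class ToyCatalog:
    """Hypothetical catalog: explicit entries are eager, pattern entries are lazy."""

    def __init__(self, explicit_names):
        self.datasets = {name: ToyAssetDataset() for name in explicit_names}

    def load(self, name):
        # A pattern-matched dataset only comes into existence on first access.
        if name not in self.datasets:
            self.datasets[name] = ToyAssetDataset()
        return self.datasets[name].load()


def after_catalog_created(catalog, azure_config):
    # The hook can only reach datasets that already exist at this point.
    for dataset in catalog.datasets.values():
        dataset.azure_config = azure_config


catalog = ToyCatalog(explicit_names=["explicit.csv"])
after_catalog_created(catalog, azure_config={"workspace": "demo"})

explicit_result = catalog.load("explicit.csv")  # config was injected by the hook

try:
    catalog.load("from_pattern.csv")  # created after the hook ran
    pattern_error = None
except RuntimeError as exc:
    pattern_error = str(exc)
```

The explicit entry loads fine, while the pattern entry fails because nothing ever set its config.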
I did not see anything like that reported on kedro-azureml's GitHub. Is that something you are aware of, or does it need to be reported as an issue?
Hi, sorry you had a bumpy experience. Looks like this might be an issue with dataset factories in general. Maybe an alternative to after_catalog_created for passing credentials would work?
Where did you read the recommendation to use that hook for the credentials?
I can see a use case mentioned for regular datasets - https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials
But for dataset factory as mentioned the execution flow is different (dataset resolution happens later). I think there was some work around this by . I am not sure if the resolution happens like regular datasets.
Hi, credentials are resolved when the catalog is instantiated, regardless of whether you use dataset patterns. So by the time after_catalog_created runs, credentials are already resolved, and you can no longer pass them to the catalog.
The right way to do that is to follow the example shared above and use the after_context_created hook: https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials
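For reference, a minimal sketch of what such an after_context_created hook could look like, loosely following the pattern in the linked docs page. The fetch_azure_credentials helper and the placeholder values it returns are hypothetical, and the import fallback is only there so the sketch can run without Kedro installed.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without Kedro; in a real project use the import above.
    def hook_impl(func):
        return func


def fetch_azure_credentials():
    """Hypothetical helper: replace with a real lookup (key vault, env vars, ...)."""
    return {"azureml": {"workspace_name": "<workspace>", "resource_group": "<group>"}}


class AzureCredentialsHook:
    @hook_impl
    def after_context_created(self, context):
        # Merge the extra credentials before the catalog is built, so that any
        # dataset (explicit or pattern-based) can refer to them by key.
        existing = context.config_loader["credentials"]
        context.config_loader["credentials"] = {**existing, **fetch_azure_credentials()}


# Usage with a stand-in context object (not a real KedroContext):
fake_context = SimpleNamespace(config_loader={"credentials": {"other": {"token": "x"}}})
AzureCredentialsHook().after_context_created(fake_context)
merged = fake_context.config_loader["credentials"]
```

In a real project the hook would be registered in settings.py, e.g. HOOKS = (AzureCredentialsHook(),), and a catalog entry would then reference the key with credentials: azureml.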
Right, in that case I'll open a kedro-azureml issue, as the AzureMLAssetDataset class expects to receive its credentials in after_catalog_created (through a setter, for all datasets of type AzureMLAssetDataset), and expects after_catalog_created to be called before "_load()".
In the case of a dataset factory, the AzureMLAssetDataset is created and then _load() is called right after, without its setter ever being called. It seems like AzureMLAssetDataset needs a bit of rework in how it gathers its credentials.
https://github.com/getindata/kedro-azureml/pull/161 has been created as a fix for kedro-azureml
Essentially, I explicitly set up a "credentials" parameter in the catalog for AzureMLAssetDataset, and implicitly inject the AzureML credentials as an "azureml" key in the context in after_context_created. Thus, the credentials are injected at __init__ time, and not in "after_catalog_created", which fixes my issue with dataset factories.
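A rough sketch of the idea, in plain Python rather than the actual PR code: the dataset receives its credentials through __init__, so it does not matter when it gets created. CredentialedAssetDataset, create_from_pattern and the config keys are hypothetical names used only for illustration.

```python
class CredentialedAssetDataset:
    """Hypothetical dataset that gets its Azure config at construction time."""

    def __init__(self, azureml_dataset, credentials=None):
        self._azureml_dataset = azureml_dataset
        self._azure_config = dict(credentials or {})

    def load(self):
        if not self._azure_config:
            raise RuntimeError("no credentials were passed to __init__")
        workspace = self._azure_config["workspace_name"]
        return f"loading {self._azureml_dataset} from workspace {workspace}"


def create_from_pattern(name, pattern_config, all_credentials):
    """Mimic lazy creation of a pattern entry, resolving its credentials key."""
    config = dict(pattern_config)
    credentials_key = config.pop("credentials")
    return CredentialedAssetDataset(
        azureml_dataset=config["azureml_dataset"].format(name=name),
        credentials=all_credentials[credentials_key],
    )


# Credentials as they might look after an after_context_created hook injected them.
all_credentials = {"azureml": {"workspace_name": "demo-ws"}}
pattern_config = {"azureml_dataset": "{name}", "credentials": "azureml"}

# The dataset is created late (as with a dataset factory) but is still fully configured.
dataset = create_from_pattern("sales", pattern_config, all_credentials)
message = dataset.load()
```

Because nothing relies on a setter being called afterwards, the same path works for explicit and pattern-based entries.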
PR is coming soon. In the meantime, I've found a way to make it work without this PR, as I don't know when it will be merged.
For those wondering, it involves three pieces: a custom AzureMLAssetDataset that simply picks up the injected credentials from the context, an "after_context_created" hook to inject the credentials into the context, and a "before_pipeline_run" hook that handles not only regular datasets but also dataset factory/pattern datasets.
Technically, after that, there shouldn't be a need for the "after_catalog_created" hook.
I've posted my workaround here: https://github.com/getindata/kedro-azureml/issues/160
It should be fairly straightforward (custom dataset + hook), but let me know in the issue if you have any problems, and I'll try to find some time to have a look. The main issues were passing credentials, and properly detecting when running remotely (within AzureML's compute) when using dataset patterns (dataset factories).