Hey team, how can I dynamically overwrite an existing dataset in the Kedro catalog with a new configuration or data (e.g., changing the file path or dataset content) when running a pipeline from a Jupyter notebook on Databricks? Same question for dynamically overwriting a parameter. This would be a one-time test run: I'd add the code to the notebook on Databricks and then delete it again for future runs. Any help on this would be great!
Hi Max, you can refer to this manual for working with the Catalog in code, such as in notebooks:
https://docs.kedro.org/en/stable/data/advanced_data_catalog_usage.html
If I recall correctly, datasets in the Catalog are immutable, meaning you can add new ones but cannot modify existing ones. Is that right?
That would make sense, as I am getting this error when trying to modify the catalog entry: AttributeError: '_FrozenDatasets' object has no attribute 'create_dataset'
Would there be a way around this?
Hi Max, as Dmitry mentioned above, we do not allow dynamic dataset modifications. But you can replace the entire dataset with the one you need, instead of modifying the existing one. For that you can use catalog.add(dataset_name, dataset, replace=True)
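For example, a minimal sketch (the dataset name and filepath are hypothetical, and `catalog` is the DataCatalog object your Kedro notebook session exposes; in older Kedro versions the import is kedro.extras.datasets.pandas.CSVDataSet):
```
# Point the catalog entry at test data for a one-off run.
# "my_input_data" and the filepath below are hypothetical examples.
from kedro_datasets.pandas import CSVDataset  # older Kedro: kedro.extras.datasets.pandas.CSVDataSet

test_dataset = CSVDataset(filepath="/dbfs/tmp/test_input.csv")

# replace=True overwrites the registered entry; without it, adding an
# already-registered name raises an "already exists" error.
catalog.add("my_input_data", test_dataset, replace=True)
```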
The same can be done with parameters, as at the level of the catalog they are treated as MemoryDatasets
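A sketch for that too (the parameter name is hypothetical; individual parameters are registered under the params: prefix, and in Kedro versions before 0.19 the class is spelled MemoryDataSet):
```
# Override a single parameter for this run only.
from kedro.io import MemoryDataset  # Kedro < 0.19: MemoryDataSet

catalog.add("params:learning_rate", MemoryDataset(0.01), replace=True)

# Confirm the override took effect.
print(catalog.load("params:learning_rate"))
```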
Ok, thank you! So would this mean creating a new dataset when declaring it? Or could the 'dataset' parameter be a path to the test data?
Yes, you just need to create a new dataset with the updated filepath (or any other parameter) and then replace the old one
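In other words, the dataset argument has to be a dataset object rather than a raw path string; the test path goes into that object's filepath. A quick check, reusing the hypothetical names from the sketch above:
```
# The second argument to catalog.add() is a dataset instance, not a path;
# the path lives in the instance's filepath argument.
df = catalog.load("my_input_data")  # now reads /dbfs/tmp/test_input.csv
df.head()
```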