Hey team, how can I dynamically overwrite an existing dataset in the Kedro catalog with a new configuration or data (e.g., changing the file path or dataset content) when running a pipeline from a Jupyter notebook on Databricks? Same question for dynamically overwriting a parameter. This would be a one-time test run: I'd add the code to the notebook on Databricks and then delete it again for future runs. Any help on this would be great!
Hi Max, you can refer to this manual for working with the Catalog in code, such as in notebooks:
https://docs.kedro.org/en/stable/data/advanced_data_catalog_usage.html
If I recall correctly, datasets in the Catalog are immutable, meaning you can add new ones but cannot modify existing ones. Is that right?
That would make sense, as I am getting this error when trying to modify the catalog entry: AttributeError: '_FrozenDatasets' object has no attribute 'create_dataset'
Would there be a way around this?
Hi Max, as Dmitry mentioned above, we do not allow dynamic dataset modifications. But you can replace the entire dataset with the one you need, instead of modifying the existing one. For that you can use catalog.add(dataset_name, dataset, replace=True)
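For example, a minimal sketch (the dataset name and filepath are hypothetical, and `catalog` is the DataCatalog object your Kedro notebook session exposes; in older Kedro versions the import is kedro.extras.datasets.pandas.CSVDataSet):
```
# Point the catalog entry at test data for a one-off run.
# "my_input_data" and the filepath below are hypothetical examples.
from kedro_datasets.pandas import CSVDataset  # older Kedro: kedro.extras.datasets.pandas.CSVDataSet

test_dataset = CSVDataset(filepath="/dbfs/tmp/test_input.csv")

# replace=True overwrites the registered entry; without it, adding an
# already-registered name raises an "already exists" error.
catalog.add("my_input_data", test_dataset, replace=True)
```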
The same can be done with parameters, as at the level of the catalog they are treated as MemoryDatasets
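A sketch for that too (the parameter name is hypothetical; individual parameters are registered under the params: prefix, and in Kedro versions before 0.19 the class is spelled MemoryDataSet):
```
# Override a single parameter for this run only.
from kedro.io import MemoryDataset  # Kedro < 0.19: MemoryDataSet

catalog.add("params:learning_rate", MemoryDataset(0.01), replace=True)

# Confirm the override took effect.
print(catalog.load("params:learning_rate"))
```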
Ok, thank you! So would this mean creating a new dataset when declaring it? Or could the 'dataset' parameter be a path to the test data?
Yes, you just need to create a new dataset with the updated filepath (or any other parameter) and then replace the old one
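In other words, the dataset argument has to be a dataset object rather than a raw path string; the test path goes into that object's filepath. A quick check, reusing the hypothetical names from the sketch above:
```
# The second argument to catalog.add() is a dataset instance, not a path;
# the path lives in the instance's filepath argument.
df = catalog.load("my_input_data")  # now reads /dbfs/tmp/test_input.csv
df.head()
```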