
Updated 2 months ago

Caching Results to Avoid Expensive Operations

When you have an expensive operation, is there a good way of loading from an existing dataset? I am trying to check whether a certain ID already exists and only run the node's logic when it is new. If it is new, I then add the new entries to the saved dataset so that next time I don't recalculate them. Effectively, caching results.


Hi Jannik,
Accessing the catalog inside a node can be quite complicated. You might consider a few workarounds: modifying the dataset code to create a custom dataset with the methods you need, or moving the I/O operations directly into the node, bypassing the catalog.
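As a rough illustration of the second workaround (doing the I/O inside the node and bypassing the catalog), here is a minimal sketch using only the standard library. The names `process_ids`, `expensive_operation`, and the JSON cache file are hypothetical and chosen for illustration; they are not part of Kedro.

```python
import json
from pathlib import Path


def expensive_operation(item_id: str) -> int:
    # Hypothetical placeholder for the costly computation.
    return len(item_id) ** 2


def process_ids(ids: list[str], cache_path: str) -> dict[str, int]:
    """Compute results only for IDs that are missing from the cache file."""
    path = Path(cache_path)
    # Load earlier results if the cache file exists, otherwise start empty.
    cache = json.loads(path.read_text()) if path.exists() else {}

    new_ids = [i for i in ids if i not in cache]
    for item_id in new_ids:
        cache[item_id] = expensive_operation(item_id)

    # Write the file back only when something was added.
    if new_ids:
        path.write_text(json.dumps(cache))
    return cache
```

The trade-off is that the node is no longer a pure function and the cache file is invisible to the catalog, so the pipeline cannot track it as a dataset.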

Thank you, Dmitry. So far I have effectively created two datasets with the same path, so that the same file serves as both input and output. I will stick with this, then.
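For the two-datasets approach, the node itself can stay a pure function: it receives the previously saved results as one input and returns the extended results as its output. The sketch below assumes the cache is a plain dictionary keyed by ID; `update_cache` and `compute` are hypothetical names, and how the two catalog entries are declared is left out because it depends on the dataset type in use.

```python
from typing import Callable


def update_cache(
    cached: dict[str, int],
    ids: list[str],
    compute: Callable[[str], int],
) -> dict[str, int]:
    """Return the cache extended with results for the IDs it lacks."""
    updated = dict(cached)  # copy so the input object is not mutated
    for item_id in ids:
        if item_id not in updated:
            # Only IDs that are new trigger the expensive call.
            updated[item_id] = compute(item_id)
    return updated
```

In a pipeline, `cached` would be read from the first catalog entry and the return value saved to the second entry pointing at the same path. Two names are needed because Kedro rejects a node whose input and output are the same dataset. Note that the file presumably has to exist (even if empty) before the first run, since the input entry is loaded before the node executes.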
