Updated 4 months ago

Caching Results to Avoid Expensive Operations

At a glance

A community member wants to cache the results of an expensive operation: a node should check whether a given ID already exists in a saved dataset and compute only the new entries. Another community member suggests two workarounds: writing a custom dataset with the required methods, or moving the I/O operations directly into the node, bypassing the catalog. The original poster replies that they have so far used two datasets with the same path, one as input and one as output, and plans to stick with that approach.

When you have an expensive operation, is there a good way of loading from an existing dataset? I am trying to check whether a certain ID already exists and only run the node's logic when it is new. If it is new, I then add the new entries to the saved dataset so that they are not recalculated next time, effectively caching the results.

2 comments

Hi Jannik,
Accessing the catalog inside a node can be quite complicated. You might consider a few workarounds: modifying the dataset code to create a custom dataset with the methods you need, or moving the I/O operations directly into the node, bypassing the catalog.
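
For illustration, the second workaround (I/O inside the node) could look roughly like the sketch below. It is only a sketch under assumptions: the function name `compute_with_file_cache`, the `expensive_fn` callable, and the JSON file format are hypothetical choices, not something from Kedro or from this thread. IDs are assumed to be strings and results JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def compute_with_file_cache(
    ids: Iterable[str],
    expensive_fn: Callable[[str], object],
    cache_path: str,
) -> Dict[str, object]:
    """Return results for all ids, computing only those missing from the cache file.

    Hypothetical helper: the node reads and writes the cache file itself
    instead of going through the catalog.
    """
    ids = list(ids)  # allow generators; we iterate more than once
    path = Path(cache_path)

    # Load previously saved results, or start empty on the first run.
    cache = json.loads(path.read_text()) if path.exists() else {}

    changed = False
    for item_id in ids:
        if item_id not in cache:
            # Only new IDs trigger the expensive operation.
            cache[item_id] = expensive_fn(item_id)
            changed = True

    # Write the file back only when new entries were added.
    if changed:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(cache))

    return {item_id: cache[item_id] for item_id in ids}
```

The trade-off is that the cache file is invisible to the catalog, so its path has to reach the node some other way, for example as a parameter.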

Thank you, Dmitry. So far I have effectively created two datasets with the same path, so that I have one as input and one as output. I will stick to this then.
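
A rough sketch of what the node for this two-dataset setup might look like. The entry names `results_cached` and `results_updated`, the `update_cache` function, and the placeholder `expensive_operation` are all hypothetical. Two entries are needed because, as far as I know, Kedro rejects a node that lists the same dataset name as both input and output. The file presumably also has to exist before the first run, since the input entry is loaded before the node executes.

```python
from typing import Dict, Iterable


def expensive_operation(item_id: str) -> str:
    """Placeholder for the real expensive computation (hypothetical)."""
    return item_id.upper()


def update_cache(cached: Dict[str, str], ids: Iterable[str]) -> Dict[str, str]:
    """Merge results for new ids into the previously saved results.

    `cached` would come from the input catalog entry (e.g. "results_cached"),
    and the return value would be saved through a second entry
    (e.g. "results_updated") that points at the same file path.
    """
    merged = dict(cached)  # copy, so the loaded input is not mutated
    for item_id in ids:
        if item_id not in merged:
            # Only IDs not seen before are computed.
            merged[item_id] = expensive_operation(item_id)
    return merged


# In a pipeline this might be wired up as (names are assumptions):
# node(update_cache, inputs=["results_cached", "ids"], outputs="results_updated")
```

Because the node itself is a pure function of its inputs, it can be tested without any files; only the catalog entries know that both names refer to the same path.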
