Caching Results to Avoid Expensive Operations

At a glance

The community member is looking for a way to cache the results of an expensive operation by checking whether a given ID already exists in a dataset. They have tried creating two datasets with the same path, using one as input and the other as output. Another community member suggests either writing a custom dataset with the required methods or moving the I/O operations directly into the node, bypassing the catalog. The original poster thanks them and says they will stick with their current two-dataset approach.

Jannik Wiedenhaupt

When you have an expensive operation, is there a good way of loading from an existing dataset? I am trying to check whether a certain ID already exists and only run the node's logic when the ID is new. If it is new, I then add the new entries to the saved dataset so that next time I don't recalculate them. Effectively, caching results.
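The check-then-compute idea in the question can be sketched as a plain function, independent of Kedro. This is a minimal sketch assuming the cached results are held as a dict keyed by ID; `expensive_operation` and `update_cache` are hypothetical names standing in for the real computation and node.

```python
from typing import Callable, Dict, Iterable


def expensive_operation(record_id: str) -> int:
    """Hypothetical stand-in for the costly computation."""
    return len(record_id) * 100


def update_cache(
    ids: Iterable[str],
    cache: Dict[str, int],
    compute: Callable[[str], int] = expensive_operation,
) -> Dict[str, int]:
    """Return a new cache that also holds results for IDs not seen before.

    IDs already present in `cache` are skipped, so `compute` runs only
    for new IDs. The input dict is left unmodified.
    """
    updated = dict(cache)
    for record_id in ids:
        if record_id not in updated:
            updated[record_id] = compute(record_id)
    return updated


if __name__ == "__main__":
    calls = []

    def tracked(record_id: str) -> int:
        # Record every ID that actually triggers the expensive work.
        calls.append(record_id)
        return expensive_operation(record_id)

    cache = update_cache(["a", "bb"], {}, tracked)
    cache = update_cache(["bb", "ccc"], cache, tracked)
    print(cache)  # {'a': 100, 'bb': 200, 'ccc': 300}
    print(calls)  # ['a', 'bb', 'ccc']
```

On the second call, `"bb"` is found in the cache, so only `"ccc"` is computed.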


Dmitry Sorokin

Hi Jannik,
Accessing the catalog inside a node can be quite complicated. You might consider a few workarounds: modifying the dataset code to create a custom dataset with the methods you need, or moving the I/O operations directly into the node, bypassing the catalog.
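The second workaround, doing the I/O inside the node, could look roughly like the sketch below. It assumes the cache is a JSON file at a hypothetical path `data/cache/results.json` and that the node function (hypothetically named `process_ids`) reads and writes it with the standard library instead of going through the Kedro catalog. The trade-off is that Kedro no longer sees this file as a dataset, so it does not show up among the pipeline's inputs and outputs.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical location of the cache file; adjust to your project layout.
CACHE_PATH = Path("data/cache/results.json")


def expensive_operation(record_id: str) -> int:
    """Hypothetical stand-in for the costly computation."""
    return len(record_id) * 100


def process_ids(ids: List[str], cache_path: Path = CACHE_PATH) -> Dict[str, int]:
    """Node function that manages its own cache file, bypassing the catalog."""
    # Load previously computed results, or start empty on the first run.
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())
    else:
        cache = {}

    # Deduplicate while keeping order, then keep only IDs not cached yet.
    new_ids = [i for i in dict.fromkeys(ids) if i not in cache]
    for record_id in new_ids:
        cache[record_id] = expensive_operation(record_id)

    # Write the file back only when something new was computed.
    if new_ids:
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(cache))

    # Return results for the requested IDs only.
    return {i: cache[i] for i in ids}
```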

Jannik Wiedenhaupt

Thank you, Dmitry. So far I have effectively created two datasets with the same path so that one serves as the node's input and the other as its output. I will stick with this, then.
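The two-entries-one-path workaround relies on Kedro loading a node's inputs before the node runs and saving its outputs afterwards. The sketch below imitates that outside Kedro: `JSONFileDataset` is a hypothetical minimal class with `load()`/`save()` (not Kedro's own dataset class), two instances share one path, and `run_node` is a hypothetical helper that mimics the load, call, save sequence around a pure node function.

```python
import json
from pathlib import Path
from typing import Dict, List


class JSONFileDataset:
    """Hypothetical minimal stand-in for a file-backed catalog entry."""

    def __init__(self, filepath: str) -> None:
        self._path = Path(filepath)

    def load(self) -> Dict[str, int]:
        return json.loads(self._path.read_text())

    def save(self, data: Dict[str, int]) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(data))


def add_new_results(ids: List[str], cached: Dict[str, int]) -> Dict[str, int]:
    """Pure node function: compute results only for IDs missing from `cached`."""
    updated = dict(cached)
    for record_id in ids:
        if record_id not in updated:
            # Placeholder for the expensive work.
            updated[record_id] = len(record_id) * 100
    return updated


def run_node(ids: List[str], cache_in: JSONFileDataset, cache_out: JSONFileDataset) -> None:
    """Mimic the runner: load the input entry, call the node, save the output entry."""
    cache_out.save(add_new_results(ids, cache_in.load()))
```

In a real project the two entries would be declared in the catalog under different names with the same `filepath`, and the node wired with one name in `inputs` and the other in `outputs`. Because the input entry is loaded on every run, the file most likely has to exist before the first run (for example, seeded with an empty result), since file-based datasets generally raise an error when asked to load a missing file.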


Join the Kedro community
