Kedro Partition Dataset Lacks Caching Support

At a glance

A community member asks whether there is a reason PartitionedDataset has no caching support. If an error occurs partway through an expensive computation, it would help to resume from where the run stopped instead of starting over. This matters most when a node returns a dict of Callable for Kedro to invoke. They could override the behaviour themselves, but wonder whether the feature is missing for a specific reason.

In the comments, other community members express interest in this feature and suggest creating a feature request, as it would require some design considerations. One community member thinks it might be a simple addition, like adding a cache: True parameter to check if the file exists before calling the user function, but acknowledges they may be missing other cases. They mention they will open a request ticket for this.

The thread has no explicitly marked answer.

Fazil Topal

Hi all,

Is there a reason we don't have caching support in PartitionedDataset? Imagine running an expensive computation, but an error occurs in the middle and a re-run is needed. I would assume that logic to resume where we left off would be quite handy, instead of starting all over again, especially when returning a dict of Callable for Kedro to invoke. I can certainly override this, but I was wondering if there was a special reason why we don't have this yet.

5 comments

Laurens Vijnck

Interested in this one!

datajoely

It would be great to create a feature request for this, as it would need some design work.

Fazil Topal

Hmm, really? I thought it was fairly simple, like adding a cache: True parameter to check if the file exists before we call the user function, but I'm most likely missing lots of other cases 😄 In any case, I'll open a request ticket for this.
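The check described in this comment can be sketched outside Kedro as a plain helper that a node might call before returning its dict of callables. This is only an illustration under stated assumptions: the name `skip_existing` and its parameters are hypothetical and not part of Kedro, and partitions are assumed to be stored as local files named `<partition id><suffix>`.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def skip_existing(
    partitions: Dict[str, Callable[[], Any]],
    output_dir: Path,
    suffix: str = "",
) -> Dict[str, Callable[[], Any]]:
    """Keep only the partitions whose output file does not exist yet.

    Hypothetical helper, not a Kedro API. The callables are never
    invoked here; they are only filtered by partition id.
    """
    return {
        partition_id: compute
        for partition_id, compute in partitions.items()
        if not (output_dir / f"{partition_id}{suffix}").exists()
    }
```

One of the cases this sketch ignores is a file left half-written by the failed run: it would count as existing and be skipped, which is the kind of detail a fuller design would need to settle.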

datajoely

I say design because we have a few features associated with it, so this might make sense as part of a wider piece.

Fazil Topal

Opened it here: https://github.com/kedro-org/kedro-plugins/issues/974
