Join the Kedro community

Updated 2 months ago

ParallelRunner and dataset error

Hey team, I am facing an issue when using ParallelRunner. I am trying to parallelize model training for many different use cases (tabular data) through ParallelRunner, but I keep running into this error:

DatasetError: Data for MemoryDataset has not been saved yet.

Is that something that you have seen before?

8 comments

Rashida Kanchwala

It looks like a node requires a MemoryDataset as input, and that MemoryDataset hasn't been created yet.

Paul Mora

Yes haha, that much I also understood from the error message

Paul Mora

But it is working when using SequentialRunner

Paul Mora

And ThreadRunner

Paul Mora

But this error comes up when using ParallelRunner, suggesting that it's running nodes in the wrong order
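
Since the run works with SequentialRunner and ThreadRunner (one process) but fails with ParallelRunner (which uses multiple processes), one possible cause other than node ordering is that in-memory data written in one process is not visible in another. The sketch below is only an illustration of that general effect in plain Python. `TinyMemoryStore` and both helper functions are hypothetical names, not Kedro's classes or API, and the process boundary is simulated with a deep copy rather than real worker processes.

```python
import copy


class TinyMemoryStore:
    """Hypothetical stand-in for an in-memory dataset (not Kedro's MemoryDataset)."""

    def __init__(self):
        self._data = None
        self._saved = False

    def save(self, data):
        self._data = data
        self._saved = True

    def load(self):
        if not self._saved:
            raise RuntimeError("Data for TinyMemoryStore has not been saved yet.")
        return self._data


def run_in_same_process(store):
    # Producer and consumer share the same object, as they would
    # when all nodes run inside a single process.
    store.save([1, 2, 3])
    return store.load()


def run_across_simulated_processes(store):
    # Assumption: a worker process operates on its own copy of the store.
    # The deep copy stands in for that copy; no real process is started.
    worker_copy = copy.deepcopy(store)
    worker_copy.save([1, 2, 3])  # the producer writes to its private copy
    return store.load()          # the consumer reads the untouched original


if __name__ == "__main__":
    print(run_in_same_process(TinyMemoryStore()))
    try:
        run_across_simulated_processes(TinyMemoryStore())
    except RuntimeError as err:
        print(f"RuntimeError: {err}")
```

If something like this is what is happening, the nodes may well be running in the right order and the data is simply not reaching the consuming process. Whether that applies here depends on the pipeline and catalog definitions, which are not shown in this thread.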

Huong Nguyen

Hey @Paul Mora, I guess this can happen with ParallelRunner if dependencies are not defined or managed correctly. Could you share how you define your pipeline, and perhaps your data catalog as well, please?

Paul Mora

@Melvin Kok kindly agreed to help provide this, Huong 🙂

Huong Nguyen

Perfect, thank you. @Melvin Kok, when you have the time, feel free to DM me 😄
