Updated 4 weeks ago

ParallelRunner and dataset error

Hey team, I am facing an issue when using ParallelRunner. Basically, I am trying to parallelize model training for many different use cases (tabular data) through ParallelRunner, but I keep running into this error:

DatasetError: Data for MemoryDataset has not been saved yet.
Is that something that you have seen before?

8 comments

It looks like a node requires a MemoryDataset as input, and that MemoryDataset hasn't been created yet.
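To make that reading of the error concrete, here is a minimal toy sketch in plain Python. `ToyMemoryDataset` and the local `DatasetError` class are hypothetical stand-ins written for illustration, not Kedro's actual implementation; they only mimic the "load before save" situation described above.

```python
class DatasetError(Exception):
    """Toy stand-in for the error type named in the thread."""


_EMPTY = object()  # sentinel: distinguishes "never saved" from a saved None


class ToyMemoryDataset:
    """Hypothetical in-memory dataset; not Kedro's implementation."""

    def __init__(self):
        self._data = _EMPTY

    def save(self, data):
        # The producing node calls this with its output.
        self._data = data

    def load(self):
        # The consuming node calls this to get its input.
        if self._data is _EMPTY:
            raise DatasetError("Data for ToyMemoryDataset has not been saved yet.")
        return self._data


features = ToyMemoryDataset()

try:
    features.load()  # consumer runs before the producer has saved anything
except DatasetError as exc:
    load_error = str(exc)

features.save([1, 2, 3])   # producer runs
loaded = features.load()   # the same load now succeeds
```

In this toy version the error only says that nothing has been stored in that particular object yet. It does not say why, so a wrong execution order is one possible cause among others.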

Yes haha, that much I also understood from the error message

But it is working when using SequentialRunner

And ThreadRunner

But this error comes up when using ParallelRunner, suggesting that it's running nodes in the wrong order

Hey @Paul Mora, I guess this can happen in ParallelRunner if dependencies are not managed/defined in the right way. Could you share how you define your pipeline, and perhaps your data catalog as well, please?
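As a rough illustration of what "dependencies defined in the right way" means: Kedro works out node order from matching input and output dataset names. The sketch below reproduces that idea with the standard library only. The node names, dataset names, and both helper functions are hypothetical and chosen for illustration; this is not Kedro code and not the pipeline from this thread.

```python
from graphlib import TopologicalSorter

# Hypothetical node specs: name -> (input dataset names, output dataset names)
nodes = {
    "preprocess": (["raw_data"], ["features"]),
    "train_model": (["features"], ["model"]),
    "evaluate": (["model", "features"], ["metrics"]),
}


def execution_order(nodes):
    """Order nodes so each one runs after the nodes that produce its inputs."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


def unproduced_inputs(nodes):
    """Inputs that no node produces; these would have to come from the catalog."""
    produced = {out for _, outs in nodes.values() for out in outs}
    consumed = {i for ins, _ in nodes.values() for i in ins}
    return sorted(consumed - produced)


order = execution_order(nodes)         # dependency-respecting order
free_inputs = unproduced_inputs(nodes)  # only "raw_data" here
```

With a check like this, a misspelled input name (say `feature` instead of `features`) would show up in `free_inputs`: no node produces it, so no ordering constraint is created for it and it would have to be loaded from a dataset that nothing has saved to. Whether that is what happened here cannot be told without seeing the actual pipeline and catalog.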

@Melvin Kok kindly agreed to help with providing this, Huong 🙂

Perfect, thank you. @Melvin Kok, when you have the time, feel free to DM me 😄
