Hey team, I'm facing an issue when using ParallelRunner. Basically, I'm trying to parallelize model training for many different use cases (tabular data) through ParallelRunner, but I keep getting this error:
DatasetError: Data for MemoryDataset has not been saved yet.
Is that something you have seen before?
It looks like a node requires a MemoryDataset as input, and that MemoryDataset hasn't been created yet.
But the error appears when using ParallelRunner, which suggests that it's running nodes in the wrong order.
Hey @Paul Mora, I guess this may have happened in ParallelRunner because the dependencies aren't managed/defined the right way. Could you please share how you define your pipeline, and perhaps your data catalog as well?
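To illustrate why dependency definitions matter, here is a minimal toy sketch in plain Python. It is not Kedro's runner code. It only mimics the general idea that node order is derived from the dataset names each node declares as inputs and outputs. `InMemoryStore`, `execution_order`, `run`, and both example pipelines are hypothetical names used for illustration. If one node's output name doesn't exactly match the next node's input name, no dependency edge exists between them. Nothing then guarantees that the producer runs first, and the consumer can end up loading data that was never saved.

```python
from graphlib import TopologicalSorter


class DatasetError(Exception):
    """Hypothetical stand-in for the error raised when loading unsaved data."""


class InMemoryStore:
    """Hypothetical toy catalog that keeps every dataset in a dict."""

    def __init__(self):
        self._data = {}

    def save(self, name, value):
        self._data[name] = value

    def load(self, name):
        if name not in self._data:
            raise DatasetError(f"Data for '{name}' has not been saved yet.")
        return self._data[name]


def execution_order(nodes):
    """Order nodes using only the dataset names they declare."""
    # Map each declared output name to the node that produces it.
    producers = {out: name for name, (_, _, outputs) in nodes.items() for out in outputs}
    # A node depends on the producers of its inputs. Inputs with no producer add no edge.
    graph = {
        name: {producers[i] for i in inputs if i in producers}
        for name, (_, inputs, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


def run(nodes, store):
    """Run nodes in dependency order, loading inputs and saving outputs by name."""
    for name in execution_order(nodes):
        func, inputs, outputs = nodes[name]
        results = func(*(store.load(i) for i in inputs))
        for out, value in zip(outputs, results):
            store.save(out, value)
    return store


# Each node is (function, input names, output names). Functions return tuples.
good_pipeline = {
    "train_model": (lambda features: (sum(features),), ["features"], ["model"]),
    "prepare": (lambda raw: ([x * 2 for x in raw],), ["raw"], ["features"]),
}

# Same pipeline, but "prepare" declares "feature" instead of "features".
bad_pipeline = {
    "train_model": (lambda features: (sum(features),), ["features"], ["model"]),
    "prepare": (lambda raw: ([x * 2 for x in raw],), ["raw"], ["feature"]),
}

if __name__ == "__main__":
    store = InMemoryStore()
    store.save("raw", [1, 2, 3])
    run(good_pipeline, store)
    print(store.load("model"))

    broken = InMemoryStore()
    broken.save("raw", [1, 2, 3])
    try:
        run(bad_pipeline, broken)
    except DatasetError as exc:
        print(exc)
```

With the matching names, `prepare` is scheduled before `train_model` even though it is listed second. In `bad_pipeline`, `prepare` saves `feature` while `train_model` reads `features`, so the load fails no matter which order the nodes run in.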