Hello everyone,
I am working on a dynamic pipeline that generates a file for each year in a list, such that the catalog entry would be

    data_{year}:
      type: pandas.ExcelDataset
      filepath: reports/folder/data_{year}.xlsx
      save_args:
        index: False

Then, I have another pipeline that aggregates all files to process them, loading them as a PartitionedDataset, with entry:

    partitioned_data:
      type: partitions.PartitionedDataset
      path: reports/folder
      dataset:
        type: pandas.ExcelDataset

The main problem with my approach is that even though these two entries refer to the same data, they are in fact different entries, so Kedro runs the second pipeline before the dynamic one.
Hi, thanks for the question.
"The main problem with my approach is that even though these two entries refer to the same data, they are in fact different entries, so Kedro runs the second pipeline before the dynamic one."
Is it possible to use a partitioned dataset instead of a dynamic pipeline in this case?
In kedro viz, the two pipelines will show up as disconnected, so Kedro doesn't know that the first one needs to be executed before the other. The other option is to create a dummy input/output pair to ensure the dependency is resolved correctly.
Thanks a lot for the early answer!
I am a bit concerned that loading the files as a partitioned dataset instead of looping through them will cause memory issues. Could you elaborate a bit on your suggestion?
My concern is that by using a partition dataset instead of a dynamic pipeline I will encounter memory issues, since the data files are kinda heavy, so I wanted to know your take on this.
https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html
For a partitioned dataset, you could use lazy loading and lazy saving to help with the memory issue.
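To make that a bit more concrete, here is a rough sketch of what lazy loading and lazy saving can look like in node code. A PartitionedDataset passes the node a dictionary that maps partition IDs to load functions, so a file is only read when its function is called, and a node can return callables instead of data to defer the work until save time. The names `count_rows_per_year`, `build_yearly_reports` and `build_report` are hypothetical, and the toy lists stand in for the DataFrames the real pipeline would handle.

```python
from typing import Any, Callable, Dict, Iterable, List


def count_rows_per_year(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, int]:
    """Lazy loading: read one partition at a time and keep only a small summary."""
    row_counts = {}
    for partition_id, load_partition in sorted(partitions.items()):
        data = load_partition()  # the file is read only at this point
        row_counts[partition_id] = len(data)
        # `data` is rebound on the next iteration, so the previous file can be freed
    return row_counts


def build_report(year: int) -> List[dict]:
    """Hypothetical stand-in for the real per-year computation."""
    return [{"year": year, "value": year % 100}]


def build_yearly_reports(years: Iterable[int]) -> Dict[str, Callable[[], Any]]:
    """Lazy saving: return callables so each report is built only when it is saved."""
    # `year=year` binds the current value; a bare lambda would only see the last year
    return {f"data_{year}": (lambda year=year: build_report(year)) for year in years}
```

The memory benefit only holds if the node keeps a small result per partition (a count, a sum, a filtered subset) rather than collecting every loaded file in a list. The exact partition IDs Kedro passes in depend on the catalog entry, for example on whether `filename_suffix` is set.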
If you prefer the dynamic pipeline way, it's totally fine, but as mentioned you would need a dummy input/output to control the execution order.
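A minimal sketch of the dummy input/output idea, assuming each yearly node emits an extra flag that the aggregation node lists as an input. The dataset names `done_{year}` and `aggregated`, the node names, and the placeholder data are all hypothetical. The wiring is written as plain dictionaries so the snippet runs without Kedro installed; the keys match the keyword arguments of `kedro.pipeline.node`.

```python
from typing import Any, Callable, Dict, List, Tuple


def make_yearly_func(year: int) -> Callable[[], Tuple[List[dict], bool]]:
    """Return a node function that builds one year's data plus a dummy 'done' flag."""
    def build_year() -> Tuple[List[dict], bool]:
        report = [{"year": year}]  # placeholder for the real per-year data
        return report, True        # second output exists only to create a dependency
    return build_year


def aggregate(partitions: Dict[str, Callable[[], Any]], *done_flags: bool) -> int:
    """Aggregate all partitions; the flags are ignored and only force the ordering."""
    return sum(len(load()) for load in partitions.values())


years = [2020, 2021]

# Each dict holds the keyword arguments one would pass to kedro.pipeline.node(...)
node_specs = [
    {
        "func": make_yearly_func(year),
        "inputs": None,
        "outputs": [f"data_{year}", f"done_{year}"],
        "name": f"build_data_{year}",
    }
    for year in years
]
node_specs.append(
    {
        "func": aggregate,
        # Listing every flag makes this node depend on all of the yearly nodes
        "inputs": ["partitioned_data"] + [f"done_{year}" for year in years],
        "outputs": "aggregated",
        "name": "aggregate_all_years",
    }
)
# In a Kedro project (assumption): pipeline([node(**spec) for spec in node_specs])
```

Because the `done_{year}` flags are not declared in the catalog, Kedro would keep them in memory, and they are enough for the runner to schedule the aggregation after every yearly node.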
Side note: https://github.com/kedro-org/kedro/discussions/3758
There has been some discussion about adding a custom execution order. Feel free to comment if this is of interest to you.