Processing partitioned datasets efficiently

Question

Hello, team!
Does anyone know the best (or maybe most kedroic) way to work with a PartitionedDataset by processing the partitions individually? Merging them would consume all memory. I want to apply the same operations to all partitions. Would it be a better idea to use or add namespaces for this? All my files are named in the format f"sessions_{YYYY-MM-DD}.parquet". Thank you!

Ankita Katiyar · Answer

You could try using dataset factories for this: https://docs.kedro.org/en/stable/data/kedro_dataset_factories.html
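
A side note on the memory concern, as a rough sketch rather than a definitive recipe: a PartitionedDataset hands a node a dict that maps each partition ID to a load function, and on save it accepts a dict whose values may be callables, which are invoked one at a time when saving. A node can use this to keep only one partition in memory at once. The sketch below does not import Kedro; `process_partitions`, `fake_loader`, and the partition IDs are hypothetical names used only for illustration, and plain lists stand in for the parquet data.

```python
from typing import Any, Callable, Dict

Loader = Callable[[], Any]


def process_partitions(
    partitions: Dict[str, Loader],
    transform: Callable[[Any], Any],
) -> Dict[str, Loader]:
    """Wrap each partition's load function so the transform runs lazily.

    Nothing is loaded here. Each returned callable loads and transforms
    exactly one partition when it is called.
    """

    def make_lazy(load: Loader) -> Loader:
        # A factory function binds `load` per partition and avoids the
        # late-binding pitfall of closures created inside a loop.
        def run() -> Any:
            return transform(load())

        return run

    return {pid: make_lazy(load) for pid, load in sorted(partitions.items())}


# Stand-in for what a partitioned dataset would pass to the node
# (assumption: partition IDs follow the file names without the suffix).
loaded = []


def fake_loader(name: str, rows: list) -> Loader:
    def load() -> list:
        loaded.append(name)  # record when a partition is actually read
        return rows

    return load


partitions = {
    "sessions_2024-01-01": fake_loader("sessions_2024-01-01", [1, 2, 3]),
    "sessions_2024-01-02": fake_loader("sessions_2024-01-02", [4, 5]),
}

result = process_partitions(partitions, lambda rows: [r * 2 for r in rows])
```

In a real pipeline the function would be registered as a node whose input and output are both partitioned datasets, with `transform` replaced by the shared operations on each DataFrame.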

Camilo Piñón · Answer

I'll check it out. Thanks!
