Updated last week

Processing partitioned datasets efficiently

Hello, team!
Does anyone know the best (or maybe most kedroic) way to work with a PartitionedDataset by processing the partitions individually? Merging them would consume all memory. I want to apply the same operations to all partitions. Would it be a better idea to use/add namespaces for this? All my files follow the format f"sessions_{YYYY-MM-DD}.parquet". Thank you!
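
For anyone landing on this thread, here is a minimal sketch of one common pattern, not necessarily what was suggested in the replies. It relies on two documented behaviours of PartitionedDataset: on load, a node receives a dict mapping each partition id to a zero-argument load function, and on save, any dict value that is a callable is only called when that partition is written (lazy saving). The names `process_partitions` and `clean_partition`, and the `duration` field, are hypothetical and used only for illustration. Plain lists of dicts stand in for DataFrames so the sketch runs without extra dependencies.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def clean_partition(records: List[Record]) -> List[Record]:
    """Hypothetical per-partition transform: keep records with a positive duration."""
    return [r for r in records if r.get("duration", 0) > 0]


def process_partitions(
    partitions: Dict[str, Callable[[], List[Record]]],
) -> Dict[str, Callable[[], List[Record]]]:
    """Apply the same transform to every partition without loading them all at once.

    `partitions` maps partition id -> load function, which is the shape a node
    receives from a PartitionedDataset input. The returned dict maps partition
    id -> callable, so each partition is loaded and transformed only when the
    output PartitionedDataset writes it.
    """

    def make_lazy(load: Callable[[], List[Record]]) -> Callable[[], List[Record]]:
        # Bind `load` per partition; nothing is read until `run` is called.
        def run() -> List[Record]:
            return clean_partition(load())

        return run

    return {pid: make_lazy(load) for pid, load in sorted(partitions.items())}
```

In a Kedro project this function would be used as a single node whose input and output are both PartitionedDataset catalog entries. With a parquet-backed underlying dataset, each load function would presumably return a DataFrame rather than a list, and `clean_partition` would be adapted accordingly. This pattern does not rely on namespaces.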

2 comments

I'll check it out. Thanks!
