Hey there, I'm testing Kedro's capabilities for working with DeltaLake. I have a Delta table that gets updated every day with new data, and some pipelines that need to recompute models daily. The table is pretty small now, but the total data is expected to grow and might not fit in memory (if I load the whole table and then filter it).
I'm currently using the pandas deltalake dataset.
what are my options going forward, besides PySpark?
one more thing: I noticed the Polars DeltaLake dataset is not available in your implementation because the available formats are specified explicitly. can you update it so we can actually use Polars to lazily scan DeltaLake?
hello @Sean Yogev! I was about to suggest Polars indeed. there's no official Polars Delta dataset, but you can copy-paste this one:
https://github.com/astrojuanlu/kedro-deltalake-demo/blob/main/src/kedro_deltalake_demo/datasets/polars_delta_dataset.py