Question: if we have a dataset in the catalog that updates incrementally via an upsert/append, and that dataset is an input to another node, is there a risk that the full dataset gets loaded from the catalog and passed to the downstream node, rather than just the increment of data?
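For context, here is a minimal sketch of the pattern I mean, assuming a partitioned layout with kedro-datasets' IncrementalDataset (the paths, dataset names, and node wiring are hypothetical, and the import path may differ across kedro-datasets versions):
```python
# Sketch only (assumptions: kedro-datasets 2.x, one file written per increment,
# hypothetical names). IncrementalDataset tracks a checkpoint, and load() should
# return only partitions newer than it -- my question is whether a plain
# append-style dataset would instead hand the *full* table to the downstream node.
from kedro.pipeline import node, pipeline
from kedro_datasets.partitions import IncrementalDataset

events = IncrementalDataset(
    path="data/02_intermediate/events/",  # directory holding one file per increment
    dataset="pandas.CSVDataset",          # dataset type used to load each partition
)

def process_increment(partitions: dict) -> None:
    # IncrementalDataset loads eagerly, so values here are the partition data
    # itself (PartitionedDataset would give load callables instead).
    for partition_id, df in sorted(partitions.items()):
        ...  # transform just the new increment

# confirms="events" advances the checkpoint once the node runs successfully,
# so the next run only sees partitions added since this one.
events_pipeline = pipeline(
    [node(process_increment, inputs="events", outputs=None, confirms="events")]
)
```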
Hi team, are there any best practices for optimizing Spark code within Kedro pipelines? I have a large pipeline where, due to lazy evaluation, all of the actual computation executes at the last node. I would like to look at execution plans, etc.
Any suggestions? I suppose this would also apply to Polars, Ibis, and other lazily evaluated frameworks.
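One thing I've been experimenting with (a rough sketch, not a confirmed best practice): a Kedro hook that prints the Spark execution plan for any Spark DataFrame a node emits, so you can see what each node contributes before the terminal action fires. The hook class and wiring below are my own assumptions:
```python
# Sketch of a debugging hook (assumes Spark >= 3.0 for mode="formatted").
# Register it in your project's settings.py via: HOOKS = (SparkPlanHook(),)
from kedro.framework.hooks import hook_impl
from pyspark.sql import DataFrame as SparkDataFrame


class SparkPlanHook:
    @hook_impl
    def after_node_run(self, node, outputs):
        # Print the plan for every Spark DataFrame output, making the work
        # attributable to each node visible despite lazy evaluation.
        for name, value in outputs.items():
            if isinstance(value, SparkDataFrame):
                print(f"=== Plan after node {node.name!r} (output: {name}) ===")
                value.explain(mode="formatted")
```
The obvious alternatives are sprinkling `df.explain()` calls inside nodes, or forcing materialization at intermediate nodes with `.persist()` plus an action; a hook just keeps that debugging noise out of the pipeline code itself.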