```yaml
tableA:
  type: pandas.SQLTableDataset
  table_name: tableA
  load_args:
    schema: my.dev
  save_args:
    schema: my.dev
    if_exists: "append"
    index: False
```

I have a table, `tableA`, which is the final output of my pipeline. The table already has primary keys.

If the primary key values are not yet in the table, I want to append the new rows. However, for rows whose primary key values already exist, I want to update those rows with the new results.

What would be the right `save_args` to use in this case? I tried `'replace'` for `if_exists`, but this deletes the whole table, so only the current results are stored. If I use `'append'`, duplicate rows are still inserted despite the primary keys.

https://docs.kedro.org/en/0.18.14/kedro_datasets.pandas.SQLTableDataset.html#kedro_datasets.pandas.SQLTableDataset
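For context, pandas' `if_exists` only supports `fail`/`replace`/`append`, so the merge-on-primary-key behavior described above (an "upsert") has to be expressed in the database itself rather than in `save_args`. A minimal sketch of those semantics, using Python's stdlib `sqlite3` purely for illustration (the exact SQL and connection details for your actual database will differ):

```python
import sqlite3

# In-memory database standing in for the real target table (assumption:
# the target database supports an upsert such as INSERT ... ON CONFLICT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO tableA VALUES (1, 'old'), (2, 'old')")

# New pipeline output: id 2 already exists (update), id 3 is new (insert).
new_rows = [(2, "new"), (3, "new")]
conn.executemany(
    "INSERT INTO tableA (id, value) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
    new_rows,
)

print(sorted(conn.execute("SELECT * FROM tableA")))
# [(1, 'old'), (2, 'new'), (3, 'new')]
```

In a Kedro project this logic would live in a node or a custom dataset rather than in the catalog entry, since `SQLTableDataset` just forwards `save_args` to `pandas.DataFrame.to_sql`.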
What is the cleanest way to run the entire pipeline multiple times?
I have a parameter, `observed_date: '2024-10-01'`, defined in `parameters.yml`, that I use to run the pipeline. At the end of the pipeline, the output is saved to (or replaces rows in) a SQL table.
Now I want to run this pipeline once for every fifth day from January 2022 to October 2024.
Doing this manually would mean editing the `parameters.yml` file each time I want to change the date, then rerunning the pipeline with `kedro run`.
I don't want to introduce a loop inside the pipeline itself, since it's cleaner to treat `observed_date` as a single date rather than a list of dates.
However, I'd like to find a clean way to loop over the different dates externally, running `kedro run` once for each date.
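One pattern that fits this constraint (keeping `observed_date` a single date inside the pipeline) is to drive `kedro run` from an outer script and override the parameter at runtime with Kedro's `--params` flag, so `parameters.yml` is never edited. A sketch; the actual `subprocess` call is left commented out because it assumes it runs inside a Kedro project, and the `--params` separator syntax varies across Kedro versions:

```python
from datetime import date, timedelta
import subprocess

def dates_every_n_days(start: date, end: date, step_days: int = 5):
    """Yield dates from start to end (inclusive), step_days apart."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

for observed in dates_every_n_days(date(2022, 1, 1), date(2024, 10, 1)):
    # Override observed_date at runtime instead of editing parameters.yml.
    # NOTE: the --params separator ('=' vs ':') differs between Kedro
    # versions; check `kedro run --help` for yours.
    cmd = ["kedro", "run", "--params", f"observed_date={observed.isoformat()}"]
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment inside a Kedro project
```

A plain shell loop over a pre-generated date list would work just as well; the point is only that the loop lives outside the pipeline, so each run still sees a single `observed_date`.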