Afiq Johari
Joined November 21, 2024

tableA:
  type: pandas.SQLTableDataset
  table_name: tableA
  load_args:
    schema: my.dev
  save_args:
    schema: my.dev
    if_exists: "append"
    index: False
I have a table, TableA, which is the final output of my pipeline. The table already contains primary keys.

If the primary key values are not yet in the table, I want to append the new rows. However, for rows where the primary key values already exist, I want to update those rows with the new results.

What would be the right save_args to use in this case? I tried 'replace' for if_exists, but that keeps deleting the whole table, so only the current results are stored. If I use 'append', duplicated results are still inserted into the table despite the primary keys.

https://docs.kedro.org/en/0.18.14/kedro_datasets.pandas.SQLTableDataset.html#kedro_datasets.pandas.SQLTableDataset
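A note on why neither option works: pandas.SQLTableDataset hands save_args straight to pandas.DataFrame.to_sql, and to_sql's if_exists only accepts 'fail', 'replace' and 'append', so there is no built-in upsert. One possible workaround is a small custom dataset that keeps if_exists="append" but routes every insert through to_sql's method callable to issue an INSERT ... ON CONFLICT DO UPDATE. The sketch below is only illustrative: it assumes PostgreSQL and SQLAlchemy, and the class name UpsertSQLTableDataset, the primary-key column my_pk, and the credentials layout are all made up for the example.

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.dialects.postgresql import insert
from kedro.io import AbstractDataset  # spelled AbstractDataSet in Kedro 0.18.x


def _upsert_method(table, conn, keys, data_iter):
    """to_sql 'method' callable: INSERT ... ON CONFLICT (my_pk) DO UPDATE."""
    rows = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(rows)
    stmt = stmt.on_conflict_do_update(
        index_elements=["my_pk"],  # assumed primary-key column
        set_={k: stmt.excluded[k] for k in keys if k != "my_pk"},
    )
    conn.execute(stmt)


class UpsertSQLTableDataset(AbstractDataset):
    """Appends new rows and updates rows whose primary key already exists."""

    def __init__(self, table_name: str, credentials: dict, schema: str = None):
        self._table_name = table_name
        self._schema = schema
        self._engine = create_engine(credentials["con"])

    def _save(self, data: pd.DataFrame) -> None:
        data.to_sql(
            self._table_name,
            con=self._engine,
            schema=self._schema,
            if_exists="append",     # never drops the table
            index=False,
            method=_upsert_method,  # existing keys are updated instead of duplicated
        )

    def _load(self) -> pd.DataFrame:
        return pd.read_sql_table(self._table_name, con=self._engine, schema=self._schema)

    def _describe(self) -> dict:
        return {"table_name": self._table_name, "schema": self._schema}

The catalog entry would then point type at this class's import path (for example my_project.datasets.UpsertSQLTableDataset, a hypothetical location) instead of pandas.SQLTableDataset, keeping the same table_name, schema and credentials keys.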

1 comment

What is the cleanest way to run the entire pipeline multiple times?

I have a parameter, observed_date = '2024-10-01', defined in parameters.yml, that I use to run the pipeline. At the end of the pipeline, the output is saved to or replaced in a SQL table.

Now, I want to loop over this pipeline every 5 days from January 2022 until October 2024.

Manually, this would require updating the parameters.yml file each time I want to change the date and rerun the pipeline (kedro run).

I don't want to introduce a loop directly into the pipeline, as it’s cleaner when observed_date is treated as a single date rather than a list of dates.

However, I’d like to find a clean way to loop over different dates, running kedro run for each date.
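One way to keep the pipeline single-date is a small driver script that calls kedro run once per date and overrides the parameter with the --params flag, so parameters.yml never has to change. This is only a sketch: the range and the 5-day step come from the question, and older Kedro releases use ':' rather than '=' as the key/value separator inside --params.

import subprocess
from datetime import date, timedelta

start = date(2022, 1, 1)
end = date(2024, 10, 1)
step = timedelta(days=5)

observed = start
while observed <= end:
    # Each call is a normal `kedro run`, with observed_date overridden for that run only.
    subprocess.run(
        ["kedro", "run", "--params", f"observed_date={observed.isoformat()}"],
        check=True,  # abort the loop if a run fails
    )
    observed += step

An alternative is to create a KedroSession per date with extra_params inside a single Python process, but shelling out keeps each run fully isolated.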

2 comments