Hi,
When using the Kedro VS Code extension, I'm getting this unexpected error, despite the credentials being defined. The dataset type should also be valid, because the latest documentation asks for pandas.SQLQueryDataset and not pandas.SQLQueryDataSet:

Dataset 'output' has an invalid type 'pandas.SQLQueryDataset'.
Unable to find credentials 'my_cred': check your data catalog and credentials configuration. See
https://kedro.readthedocs.io/en/stable/kedro.io.DataCatalog.html
for an example. (Kedro LSP)
I've also defined "kedro.environment": "local" in settings.json.

Note that the error only appears when I enable the extension; when I disable it there is no error and the pipelines work just fine.
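For reference, my_cred is defined along these lines in conf/local/credentials.yml (the connection string below is a placeholder, not the real one):

# conf/local/credentials.yml -- placeholder connection string
my_cred:
  con: postgresql://user:password@host:5432/dbname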
Hi,
let's say I have this simple SQLQueryDataset:

table_output:
  type: pandas.SQLQueryDataset
  sql: >
    select * from dev.output

and this in parameters.yml:

mode: 'ABC'

I want table_output to filter itself based on the parameter mode, so that table_output is filtered on the value 'ABC'. I know I can hard-code the value through load_args params:

table_output:
  type: pandas.SQLQueryDataset
  sql: >
    select * from dev.output where mode = ?
  load_args:
    params:
      - 'ABC'

But is there a way to reference parameters.yml as part of the table_output definition?

table_output:
  type: pandas.SQLQueryDataset
  sql: >
    select * from dev.output where mode = ?
  load_args:
    params:
      - ${parameters.mode} # to reference parameters in parameters.yml?
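Or, if the catalog can't see parameters.yml directly, is a resolver the intended route? A sketch of what I imagine (untested, assuming a recent Kedro where OmegaConfigLoader provides the runtime_params resolver, with ABC as the fallback default):

table_output:
  type: pandas.SQLQueryDataset
  sql: >
    select * from dev.output where mode = ?
  load_args:
    params:
      - ${runtime_params:mode,ABC}  # override per run with `kedro run --params mode=XYZ`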
Hi,
Need help with the below data catalog:

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.dev

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd

Basically I want to be able to parameterize the schema in parameters.yml:

schema: "dev" # prd

and reference it in the catalog:

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${params.schema}

But I get this error and haven't been able to debug it, unfortunately. Appreciate any advice on this. Thanks!

InterpolationKeyError: Interpolation key 'params.schema' not found
    full_key: sql_table.load_args.schema
    object_type=dict
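The only workaround I can think of is moving the value into globals instead, since the catalog is resolved separately from parameters.yml (untested sketch, assuming OmegaConfigLoader and its globals resolver):

# conf/base/globals.yml
schema: dev  # or prd

# conf/base/catalog.yml
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${globals:schema}

Would that be the recommended approach here?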
tableA:
  type: pandas.SQLTableDataset
  table_name: tableA
  load_args:
    schema: my.dev
  save_args:
    schema: my.dev
    if_exists: "append"
    index: False

I have a table, tableA, which is the final output of my pipeline. The table already contains primary keys.

If the primary key values are not yet in the table, I want to append the new rows. However, for rows where the primary key values already exist, I want to update those rows with the new results.

What would be the right save_args to use in this case? I tried 'replace' for if_exists, but that keeps deleting the whole table, so only the current results are stored. If I use 'append', duplicated results are still inserted into the table despite the primary keys.

https://docs.kedro.org/en/0.18.14/kedro_datasets.pandas.SQLTableDataset.html#kedro_datasets.pandas.SQLTableDataset
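For context, the behaviour I'm after is the upsert I'd otherwise write by hand in a node, roughly like the sketch below (the column names id/value and PostgreSQL's ON CONFLICT syntax are hypothetical stand-ins for my actual setup):

import pandas as pd
from sqlalchemy import create_engine, text

def upsert_table_a(df: pd.DataFrame, con: str) -> None:
    # pandas.to_sql (which SQLTableDataset wraps) only supports fail/replace/append,
    # so stage the new rows in a helper table and merge them into tableA.
    engine = create_engine(con)
    with engine.begin() as conn:
        df.to_sql("tableA_stage", conn, if_exists="replace", index=False)
        conn.execute(text(
            "INSERT INTO tableA (id, value) "
            "SELECT id, value FROM tableA_stage "
            "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value"
        ))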
What is the cleanest way to run the entire pipeline multiple times?
I have a parameter, observed_date = '2024-10-01', defined in parameters.yml, that I use to run the pipeline. At the end of the pipeline, the output is saved to or replaced in a SQL table.

Now I want to run this pipeline for every 5-day step from Jan 2022 till October 2024. Manually, this would require updating the parameters.yml file each time I want to change the date and rerunning the pipeline (kedro run).

I don't want to introduce a loop directly into the pipeline, as it's cleaner when observed_date is treated as a single date rather than a list of dates. However, I'd like to find a clean way to loop over different dates, running kedro run for each date.
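One option I'm considering is a small driver script around the CLI, so parameters.yml stays untouched and the date is overridden per run via kedro run --params (sketch; the dates and 5-day step match my case):

import subprocess
from datetime import date, timedelta

start, end, step = date(2022, 1, 1), date(2024, 10, 1), timedelta(days=5)

observed = start
while observed <= end:
    # Override observed_date for this run instead of editing parameters.yml.
    subprocess.run(
        ["kedro", "run", "--params", f"observed_date={observed.isoformat()}"],
        check=True,
    )
    observed += step

Is there a more idiomatic Kedro way to do this?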