Afiq Johari
Joined November 21, 2024

tableA:
  type: pandas.SQLTableDataset
  table_name: tableA
  load_args:
    schema: my.dev
  save_args:
    schema: my.dev
    if_exists: "append"
    index: False
I have a table, TableA, which is the final output of my pipeline. The table already contains primary keys.

If the primary key values are not yet in the table, I want to append the new rows. However, for rows where the primary key values already exist, I want to update those rows with the new results.

What would be the right save_args to use in this case? I tried 'replace' for if_exists, but that keeps deleting the whole table, so only the current results are stored. If I use 'append', duplicated results are still inserted into the table despite the primary keys.

https://docs.kedro.org/en/0.18.14/kedro_datasets.pandas.SQLTableDataset.html#kedro_datasets.pandas.SQLTableDataset
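A note on why neither option works: pandas.SQLTableDataset hands save_args straight to pandas.DataFrame.to_sql, and to_sql's if_exists only accepts 'fail', 'replace' and 'append', so there is no built-in upsert. One possible workaround is a small custom dataset that keeps if_exists="append" but routes every insert through to_sql's method callable to issue an INSERT ... ON CONFLICT DO UPDATE. The sketch below is only illustrative: it assumes PostgreSQL and SQLAlchemy, and the class name UpsertSQLTableDataset, the primary-key column my_pk, and the credentials layout are all made up for the example.

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.dialects.postgresql import insert
from kedro.io import AbstractDataset  # spelled AbstractDataSet in Kedro 0.18.x


def _upsert_method(table, conn, keys, data_iter):
    """to_sql 'method' callable: INSERT ... ON CONFLICT (my_pk) DO UPDATE."""
    rows = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(rows)
    stmt = stmt.on_conflict_do_update(
        index_elements=["my_pk"],  # assumed primary-key column
        set_={k: stmt.excluded[k] for k in keys if k != "my_pk"},
    )
    conn.execute(stmt)


class UpsertSQLTableDataset(AbstractDataset):
    """Appends new rows and updates rows whose primary key already exists."""

    def __init__(self, table_name: str, credentials: dict, schema: str = None):
        self._table_name = table_name
        self._schema = schema
        self._engine = create_engine(credentials["con"])

    def _save(self, data: pd.DataFrame) -> None:
        data.to_sql(
            self._table_name,
            con=self._engine,
            schema=self._schema,
            if_exists="append",     # never drops the table
            index=False,
            method=_upsert_method,  # existing keys are updated instead of duplicated
        )

    def _load(self) -> pd.DataFrame:
        return pd.read_sql_table(self._table_name, con=self._engine, schema=self._schema)

    def _describe(self) -> dict:
        return {"table_name": self._table_name, "schema": self._schema}

The catalog entry would then point type at this class's import path (for example my_project.datasets.UpsertSQLTableDataset, a hypothetical location) instead of pandas.SQLTableDataset, keeping the same table_name, schema and credentials keys.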

1 comment

What is the cleanest way to run the entire pipeline multiple times?

I have a parameter, observed_date = '2024-10-01', defined in parameters.yml, that I use to run the pipeline. At the end of the pipeline, the output is saved to or replaced in a SQL table.

Now, I want to loop over this pipeline every 5 days from January 2022 until October 2024.

Manually, this would require updating the parameters.yml file each time I want to change the date and rerun the pipeline (kedro run).

I don't want to introduce a loop directly into the pipeline, as it’s cleaner when observed_date is treated as a single date rather than a list of dates.

However, I’d like to find a clean way to loop over different dates, running kedro run for each date.
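One way to keep the pipeline single-date is a small driver script that calls kedro run once per date and overrides the parameter with the --params flag, so parameters.yml never has to change. This is only a sketch: the range and the 5-day step come from the question, and older Kedro releases use ':' rather than '=' as the key/value separator inside --params.

import subprocess
from datetime import date, timedelta

start = date(2022, 1, 1)
end = date(2024, 10, 1)
step = timedelta(days=5)

observed = start
while observed <= end:
    # Each call is a normal `kedro run`, with observed_date overridden for that run only.
    subprocess.run(
        ["kedro", "run", "--params", f"observed_date={observed.isoformat()}"],
        check=True,  # abort the loop if a run fails
    )
    observed += step

An alternative is to create a KedroSession per date with extra_params inside a single Python process, but shelling out keeps each run fully isolated.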

2 comments