Join the Kedro community

Updated 7 days ago

Simple sql query dataset

Hi,

let's say I have this simple SQLQueryDataset

table_output:
  type: pandas.SQLQueryDataset
  sql: >
       select * from dev.output

I also have parameters.yml
mode: 'ABC' 

The expectation now is to update table_output to filter itself based on parameter mode
If I use the following hard coded load_args method, the table_output will be filtered based on the 'ABC' params
table_output:
  type: pandas.SQLQueryDataset
  sql: >
       select * from dev.output
       where mode = ?
  load_args:
     params:
       - 'ABC'

How can I reuse the parameters defined in parameters.yml as part of the table_output definition?
Something like
table_output:
  type: pandas.SQLQueryDataset
  sql: >
       select * from dev.output
       where mode = ?
  load_args:
     params:
       - ${parameters.mode} # to reference parameters in parameters.yml?

1
A
A
N
8 comments

@Ankita Katiyar thanks! globals.yml seems to be what I need. I'm just curious as to why it's not used as the default place to hold parameters definition, given that it's accessible across all config files, which I think very useful in many cases.

The only thing now is that I have to redefine my parameters in parameters.yml so that it resolves from globals.yml, which seems to be like a double definition.

It's mostly for separation. parameters.yml are meant to be node argument, something that get passed into your node function. Where the parameters.mode here are acting more like a template variable.

You could also move this mode to catalog.yml

table_output:
  type: pandas.SQLQueryDataset
  sql: >
       select * from dev.output
       where mode = ${_mode}
  load_args:
     params:
       - ${_mode} # to reference parameters in parameters.yml?

_mode: "some_mode"



For things that are absolutely global (needed for catalog/parameters and other stuff), then you should use globals (the anti-pattern here is use globals.yml for everything)

is it possible to do something similar with a namespaced dataset?

{mycountry}.table_output:
type: pandas.SQLQueryDataset
sql: >
select * from mytable.output
where country = ${mycountry}
load_args:
params:
- ${{mycountry}.name}

This can be done with factories, though I think it is built for a slightly different purpose.

Yes you are right. You can use the namedspace values e.g. {mycountry} directly in the params or filepath etc. but you can't use it as a key value.
For example if in params you had

France:
short_name: FR

you can't get the "FR" in the catalog.

Add a reply
Sign up and join the conversation on Slack