
Parameterizing Schema in Data Catalog

Hi,

I need help with the data catalog below:

# dev variant
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.dev

# prd variant
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd
Basically I want to be able to parameterize the schema in parameters.yml

schema: "dev" # prd

I tried updating my data catalog as below

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${params.schema}
This gives the error below, which I haven't been able to debug. I'd appreciate any advice. Thanks!
InterpolationKeyError: Interpolation key 'params.schema' not found
    full_key: sql_table.load_args.schema
    object_type=dict
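
For reference, the error says OmegaConf has no resolver named `params`, so `${params.schema}` can't be interpolated inside the catalog. One sketch of a fix, assuming a Kedro version whose `OmegaConfigLoader` ships the built-in `runtime_params` resolver (the second argument is the default used when no runtime parameter is passed):

```yaml
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${runtime_params:schema,dev}
```

You could then switch schemas at run time with `kedro run --params schema=prd`.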


Does this require defining a custom OmegaConf resolver in settings.py?

Here's my settings.py. Essentially, I've uncommented CONFIG_LOADER_ARGS so that OmegaConf interpolation is applied to the parameters files:

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "spark": ["spark*/"],
        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    },
}
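
If a custom resolver is the route you want, here's a minimal sketch of registering one in settings.py. It assumes a Kedro version where `CONFIG_LOADER_ARGS` accepts a `custom_resolvers` mapping (passed through to `OmegaConf.register_new_resolver`); the resolver name `env_schema` and the `KEDRO_SCHEMA` environment variable are hypothetical names chosen for illustration:

```python
# settings.py -- a sketch, assuming a Kedro version whose
# CONFIG_LOADER_ARGS accepts a "custom_resolvers" mapping.
import os


def _schema(default: str = "dev") -> str:
    # Hypothetical resolver: read the target schema from an
    # environment variable, falling back to the given default.
    return os.environ.get("KEDRO_SCHEMA", default)


CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "custom_resolvers": {
        "env_schema": _schema,
    },
}
```

In the catalog you would then write `schema: RD.${env_schema:dev}` and run with `KEDRO_SCHEMA=prd kedro run` to target production.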

Maybe you want separate dev and prd environments. The majority of the dataset definitions would be the same, but the dev and prd catalogs would differ in the schema key under each dataset's load_args.

I understand. Our current setup makes it quite flexible to switch between dev and prd within the same codebase, mainly because there are situations where we need to develop using prd data as input.
Not best practice, I know.

I think you can do what you're looking for using Kedro's additional configuration environments.

By default there's a conf/base/catalog.yml and conf/local/catalog.yml. If you create conf/dev/catalog.yml and conf/prd/catalog.yml where the schemas/filepaths differ, you can do kedro run --env prd to run using the production data.
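
Concretely, the per-environment overrides might look like this (a sketch; the dataset shape mirrors the one in the question):

```yaml
# conf/dev/catalog.yml
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.dev
```

```yaml
# conf/prd/catalog.yml
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd
```

Kedro merges the chosen environment over base, so only the keys that differ need to appear in each override file.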
