Parameterizing Schema in Data Catalog

At a glance

The community member is having an issue with parameterizing the schema in their data catalog. They tried to use ${params.schema} in the load_args of their SQL table dataset, but encountered an InterpolationKeyError because the params.schema key was not found.

The comments suggest a few potential solutions:

  • Defining a custom OmegaConf resolver in the settings.py file
  • Creating separate dev and prd environments with different schema configurations in the data catalog
  • Using Kedro's additional configuration environments to manage the different schema configurations

There is no explicitly marked answer in the comments, but the community members are discussing potential solutions to the issue.

Hi,

Need help with the below data catalog

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.dev

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd

Basically I want to be able to parameterize the schema in parameters.yml

schema: "dev" # prd

I tried updating my data catalog as below

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${params.schema}

And I get this error, which I haven't been able to debug. I'd appreciate any advice on this. Thanks!
InterpolationKeyError: Interpolation key 'params.schema' not found
    full_key: sql_table.load_args.schema
    object_type=dict
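
A hedged note for context: plain `${params.schema}` interpolation only resolves keys defined in the same file, so the catalog cannot see parameters.yml this way. Recent Kedro versions' OmegaConfigLoader provide a built-in `runtime_params` resolver that reads parameter values (from parameters.yml, overridable at run time) inside the catalog. A minimal sketch, assuming Kedro >= 0.18.x with the OmegaConfigLoader enabled:

```yaml
# conf/base/catalog.yml -- hedged sketch; assumes the OmegaConfigLoader
# and its built-in runtime_params resolver are available
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    # resolves to the "schema" parameter ("dev" in parameters.yml)
    schema: RD.${runtime_params:schema}
```

With `schema: "dev"` in parameters.yml as the default, the value could then be switched per run with `kedro run --params schema=prd`.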

5 comments

Does this require defining a custom OmegaConf resolver in settings.py?

Here's my settings.py. Essentially, I've uncommented the parameters pattern so that OmegaConf interpolation works on the parameters.

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "spark": ["spark*/"],
        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    },
}
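
If a custom resolver is the route taken, OmegaConfigLoader accepts a `custom_resolvers` entry in CONFIG_LOADER_ARGS (available in recent Kedro versions). A minimal sketch; the resolver name `schema` and the `KEDRO_SCHEMA` environment variable are illustrative assumptions, not from the thread:

```python
# settings.py -- hedged sketch of registering a custom OmegaConf resolver.
# "schema" is a hypothetical resolver name; KEDRO_SCHEMA is an assumed env var.
import os

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "custom_resolvers": {
        # In the catalog this would be referenced as: schema: RD.${schema:}
        "schema": lambda default="dev": os.environ.get("KEDRO_SCHEMA", default),
    },
}
```

With this in place, the catalog entry could read `schema: RD.${schema:}`, and setting the environment variable would flip it between dev and prd without touching the YAML.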

Maybe you want a dev environment and prd environment. The majority of the dataset definitions would be the same but the dev and prd catalogs would differ for the schema key for each dataset's load_args.

I understand. Our current setup makes it quite flexible to switch dev and prd within the same codebase mainly because there are situations where we need to develop using prd data as input.
Not the best practice, yes.

I think you can do what you're looking for using Kedro's additional configuration environments.

By default there's a conf/base/catalog.yml and conf/local/catalog.yml. If you create conf/dev/catalog.yml and conf/prd/catalog.yml where the schemas/filepaths differ, you can do kedro run --env prd to run using the production data.
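
To make that concrete, a sketch of the per-environment override (assumed layout, not verbatim from the thread) -- only the differing entry needs to appear in the environment's catalog:

```yaml
# conf/prd/catalog.yml -- conf/dev/catalog.yml would be identical
# except for "schema: RD.dev"
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd
```

Then `kedro run --env prd` layers this file over conf/base, so the shared dataset definitions stay in one place and only the schema varies per environment.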
