Parameterizing Schema in Data Catalog

At a glance

The community member is having an issue with parameterizing the schema in their data catalog. They tried to use ${params.schema} in the load_args of their SQL table dataset, but encountered an InterpolationKeyError because the params.schema key was not found.

The comments suggest a few potential solutions:

  • Defining a custom OmegaConf resolver in the settings.py file
  • Creating separate dev and prd environments with different schema configurations in the data catalog
  • Using Kedro's additional configuration environments to manage the different schema configurations

There is no explicitly marked answer in the comments, but the community members are discussing potential solutions to the issue.

Hi,

Need help with the below data catalog

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.dev

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd

Basically I want to be able to parameterize the schema in parameters.yml

schema: "dev" # prd

I tried updating my data catalog as below

sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.${params.schema}

And I get this error, which I haven't been able to debug. I'd appreciate any advice on this. Thanks!
InterpolationKeyError: Interpolation key 'params.schema' not found
    full_key: sql_table.load_args.schema
    object_type=dict
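
A hedged note for context: plain `${params.schema}` interpolation only resolves keys defined in the same file, so the catalog cannot see parameters.yml this way. Recent Kedro versions' OmegaConfigLoader provide a built-in `runtime_params` resolver that reads parameter values (from parameters.yml, overridable at run time) inside the catalog. A minimal sketch, assuming Kedro >= 0.18.x with the OmegaConfigLoader enabled:

```yaml
# conf/base/catalog.yml -- hedged sketch; assumes the OmegaConfigLoader
# and its built-in runtime_params resolver are available
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    # resolves to the "schema" parameter ("dev" in parameters.yml)
    schema: RD.${runtime_params:schema}
```

With `schema: "dev"` in parameters.yml as the default, the value could then be switched per run with `kedro run --params schema=prd`.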

5 comments

Does this require defining a custom OmegaConf resolver in settings.py?

Here's my settings.py. Essentially, I've uncommented the parameters pattern so that OmegaConf interpolation works on the parameters.

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "spark": ["spark*/"],
        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    },
}
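
If a custom resolver is the route taken, OmegaConfigLoader accepts a `custom_resolvers` entry in CONFIG_LOADER_ARGS (available in recent Kedro versions). A minimal sketch; the resolver name `schema` and the `KEDRO_SCHEMA` environment variable are illustrative assumptions, not from the thread:

```python
# settings.py -- hedged sketch of registering a custom OmegaConf resolver.
# "schema" is a hypothetical resolver name; KEDRO_SCHEMA is an assumed env var.
import os

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "custom_resolvers": {
        # In the catalog this would be referenced as: schema: RD.${schema:}
        "schema": lambda default="dev": os.environ.get("KEDRO_SCHEMA", default),
    },
}
```

With this in place, the catalog entry could read `schema: RD.${schema:}`, and setting the environment variable would flip it between dev and prd without touching the YAML.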

Maybe you want a dev environment and prd environment. The majority of the dataset definitions would be the same but the dev and prd catalogs would differ for the schema key for each dataset's load_args.

I understand. Our current setup makes it quite flexible to switch dev and prd within the same codebase mainly because there are situations where we need to develop using prd data as input.
Not the best practice, yes.

I think you can do what you're looking for using Kedro's additional configuration environments.

By default there's a conf/base/catalog.yml and conf/local/catalog.yml. If you create conf/dev/catalog.yml and conf/prd/catalog.yml where the schemas/filepaths differ, you can do kedro run --env prd to run using the production data.
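
To make that concrete, a sketch of the per-environment override (assumed layout, not verbatim from the thread) -- only the differing entry needs to appear in the environment's catalog:

```yaml
# conf/prd/catalog.yml -- conf/dev/catalog.yml would be identical
# except for "schema: RD.dev"
sql_table:
  type: pandas.SQLTableDataset
  table_name: RD
  load_args:
    schema: RD.prd
```

Then `kedro run --env prd` layers this file over conf/base, so the shared dataset definitions stay in one place and only the schema varies per environment.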
