Hi,
Need help with the below data catalog
    sql_table:
      type: pandas.SQLTableDataset
      table_name: RD
      load_args:
        schema: RD.dev

    sql_table:
      type: pandas.SQLTableDataset
      table_name: RD
      load_args:
        schema: RD.prd

Basically I want to be able to parameterize the schema in parameters.yml

    schema: "dev" # prd

    sql_table:
      type: pandas.SQLTableDataset
      table_name: RD
      load_args:
        schema: RD.${params.schema}

And get this error, but haven't been able to debug it unfortunately. Appreciate any advice on this. Thanks!
InterpolationKeyError: Interpolation key 'params.schema' not found full_key: sql_table.load_args.schema object_type=dict
Here's my settings.py. Essentially, I've uncommented the parameters entry so that OmegaConf will work on the parameters.
    CONFIG_LOADER_ARGS = {
        "base_env": "base",
        "default_run_env": "local",
        "config_patterns": {
            "spark": ["spark*/"],
            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
        },
    }
Maybe you want a dev environment and prd environment. The majority of the dataset definitions would be the same but the dev and prd catalogs would differ for the schema key for each dataset's load_args.
I understand. Our current setup makes it quite flexible to switch between dev and prd within the same codebase, mainly because there are situations where we need to develop using prd data as input. Not the best practice, yes.
I think you can do what you're looking for using Kedro's additional configuration environments. By default there's a conf/base/catalog.yml and conf/local/catalog.yml. If you create conf/dev/catalog.yml and conf/prd/catalog.yml where the schemas/filepaths are different, you can do kedro run --env prd to run using the production data.
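As a rough mental model of that layering, here is a simplified sketch in plain Python. It is not Kedro's loader: each dict stands in for a parsed catalog.yml, the shared_csv entry and the load_catalog and schema_for helpers are hypothetical, and it assumes an environment entry replaces a base entry with the same top-level name.

```python
# Simplified model of base/environment layering; this is not Kedro's code.
CATALOGS = {
    # conf/base/catalog.yml: definitions shared by every environment
    "base": {
        "shared_csv": {"type": "pandas.CSVDataset", "filepath": "data/shared.csv"},
    },
    # conf/dev/catalog.yml
    "dev": {
        "sql_table": {
            "type": "pandas.SQLTableDataset",
            "table_name": "RD",
            "load_args": {"schema": "RD.dev"},
        },
    },
    # conf/prd/catalog.yml
    "prd": {
        "sql_table": {
            "type": "pandas.SQLTableDataset",
            "table_name": "RD",
            "load_args": {"schema": "RD.prd"},
        },
    },
}


def load_catalog(env: str) -> dict:
    """Return the base catalog with the chosen environment's entries on top."""
    return {**CATALOGS["base"], **CATALOGS[env]}


def schema_for(env: str) -> str:
    """Look up the schema that sql_table would use in the given environment."""
    return load_catalog(env)["sql_table"]["load_args"]["schema"]
```

With this model, schema_for("prd") picks up the production schema while shared_csv stays defined once in base, which is the effect kedro run --env prd is meant to give.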