Parametrizing SQL Catalog Entries With File-based Queries

Maybe this is pushing the current dataset factories too far, but is it possible to parametrise a SQL catalog entry where the SQL is read from a file?

Like:

mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql

Basically, I'd like to pass parameters to the SQL query.
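For reference, a file-based entry can still take query parameters through `load_args`, which `pandas.SQLQueryDataset` forwards to `pandas.read_sql_query` (a sketch; the `region` parameter and file path here are made up):

```yaml
mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql   # query text with a %(region)s placeholder
  load_args:
    params:
      region: emea            # passed through to pandas.read_sql_query
```

This covers static parameters set in the catalog; it doesn't give you factory-style placeholders in the file itself.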

7 comments

Hey Luis,
I think you could technically do it -

mytable_{table}:
  type: pandas.SQLQueryDataset
  credentials: <cred>
  sql: SELECT * FROM {table}
And in your pipeline_registry.py / pipeline.py, have a script that reads the queries from a file and generates the pipeline dynamically.
The way dataset factories work is that they read the dataset name from the pipeline inputs/outputs and then fill in the placeholders in the catalog entry, so the dataset names might get crazy looking
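Something like this, as a rough sketch (the `sql/` directory layout and the `mytable_{table}` pattern are just illustrative):

```python
from pathlib import Path


def datasets_from_sql_dir(sql_dir: str) -> list[str]:
    # Each sql/<table>.sql file yields a dataset name that resolves
    # against the mytable_{table} factory entry in the catalog.
    return [f"mytable_{p.stem}" for p in sorted(Path(sql_dir).glob("*.sql"))]
```

You would then loop over these names in `create_pipeline()` and build one node per table, using each name as the node's input.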

OK, but not if the SQL is in a file, right?

Yeah, the factory placeholder wouldn't work inside a file, because the catalog doesn't load the queries; the file is only read during dataset initialisation.

This is where Ibis would be a better fit: SQL parameterisation and multi-node, lazily evaluated SQL.

Are there docs on this? @Nok

https://docs.kedro.org/projects/kedro-datasets/en/latest/api/kedro_datasets.ibis.TableDataset.html

I am not 100% sure if it supports the SQL interface, as Ibis's native interface is dataframe-based. So that may require some changes.
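For completeness, a `TableDataset` catalog entry looks roughly like this (a sketch based on the linked docs, assuming a DuckDB backend; the table and file names are made up):

```yaml
mytable:
  type: ibis.TableDataset
  table_name: mytable
  connection:
    backend: duckdb
    database: dwh.duckdb
```

Loading this gives you an Ibis table expression rather than a materialised dataframe, so downstream nodes compose lazily and execution is pushed down to the backend.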
