Parametrizing SQL Catalog Entries With File-based Queries

Maybe this is pushing the current dataset factories too far, but is it possible to parametrise a SQL catalog entry where the SQL is read from a file?

Like:

mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql

Basically, I'd like to pass parameters to the SQL query.
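For reference, a file-based entry can still take query parameters through `load_args`, which `pandas.SQLQueryDataset` forwards to `pandas.read_sql_query` (a sketch; the `region` parameter and file path here are made up):

```yaml
mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql   # query text with a %(region)s placeholder
  load_args:
    params:
      region: emea            # passed through to pandas.read_sql_query
```

This covers static parameters set in the catalog; it doesn't give you factory-style placeholders in the file itself.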

7 comments

Hey Luis,
I think you could technically do it -

mytable_{table}:
  type: pandas.SQLQueryDataset
  credentials: <cred>
  sql: SELECT * FROM {table}
And in your pipeline_registry.py / pipeline.py, have a script that reads the queries from a file and generates the pipeline dynamically.
The way dataset factories work is that they read the dataset name from the pipeline inputs/outputs and then fill in the placeholders in the catalog entry, so the dataset names might get crazy looking
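Something like this, as a rough sketch (the `sql/` directory layout and the `mytable_{table}` pattern are just illustrative):

```python
from pathlib import Path


def datasets_from_sql_dir(sql_dir: str) -> list[str]:
    # Each sql/<table>.sql file yields a dataset name that resolves
    # against the mytable_{table} factory entry in the catalog.
    return [f"mytable_{p.stem}" for p in sorted(Path(sql_dir).glob("*.sql"))]
```

You would then loop over these names in `create_pipeline()` and build one node per table, using each name as the node's input.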

OK, but not if the SQL is in a file, right?

Yeah, the factory placeholder wouldn't work inside a file, because the catalog doesn't load the queries; the file is only read during dataset initialisation.

This is where Ibis would be a better fit: SQL parameterisation and multi-node, lazily evaluated SQL.

Are there docs on this? @Nok

https://docs.kedro.org/projects/kedro-datasets/en/latest/api/kedro_datasets.ibis.TableDataset.html

I am not 100% sure if it supports the SQL interface, as Ibis's native interface is dataframe-based. So that may require some changes.
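For completeness, a `TableDataset` catalog entry looks roughly like this (a sketch based on the linked docs, assuming a DuckDB backend; the table and file names are made up):

```yaml
mytable:
  type: ibis.TableDataset
  table_name: mytable
  connection:
    backend: duckdb
    database: dwh.duckdb
```

Loading this gives you an Ibis table expression rather than a materialised dataframe, so downstream nodes compose lazily and execution is pushed down to the backend.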
