Maybe this is pushing the current dataset factories too far, but is it possible to parametrise a SQL Catalog entry where the SQL is read from a file?
Like:
mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql
Hey Luis,
I think you could technically do it -
mytable_{table}:
  type: pandas.SQLQueryDataset
  credentials: <cred>
  sql: SELECT * FROM {table}

And in your pipeline_registry.py / pipeline.py, have a script that reads the queries from a file and generates the pipeline dynamically (rough sketch below).
Yeah, the factory placeholder wouldn't work within a file, because the catalog wouldn't load the queries itself; the file is only read during dataset initialisation.
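For what it's worth, a minimal sketch of that dynamic registration, assuming the *.sql files live in a sql/ folder and only their file names are needed to match the mytable_{table} factory pattern; the _passthrough function and node/output names are made up for illustration:

```python
# pipeline_registry.py -- minimal sketch, not a drop-in implementation.
from pathlib import Path

from kedro.pipeline import Pipeline, node, pipeline


def _passthrough(df):
    # Hypothetical node logic; replace with real processing.
    return df


def register_pipelines() -> dict[str, Pipeline]:
    sql_dir = Path("sql")  # assumed location of the *.sql files
    nodes = [
        node(
            _passthrough,
            inputs=f"mytable_{path.stem}",   # resolved by the mytable_{table} factory
            outputs=f"processed_{path.stem}",
            name=f"process_{path.stem}",
        )
        for path in sorted(sql_dir.glob("*.sql"))
    ]
    return {"__default__": pipeline(nodes)}
```

Note this only feeds the table name into the factory; the query text inside the .sql files is not injected into the catalog, which is the limitation mentioned above.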
This is where ibis would be a better fit, for SQL parameterisation and multi-node, lazily evaluated SQL:
https://docs.kedro.org/projects/kedro-datasets/en/latest/api/kedro_datasets.ibis.TableDataset.html
I am not 100% sure if it supports the SQL interface, as ibis's native interface is dataframe-based, so that may require some changes.
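For example, a node consuming an ibis.TableDataset could express the query with the dataframe API instead of raw SQL; this is only a sketch, with the column names (order_date, amount) and dataset wiring invented for illustration:

```python
# nodes.py -- rough sketch of dataframe-style (not raw SQL) ibis usage in a node.
# Assumes the catalog supplies `mytable` via kedro_datasets.ibis.TableDataset;
# column names are assumptions, not from the original thread.
import ibis
from ibis import _


def summarise_orders(mytable: ibis.Table) -> ibis.Table:
    # Builds a lazy expression; ibis compiles it to SQL and the backend
    # only executes it when the result is materialised (e.g. on save).
    return (
        mytable.filter(_.amount > 0)
        .group_by("order_date")
        .aggregate(total=_.amount.sum())
    )
```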