Hi Team!
I am trying to read a BigQuery table using spark.SparkDataSet
with an arbitrary query, as follows:
trx_agg_data:
  type: spark.SparkDataSet
  file_format: bigquery
  load_args:
    viewsEnabled: true
    query: |
      SELECT ph.category, MAX(trx.sales)
      FROM {project}.{dataset}.trx_data trx
      LEFT JOIN {project}.{dataset}.prod_hierarchy ph
  filepath: gs://my-bucket/trx_agg_data.parquet
Here filepath is not in the correct format (BigQuery expects <project>.<dataset>.<table>), but I am trying to read with a query instead, like this:
spark.read.format("bigquery").option("query", "SELECT ph.category, MAX(trx.sales) FROM {project}.{dataset}.trx_data trx LEFT JOIN {project}.{dataset}.prod_hierarchy ph").load()
spark.SparkDataSet does not have this functionality. Should I create a custom dataset here?

I think you'd have more luck with the JDBC approach:
https://docs.kedro.org/en/0.18.7/kedro.datasets.spark.SparkJDBCDataSet.html
It requires passing a table init parameter, but my SQL query can contain an arbitrary number of tables.
So something similar to kedro.datasets.pandas.GBQQueryDataSet is exactly what I want:
vehicles:
  type: pandas.GBQQueryDataSet
  sql: "select shuttle, shuttle_id from spaceflights.shuttles;"
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True

I think creating spark.GBQQueryDataSet is then my best bet?

Or you extend / override the existing spark.SparkDataSet to support this. If it works, we'd love a PR back into Kedro.
Sure, yes, that seems like a good idea! Although I must note that I am on kedro==0.18.14 for this. It would be similar to implement for the kedro_datasets package post kedro>=0.19, though.
Get it working locally first, and then I can help you get your contribution into kedro-datasets.
Hi @U03R8FW4HUZ + Kedro Team :kedro:
Opened a PR on kedro-plugins to implement a new dataset, spark.GBQQueryDataset:
feat(datasets): Implement `spark.GBQQueryDataset` for reading data from BigQuery as a spark dataframe using SQL query #971
It is currently a draft, but it would be great to get some initial comments.