
Defining Filters in Load Args for Dask ParquetDataset

Hi everyone,

I need some help understanding how to define filters in load_args when loading a ParquetDataset with Dask from the catalog.

My catalog entry would be something like:

data:
  type: dask.ParquetDataset
  filepath: data/
  load_args :
    filters: [('filter_1', '==', 1) or
                ('filter_2', '==', 1) or
                ('filter_3', '==', 1) or
                ('filter_4', '==', 1) ]
I tested this exact filter syntax in the Python API and it works there, but I cannot find a way to make it work from the catalog, since loading raises:
kedro.io.core.DatasetError: Failed while loading data from data set 
An error occurred while calling the read_parquet method registered to the pandas backend.
Original Message: too many values to unpack (expected 3)
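For reference, PyArrow-style `filters` use disjunctive normal form: the outer list ORs together inner lists of AND-ed `(column, op, value)` predicates, and YAML has no tuple literal, so each predicate becomes a three-element list. A sketch of what the catalog entry might look like (untested; it assumes `dask.ParquetDataset` passes `load_args` straight through to `dd.read_parquet`):

```yaml
data:
  type: dask.ParquetDataset
  filepath: data/
  load_args:
    filters:   # DNF: outer list = OR, each inner list = AND of predicates
      - [["filter_1", "==", 1]]
      - [["filter_2", "==", 1]]
      - [["filter_3", "==", 1]]
      - [["filter_4", "==", 1]]
```

YAML loads these as lists rather than tuples, which PyArrow generally accepts since each predicate still unpacks into exactly three values.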

2 comments

Is this actual code or a string literal filter?

If you need some exotic way to run Python code, e.g. to produce a Python datatype for Polars, you may want to check out resolvers on docs.kedro.org, where you can provide a custom expression
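The `too many values to unpack (expected 3)` error points at the predicate shape: each one must be exactly a `(column, op, value)` triple, with OR expressed by the outer list rather than a Python `or`. A stdlib-only sketch of how DNF filters of this shape are evaluated (illustrative only, not dask's or PyArrow's actual code):

```python
# Illustrative sketch: evaluating PyArrow-style DNF filters against one row.
# Outer list = OR of clauses; each clause = AND of (column, op, value) triples.
import operator

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def row_matches(row, filters):
    """Return True if the row satisfies the DNF filter expression."""
    return any(
        all(OPS[op](row[col], val) for col, op, val in clause)
        for clause in filters
    )

# OR of four single-predicate clauses -- the shape the catalog entry needs.
filters = [
    [("filter_1", "==", 1)],
    [("filter_2", "==", 1)],
    [("filter_3", "==", 1)],
    [("filter_4", "==", 1)],
]

row = {"filter_1": 0, "filter_2": 1, "filter_3": 0, "filter_4": 0}
print(row_matches(row, filters))  # prints True: filter_2 == 1
```

Writing `('a', '==', 1) or ('b', '==', 1)` in Python instead just evaluates to the first (truthy) tuple, which is why it silently "works" in the API but has a different meaning than intended.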
