Hello everyone
Is there a way to load a flat file from S3 based on some conditions, e.g. pulling the latest file from a given bucket?
Split it into two problems: first work out which file you want (list the bucket and apply your conditions), then load that file as usual.
But how is that done? The example below expects the exact name of the file:

motorbikes:
  type: pandas.CSVDataset
  filepath: s3://your_bucket/data/02_intermediate/company/motorbikes.csv
  credentials: dev_s3
  load_args:
    sep: ','
    skiprows: 5
    skipfooter: 1
    na_values: ['#NA', NA]
I just know the name of the bucket; we need to fetch the files based on some conditions, right?
But in the same manner you can add a custom dataset: have the filtering args in the constructor and use those args in the load method, something like the sketch below.
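A rough, untested sketch of that idea. The class name LatestCSVDataset, the path/suffix args, and the module path my_project.datasets are all made up for illustration; it assumes s3fs/fsspec are installed (Kedro already uses them for s3:// paths) and that s3fs's info() dict exposes LastModified:

import fsspec
import pandas as pd
from kedro.io import AbstractDataset

class LatestCSVDataset(AbstractDataset):
    """Load the most recently modified CSV under an S3 prefix."""

    def __init__(self, path: str, suffix: str = ".csv", credentials: dict = None):
        self._path = path        # e.g. s3://your_bucket/data/02_intermediate/company
        self._suffix = suffix    # filtering arg held by the constructor
        self._fs = fsspec.filesystem("s3", **(credentials or {}))

    def _load(self) -> pd.DataFrame:
        files = [f for f in self._fs.ls(self._path) if f.endswith(self._suffix)]
        if not files:
            raise FileNotFoundError(f"no {self._suffix} files under {self._path}")
        # the condition: pick the newest file by last-modified timestamp
        latest = max(files, key=lambda f: self._fs.info(f)["LastModified"])
        with self._fs.open(latest) as f:
            return pd.read_csv(f)

    def _save(self, data) -> None:
        raise NotImplementedError("read-only dataset")

    def _describe(self) -> dict:
        return {"path": self._path, "suffix": self._suffix}

Then the catalog entry points type at the class's import path, same shape as the motorbikes entry above:

latest_motorbikes:
  type: my_project.datasets.LatestCSVDataset
  path: s3://your_bucket/data/02_intermediate/company
  credentials: dev_s3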
I just found that PartitionedDataset provides a way of iterating over each file present in a bucket/folder:
https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#partitioned-dataset-load
Ah yes, that one exists, though then you'll be implementing the conditions in the node, like the sketch below.
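Since PartitionedDataset loads as a dict of {partition id: loader callable}, the node can apply the condition and call only the loader it needs. This sketch assumes the partition ids (filenames) sort chronologically, e.g. date-stamped names:

def load_latest_partition(partitions: dict):
    # partitions: {partition_id: lazy loader callable}, as returned by PartitionedDataset
    latest_id = max(partitions)       # assumes ids like "2024-01-31.csv" sort by date
    return partitions[latest_id]()    # only the newest file is actually read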
You might even be able to extend PartitionedDataset and override its load method to call super() and then do the filtering.
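An untested sketch of that; the import path is the kedro-datasets one, and depending on your Kedro version the method to override may be _load rather than load:

from kedro_datasets.partitions import PartitionedDataset

class LatestOnlyPartitionedDataset(PartitionedDataset):
    # filter after calling super, so only the newest partition reaches the node
    def load(self) -> dict:
        partitions = super().load()     # {partition_id: loader callable}
        latest_id = max(partitions)     # assumes date-sortable partition ids
        return {latest_id: partitions[latest_id]}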