Hello, what is the proper way to add a current timestamp to the names of catalog entries? Thanks!
Hi @Gauthier Pierard, if your goal is to version your dataset, you can set versioned: True
in the catalog entry. This will save your datasets with a timestamp-based version for each kedro run.
https://docs.kedro.org/en/stable/data/data_catalog.html#dataset-versioning
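For reference, a sketch of the same thing via the code API, assuming kedro_datasets and kedro.io.Version; as far as I understand, Version(None, None) means "save under a newly generated timestamp, load the latest":

from kedro.io import Version
from kedro_datasets.pandas import CSVDataset

# Code-API equivalent of "versioned: True" in catalog.yml:
# each save lands in data/01_raw/test.csv/<timestamp>/test.csv
test_dataset = CSVDataset(
    filepath="data/01_raw/test.csv",
    version=Version(load=None, save=None),
)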
Thanks Rashida, but I actually need more control. In general my save folders look like output_folder_<parameter>_<from_date>_<to_date>, where from_date and to_date are defined by a node and saved as MemoryDatasets in the catalog. Is it possible to define other catalog entries whose names depend on previous entries?
If I understand this correctly you'd essentially like to dynamically create your catalog based on previous runs?
Indeed. I suppose this is best done in Python with something like

CSVDataset(
    filepath="s3://test_bucket/data/02_intermediate/company/motorbikes.csv",
    load_args=dict(sep=",", skiprows=5, skipfooter=1, na_values=["#NA", "NA"]),
    credentials=dict(key="token", secret="key"),
)

and

# save the dataset to data/01_raw/test.csv/<version>/test.csv
catalog.save("test_dataset", data1)

Correct?
The above lets you save the data, but it won't preserve the dataset entry in the catalog; saving doesn't add the entry to the catalog itself.
Do you need the catalog entry for future processing, or are you okay with just saving the data to storage?
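If you do need it available for later nodes, a minimal sketch, assuming DataCatalog.add from kedro.io (this only mutates the in-memory catalog object, never catalog.yml on disk):

import pandas as pd
from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataset

# Register a dynamically created dataset on the in-memory catalog,
# then save through it as usual.
catalog = DataCatalog()
catalog.add("test_dataset", CSVDataset(filepath="data/01_raw/test.csv"))

data1 = pd.DataFrame({"a": [1, 2, 3]})
catalog.save("test_dataset", data1)  # writes data/01_raw/test.csv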
Yes, I understand the catalog file won't be updated, only the catalog object in memory.
However, could I define a PartitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?
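Something like this sketch, assuming kedro_datasets.partitions.PartitionedDataset (the paths here are made up):

from kedro_datasets.partitions import PartitionedDataset

# Point a PartitionedDataset at the parent directory so each dynamically
# named output_folder_<parameter>_<from_date>_<to_date> file becomes a partition.
parts = PartitionedDataset(
    path="data/03_primary",           # hypothetical parent directory
    dataset="pandas.CSVDataset",      # how each partition is loaded
    filename_suffix=".csv",
)

partitions = parts.load()             # {partition_id: load-callable}
for partition_id, load_func in partitions.items():
    df = load_func()                  # lazily load each partition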
You could possibly use the OmegaConfigLoader, register a custom resolver in settings.py, and then define your catalog filepath as filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv
Here is some example code: https://github.com/kedro-org/kedro/issues/2355#issuecomment-2260512795
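A minimal sketch of that settings.py, assuming a Kedro version whose OmegaConfigLoader accepts custom_resolvers:

from datetime import datetime

from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    # Makes ${now:} resolvable in catalog.yml at config-load time, e.g.
    # filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv
    "custom_resolvers": {
        "now": lambda: datetime.now().strftime("%Y-%m-%d_%H-%M-%S"),
    },
}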
Hmm, this seems to involve the file datasets.py, with which I am not familiar. Thanks for the idea in any case.
Coming back to my earlier question, though: could I define a PartitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?