Join the Kedro community


Adding Timestamps to Catalog Entries

Hello, what is the proper way to add a current timestamp to the names of catalog entries? Thanks!


Hi @Gauthier Pierard, if your goal is to version your dataset, you can set versioned: True in the catalog entry. This will save your datasets with a timestamp-based version for each kedro run.
https://docs.kedro.org/en/stable/data/data_catalog.html#dataset-versioning
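To make the behaviour concrete, here is a small sketch of what a versioned save looks like on disk. The version-string format and the `<filepath>/<version>/<filename>` layout mirror what Kedro produces, but the helper functions below are illustrative stand-ins, not Kedro's own `generate_timestamp()`:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

def timestamp_version(now=None):
    """Build a version string similar to Kedro's timestamp versions
    (e.g. 2024-01-01T12.00.00.000Z). Illustrative sketch only."""
    now = now or datetime.now(tz=timezone.utc)
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"

def versioned_path(filepath, version):
    """Versioned datasets are saved under <filepath>/<version>/<filename>."""
    p = PurePosixPath(filepath)
    return str(p / version / p.name)

version = timestamp_version()
print(versioned_path("data/01_raw/test.csv", version))
```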

Thanks Rashida, but I actually need more control.
In general my save folders look like output_folder_<parameter>_<from_date>_<to_date>,
where from_date and to_date are defined by a node and saved as MemoryDatasets in the catalog.
Is it possible to define other catalog entries whose names depend on previous entries?
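The naming scheme described above can be assembled with plain string formatting once the node has produced the dates. A minimal sketch, where the base directory, parameter name, and date values are all hypothetical placeholders:

```python
from datetime import date

def output_path(base, parameter, from_date, to_date):
    """Assemble output_folder_<parameter>_<from_date>_<to_date>.
    All argument values used below are hypothetical placeholders."""
    return f"{base}/output_folder_{parameter}_{from_date:%Y%m%d}_{to_date:%Y%m%d}"

# from_date / to_date would come from the node that computes them
print(output_path("data/02_intermediate", "motorbikes", date(2024, 1, 1), date(2024, 3, 31)))
```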

If I understand this correctly you'd essentially like to dynamically create your catalog based on previous runs?

Indeed. I suppose this is best done in Python with something like

CSVDataset(
    filepath="s3://test_bucket/data/02_intermediate/company/motorbikes.csv",
    load_args=dict(sep=",", skiprows=5, skipfooter=1, na_values=["#NA", "NA"]),
    credentials=dict(key="token", secret="key"),
)
and
# save the dataset to data/01_raw/test.csv/<version>/test.csv
catalog.save("test_dataset", data1)
correct?

The above allows you to save the data, but you wouldn't preserve the dataset entry in the catalog. Saving here doesn't add it to the catalog itself.

Do you need to have the catalog for future processing or are you okay with just saving the data to storage?

Yes, I understand the catalog file won't be updated, only the catalog object in memory.
However, could I define a PartitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?

You could possibly use the OmegaConfigLoader, register a custom resolver in settings.py, and then define your catalog filepath as filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv. Here is some example code: https://github.com/kedro-org/kedro/issues/2355#issuecomment-2260512795
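A minimal sketch of what that registration could look like in settings.py, assuming a Kedro version whose OmegaConfigLoader accepts a custom_resolvers argument; the resolver name `now` and the timestamp format are assumptions, not fixed by Kedro:

```python
# settings.py (sketch): register a custom OmegaConf resolver so that
# ${now:} in catalog.yml expands to a timestamp at config-load time.
from datetime import datetime, timezone

def now_resolver():
    """Body of the hypothetical ${now:} resolver.
    The %Y%m%d%H%M%S format is an assumption; pick what suits your paths."""
    return datetime.now(tz=timezone.utc).strftime("%Y%m%d%H%M%S")

# Passed through to OmegaConfigLoader by Kedro when loading config.
CONFIG_LOADER_ARGS = {"custom_resolvers": {"now": now_resolver}}
```

With this in place, a catalog entry like filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv would resolve to a fresh timestamped path on each run.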

Hmm, this seems to involve the file datasets.py, with which I am not familiar. Thanks for the idea in any case!

You can ignore that file! It was just an example

However could I define a partitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?

As far as I know this should be possible, because for the load path you just provide the top-level directory: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#partitioned-dataset-load
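To illustrate the idea: when a PartitionedDataset is loaded, it scans the top-level directory and returns one partition per file found underneath it, regardless of how the subfolders were named at save time. The sketch below mimics only that discovery step with pathlib; it is not Kedro's implementation, and the folder names are hypothetical:

```python
import tempfile
from pathlib import Path

def discover_partitions(parent, suffix=".csv"):
    """Mimic the discovery step of a PartitionedDataset pointed at `parent`:
    every matching file under the top-level directory becomes a partition id.
    Illustrative sketch only, not Kedro's implementation."""
    parent = Path(parent)
    return sorted(str(p.relative_to(parent)) for p in parent.rglob(f"*{suffix}"))

# Hypothetical layout: dynamically generated run folders under one parent
root = Path(tempfile.mkdtemp())
for folder in ("output_folder_a_20240101_20240131", "output_folder_a_20240201_20240229"):
    (root / folder).mkdir()
    (root / folder / "data.csv").write_text("x\n1\n")

print(discover_partitions(root))
```

So even though the subfolder names were generated at run time, a single catalog entry pointing at the parent directory can pick them all up later.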
