How Kedro Pipelines Read Input Datasets

Hi all, I have a question about how nodes/pipelines read their input datasets. Take the catalog configuration at the following link as an example: I assume the Kedro pipeline reads the data from a CSV file stored in Amazon S3 when you specify inputs=["cars"] in the node definition. If multiple different nodes take "cars" as an input dataset, does the pipeline reuse the dataset from memory, or does it read from Amazon S3 every time a node needs it?

https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-multiple-datasets-with-similar-configuration-using-yaml-anchors
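For concreteness, the kind of catalog entry I mean looks roughly like this (simplified from the linked page; the dataset type, bucket path, and credentials key are placeholders, not my real config):

```
cars:
  type: pandas.CSVDataset                    # CSV read via pandas
  filepath: s3://my-bucket/data/01_raw/cars.csv   # placeholder S3 path
  credentials: dev_s3                        # placeholder credentials key
```

and then two or more nodes each declaring inputs=["cars"].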

And if it does re-read the same dataset from the data source every time one of those nodes runs, is it possible to keep the dataset in memory after the first load from the source (the CSV file on Amazon S3 in this case) and reuse it from memory, so that the pipeline doesn't hit the data source multiple times? That could presumably shorten processing time.
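In other words, I'm imagining wrapping the entry in some kind of caching layer. I believe Kedro has a CachedDataset for this, so a sketch of what I mean (assuming the wrapper takes a nested dataset config, as the API docs seem to suggest) would be:

```
cars:
  type: CachedDataset        # caching wrapper: keeps the loaded data in memory
  dataset:                   # the wrapped dataset, read from S3 only on first load
    type: pandas.CSVDataset
    filepath: s3://my-bucket/data/01_raw/cars.csv   # placeholder S3 path
    credentials: dev_s3                             # placeholder credentials key
```

Is that the right approach, or does the runner already keep the data around between nodes?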
