I am writing my first Kedro pipeline tests and I am a little confused.
I am testing a pipeline with two nodes; the first node outputs a Spark object, which needs to be stored in a MemoryDataset with copy_mode "assign". How can I specify that in Python rather than YAML?
import logging
from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

catalog = DataCatalog()
caplog.set_level(logging.DEBUG, logger="kedro")
successful_run_msg = "Pipeline execution completed successfully."
SequentialRunner().run(pipeline, catalog)
assert successful_run_msg in caplog.text
do I do that using add_feed_dict? how?
So you can use Kedro this way, but it's not actually the way we recommend unless you have a specific reason to do so.
I would really recommend that you follow the Spaceflights tutorial, since it covers the key concepts and abstracts away some of this complexity.
we also have a full training course on YouTube:
https://www.youtube.com/playlist?list=PL-JJgymPjK5LddZXbIzp9LWurkLGgB-nY
This is the recommended way in the Kedro documentation to write pipeline tests: https://docs.kedro.org/en/stable/tutorial/test_a_project.html
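For context, the test in that guide boils down to roughly this shape (a self-contained sketch rather than the exact tutorial code; identity, test_pipeline, and the dataset names here are placeholders I've made up):

import logging

from kedro.io import DataCatalog
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner


def identity(x):
    return x


def test_pipeline_runs(caplog):
    # Trivial stand-in pipeline; in a real project you would import and
    # call your pipeline factory (e.g. create_pipeline()) instead.
    test_pipeline = pipeline([node(identity, inputs="input", outputs="output")])

    # Free inputs can be fed in as plain Python objects
    catalog = DataCatalog()
    catalog.add_feed_dict({"input": 42})

    caplog.set_level(logging.DEBUG, logger="kedro")
    SequentialRunner().run(test_pipeline, catalog)

    # Kedro's runner logs this message when the run finishes without errors
    assert "Pipeline execution completed successfully." in caplog.text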
I've been desperate to get a kedro-test micro-framework off the ground but it's been hard to prioritise
If we end up using Kedro we might be interested in doing some OSS contributions, so we could maybe help
Maybe this one helps? https://github.com/kedro-org/kedro/blob/main/tests/pipeline/test_pipeline_integration.py
hmmmm mine is similar, but I'm having the issue that I don't know how to specify that the output of the first node should use copy_mode "assign"
I guess something like:
dataset = MemoryDataset({"data": 42}, copy_mode="assign")
catalog = DataCatalog()
catalog.add_feed_dict({"dataset": dataset})
Yes indeed, it's:
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog(
    datasets={
        "data_utility": MemoryDataset(copy_mode="assign"),
        "extract_model_features": MemoryDataset(copy_mode="assign"),
    },
)
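Putting the whole thread together, a full test for the two-node case might look something like this (a sketch under assumptions: the node functions are dummies standing in for the real Spark logic, and only the dataset names are taken from the snippet above):

import logging

from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner


def build_utility(raw):
    # Imagine this returns a Spark object (e.g. a DataFrame)
    return {"rows": raw}


def extract_features(utility):
    # Imagine this consumes the Spark object from the first node
    return len(utility["rows"])


def test_two_node_pipeline(caplog):
    test_pipeline = pipeline(
        [
            node(build_utility, inputs="raw_data", outputs="data_utility"),
            node(extract_features, inputs="data_utility", outputs="extract_model_features"),
        ]
    )

    # copy_mode="assign" hands the object through without copying it, which
    # is what non-serialisable Spark objects need
    catalog = DataCatalog(
        datasets={
            "data_utility": MemoryDataset(copy_mode="assign"),
            "extract_model_features": MemoryDataset(copy_mode="assign"),
        },
    )
    catalog.add_feed_dict({"raw_data": [1, 2, 3]})

    caplog.set_level(logging.DEBUG, logger="kedro")
    SequentialRunner().run(test_pipeline, catalog)

    assert "Pipeline execution completed successfully." in caplog.text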