Testing a pipeline with a Spark object output and memory dataset configuration

I am writing my first Kedro pipeline tests and I am a little confused.

I am testing a pipeline with two nodes; however, the first node outputs a Spark object, which needs to be stored as a memory dataset with copy mode "assign". How can I specify that in Python rather than YAML?

import logging

from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

catalog = DataCatalog()
caplog.set_level(logging.DEBUG, logger="kedro")
successful_run_msg = "Pipeline execution completed successfully."
SequentialRunner().run(pipeline, catalog)
assert successful_run_msg in caplog.text

Do I do that using add_feed_dict? How?

So you can use Kedro this way, but it's not actually the way we recommend unless you have a specific reason to do so.

I would really recommend that you follow the Spaceflights tutorial since it covers the key concepts and abstracts some of this complexity

But this is for integration tests for pipelines

that falls into a good reason

This is the recommended way in the Kedro documentation to write pipeline tests: https://docs.kedro.org/en/stable/tutorial/test_a_project.html

No worries, take your time. I'd appreciate any help I can get

You might be able to get some inspiration from the Kedro code base tests!

Let me see if I can find a good example

Oh that's a great idea actually, I'm having Friday brain!

I've been desperate to get a kedro-test micro-framework off the ground but it's been hard to prioritise

If we end up using Kedro we might be interested in doing some OSS contributions with it so could maybe help

Hmmmm, mine is similar, but I'm having the issue that I don't know how to specify that the output of one pipeline should use copy_mode "assign"

I guess something like:

dataset = MemoryDataset({"data": 42}, copy_mode="assign")
DataCatalog().add_feed_dict({"dataset": dataset})

Yes indeed, it's:

catalog = DataCatalog(
    datasets={
        "data_utility": MemoryDataset(copy_mode="assign"),
        "extract_model_features": MemoryDataset(copy_mode="assign"),
    },
)
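
For completeness, here's a rough sketch of how that catalog could plug into the test snippet from earlier (untested; the test name, the free-input name "raw_input", and the pipeline fixture are placeholders to swap for your real ones):

import logging

from kedro.io import DataCatalog, MemoryDataset
from kedro.runner import SequentialRunner


def test_pipeline_runs(caplog, pipeline):
    # `pipeline` is assumed to come from a fixture that builds the two-node pipeline.
    # Intermediate Spark outputs stay in memory and are passed by reference ("assign").
    catalog = DataCatalog(
        datasets={
            "data_utility": MemoryDataset(copy_mode="assign"),
            "extract_model_features": MemoryDataset(copy_mode="assign"),
        },
    )
    # Feed the pipeline's free input (placeholder name and data).
    catalog.add_feed_dict({"raw_input": {"data": 42}})

    caplog.set_level(logging.DEBUG, logger="kedro")
    SequentialRunner().run(pipeline, catalog)

    assert "Pipeline execution completed successfully." in caplog.text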

I got it working now, thanks!
