
Updated last month

Validating data with Great Expectations for merged datasets

Hi all! The Kedro documentation has a nice example of how to validate data with Great Expectations, but it only looks at one dataset at a time. What would I do if I need to validate the data of a node that merges two datasets? Let's say one table is a lookup table, and the other table may only contain entries that exist in the lookup table. Has anyone ever checked multiple datasets at a time? Do you have an example of that?

5 comments

Can you validate the output after merging?
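One way to read this suggestion, as a rough sketch in plain pandas rather than anything from the Kedro or Great Expectations docs: do the merge inside the node with `indicator=True`, then check the merged result for rows that found no match in the lookup table. The function name `merge_with_lookup` and the column names below are made up for illustration.

```python
import pandas as pd


def merge_with_lookup(orders: pd.DataFrame, lookup: pd.DataFrame, key: str) -> pd.DataFrame:
    """Left-join `orders` onto `lookup` and fail if any key is missing from the lookup."""
    merged = orders.merge(
        lookup,
        on=key,
        how="left",
        validate="many_to_one",  # raises pandas.errors.MergeError if `lookup` has duplicate keys
        indicator=True,          # adds a `_merge` column: "both" or "left_only"
    )
    # Rows marked "left_only" have a key that does not exist in the lookup table.
    orphans = merged.loc[merged["_merge"] == "left_only", key].unique().tolist()
    if orphans:
        raise ValueError(f"{len(orphans)} key(s) not found in lookup table: {orphans}")
    return merged.drop(columns="_merge")


# Hypothetical example data
lookup = pd.DataFrame({"country_code": ["DE", "FR"], "country": ["Germany", "France"]})
orders = pd.DataFrame({"order_id": [1, 2, 3], "country_code": ["DE", "FR", "DE"]})
merged = merge_with_lookup(orders, lookup, key="country_code")
```

The merged output can still be persisted and validated with Great Expectations afterwards. The check above only makes the node fail early when the lookup constraint is violated.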

Yeah, you're touching on the difference between testing persisted data and testing in-memory data

I saw this the other day

[Attachment: image-1.png]

GE (Great Expectations) falls towards the end of the spectrum

Testing the cardinality of a join is something you could validate on persisted outputs, or you could "shift left" and test it at execution time with something like Pandera
