Updated 5 months ago

Validating data with Great Expectations for merged datasets

Hi all! The Kedro documentation has a nice example of how to validate data with Great Expectations, but it only looks at one dataset at a time. What would I do if I need to validate the data of a node that merges two datasets? Let's say one table is a lookup table and the other table may only contain entries that exist in the lookup table. Has anyone ever checked multiple datasets at a time? Do you have an example for that?

5 comments

Nok Lam Chan

Can you validate the output after merging?

datajoely

Yeah, you're touching on the difference between testing on persisted versus in-memory data

datajoely

I saw this the other day

Attachment

datajoely

GE (Great Expectations) falls towards the end of the spectrum

datajoely

The cardinality of a join is something you could validate on persisted outputs, or you could "shift left" and test it at execution time with something like Pandera
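
For anyone landing here later, a minimal sketch of the execution-time ("shift left") variant, written in plain pandas rather than Pandera or Great Expectations. The node name `merge_with_lookup`, the column `product_id`, and the table names are hypothetical and only there for illustration. The idea is that the node itself checks both things from the question: the lookup key is unique (join cardinality), and every row in the main table has a match in the lookup table.

```python
import pandas as pd


def merge_with_lookup(
    orders: pd.DataFrame, lookup: pd.DataFrame, key: str = "product_id"
) -> pd.DataFrame:
    """Hypothetical Kedro node: join a table to a lookup table and fail fast on bad joins."""
    # validate="many_to_one" makes pandas raise MergeError if `key` is
    # duplicated in the lookup table, so the join cannot fan out rows.
    merged = orders.merge(
        lookup, on=key, how="left", validate="many_to_one", indicator=True
    )
    # indicator=True adds a "_merge" column; "left_only" marks rows whose
    # key has no counterpart in the lookup table.
    missing = merged.loc[merged["_merge"] == "left_only", key].unique().tolist()
    if missing:
        raise ValueError(f"{key} values missing from lookup table: {missing}")
    return merged.drop(columns="_merge")
```

Because the check runs inside the node, it sees both in-memory inputs at once, which is the part a single-dataset expectation suite cannot see. The same rule could presumably be expressed as a schema check in a library such as Pandera, or as an expectation on the persisted merged output, as suggested above.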
