Getting total execution time for a Databricks workflow

Hi everyone. Using hooks, I’ve managed to show the execution time of each node. However, I also want to know how long the whole process takes, from loading the data, through executing the nodes, to finally saving the data to the Databricks catalog.

So in the attached image, I want to know the time difference between “INFO Completed 1 out of tasks” and “INFO Loading data from ‘params: …”, not just the node execution time. I can of course work out the difference by hand, but because there are hundreds of nodes, doing that for all of them would take at least an hour, and it would be really helpful to see how long each task takes at a glance. Is there any way to do this? Can it also be done with hooks?

https://kedro-org.slack.com/archives/C03RKP2LW64/p1728353683266369

2 comments

Would using the before_pipeline_run and after_pipeline_run hooks help in your case?
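
A minimal sketch of what this suggestion could look like, for anyone who only needs the total for the whole run. The class name `PipelineTimerHooks` and the log message are made up for illustration; the two hook names are the ones mentioned above. The `try/except` fallback is only there so the snippet can be run without Kedro installed.

```python
import logging
import time

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback (assumption for illustration): a no-op decorator so the
    # sketch can be exercised outside a Kedro project.
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class PipelineTimerHooks:
    """Hypothetical hook class: logs wall-clock time for one pipeline run."""

    def __init__(self):
        self._start = None
        self.elapsed = None  # seconds, filled in when the run finishes

    @hook_impl
    def before_pipeline_run(self):
        # Called once before the first node of the run.
        self._start = time.perf_counter()

    @hook_impl
    def after_pipeline_run(self):
        # Called once after the last node has finished successfully.
        self.elapsed = time.perf_counter() - self._start
        logger.info("Pipeline finished in %.2f s", self.elapsed)
```

To try it, the class would be registered in the project's `settings.py`, for example `HOOKS = (PipelineTimerHooks(),)`. Note that `after_pipeline_run` is not called when the run fails, so a failed run logs no total.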

Thanks for your reply! There are dozens of nodes in each pipeline, so the before_pipeline_run and after_pipeline_run hooks only show the execution time of the whole pipeline, not that of each node. But I noticed that by taking the time difference between the before_node_run calls of consecutive nodes, I can get the approximate duration of the “data loading + node running + data saving” process. So problem solved! Thank you again!
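
The approach described above could be sketched roughly as follows. This is not the poster's actual code: the class name `NodeIntervalHooks`, the `durations` dictionary, and the log format are assumptions for illustration. It also assumes nodes run one after another (a sequential runner); with a parallel runner the intervals would overlap and the numbers would not be meaningful. As far as I know, before_node_run fires after a node's inputs have been loaded, so each interval covers running the node, saving its outputs, and loading the next node's inputs, which is why the result is only approximate.

```python
import logging
import time

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback (assumption for illustration): a no-op decorator so the
    # sketch can be exercised outside a Kedro project.
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class NodeIntervalHooks:
    """Hypothetical hook class: time between consecutive before_node_run calls."""

    def __init__(self):
        self._current = None   # (node name, start time) of the node in progress
        self.durations = {}    # node name -> seconds until the next node started

    def _close_current(self, now):
        # Record the interval for the node that was in progress, if any.
        if self._current is not None:
            name, started = self._current
            self.durations[name] = now - started
            logger.info("%s took about %.2f s (run + save + next load)",
                        name, self.durations[name])
            self._current = None

    @hook_impl
    def before_node_run(self, node):
        now = time.perf_counter()
        self._close_current(now)          # the previous node ends here
        self._current = (node.name, now)  # the new node starts here

    @hook_impl
    def after_pipeline_run(self):
        # The last node has no successor, so close its interval here.
        self._close_current(time.perf_counter())
```

Registering it works the same way as for any other hook class, for example `HOOKS = (NodeIntervalHooks(),)` in `settings.py`.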
