Catalog and Pipeline Definitions

Ben Shaughnessy · 2025-01-21T17:16:51.000Z


Ben Shaughnessy

Can anyone suggest the best way to access:

Catalog definition
Pipeline definition

Before the pipeline runs, and ideally outside the normal kedro run life cycle?

I'm trying to accomplish two very different things with this:

1. Implicitly figure out which nodes depend on each other via memory datasets, to support using memory datasets in a distributed Argo pipeline running a Kedro pipeline.
2. Generate documentation via a Mermaid diagram that I can store in a README file, similar to Kedro-Viz (but with some subtle key features).

4 comments

Juan Luis Cano Rodríguez

For 1., you can always instantiate the config loader and data catalog programmatically; see for example https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html#use-kedro-s-configurat[…]-load-the-data-catalog
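A minimal sketch of what that could look like outside of kedro run, assuming a standard project layout (a conf folder with base and local environments) and a recent Kedro version. The function name load_definitions is hypothetical, and the exact imports may differ between Kedro releases, so treat this as a starting point rather than the recommended way:

```python
from pathlib import Path


def load_definitions(project_path):
    """Return (catalog_config, catalog, pipelines) without running anything.

    Kedro is imported lazily so this module can be imported without it.
    """
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import pipelines
    from kedro.framework.startup import bootstrap_project
    from kedro.io import DataCatalog

    project_path = Path(project_path)
    # Registers the project's settings and pipeline registry.
    bootstrap_project(project_path)

    # Assumption: the default "conf/base" and "conf/local" layout.
    conf_loader = OmegaConfigLoader(
        conf_source=str(project_path / "conf"),
        base_env="base",
        default_run_env="local",
    )
    catalog_config = conf_loader["catalog"]  # plain dict of catalog entries
    # Note: entries that reference credentials need them passed in as well.
    catalog = DataCatalog.from_config(catalog_config)
    return catalog_config, catalog, dict(pipelines)
```

If only the dataset names are needed, the plain catalog_config dict is probably enough, and it avoids instantiating any datasets.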

Juan Luis Cano Rodríguez

For 2., we've been thinking about that for a long time, but there's nothing very solid yet. An early prototype was https://github.com/AlpAribal/kedro-inspect/ which you might want to have a look at.

Nok Lam Chan

Hmm, for the pipeline I have something like this that generates an ASCII rendering of the pipeline:

https://github.com/noklam/kedro-example/blob/master/ascii_hook%2Fsrc%2Fascii_hook%2Fdagascii.py

Not sure if it still runs, since I created it a few years ago, but it should not take too much to edit.
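For the Mermaid variant asked about in the question, a rough sketch could look like the following. It is not taken from the linked hook; it assumes each node has been reduced to a (name, inputs, outputs) tuple, for example from the name, inputs and outputs attributes of the nodes in a Kedro pipeline. The function name pipeline_to_mermaid is made up for illustration.

```python
def pipeline_to_mermaid(nodes):
    """Render (name, inputs, outputs) tuples as a Mermaid flowchart string."""
    lines = ["flowchart TD"]
    ids = {}

    def node_id(label, prefix):
        # Mermaid ids must be simple tokens, so map each label to a short id.
        key = (prefix, label)
        if key not in ids:
            ids[key] = f"{prefix}{len(ids)}"
        return ids[key]

    for name, inputs, outputs in nodes:
        nid = node_id(name, "n")
        lines.append(f'    {nid}["{name}"]')  # nodes as rectangles
        for ds in inputs:
            did = node_id(ds, "d")
            lines.append(f'    {did}(["{ds}"]) --> {nid}')  # datasets as stadiums
        for ds in outputs:
            did = node_id(ds, "d")
            lines.append(f'    {nid} --> {did}(["{ds}"])')
    return "\n".join(lines)


# Toy data standing in for a real pipeline.
diagram = pipeline_to_mermaid(
    [
        ("preprocess", ["raw"], ["clean"]),
        ("train", ["clean", "params:lr"], ["model"]),
    ]
)
print(diagram)
```

The resulting string can be pasted into a mermaid code fence in a README file.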

Nok Lam Chan

If you want to figure out which datasets are memory datasets, you can use kedro catalog create, which fills in all the missing datasets with memory datasets in the catalog.

If you want to do something different, the easiest way is probably to take that logic and modify it into a new CLI command or a new hook.
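The same idea can be sketched in plain Python: any dataset that a node produces but that is not declared in the catalog is treated as a memory dataset, and every consumer of it depends on its producer. This is an illustration under assumptions, not the logic of kedro catalog create itself. Nodes are assumed to be (name, inputs, outputs) tuples, catalog_names is assumed to be the set of explicitly declared catalog entries, and dataset factory patterns are not handled. The function name memory_dataset_edges is hypothetical.

```python
from collections import defaultdict


def memory_dataset_edges(nodes, catalog_names):
    """Map each producer node to the (consumer, dataset) pairs that rely
    on a dataset missing from the catalog, i.e. an implicit memory dataset."""
    producers = {}
    for name, _inputs, outputs in nodes:
        for ds in outputs:
            if ds not in catalog_names:
                producers[ds] = name

    edges = defaultdict(set)
    for name, inputs, _outputs in nodes:
        for ds in inputs:
            # Parameters and raw inputs have no producer and are skipped.
            if ds in producers:
                edges[producers[ds]].add((name, ds))
    return {producer: sorted(pairs) for producer, pairs in edges.items()}


# Toy data: "clean" is not in the catalog, so it is passed in memory.
nodes = [
    ("preprocess", ["raw"], ["clean"]),
    ("train", ["clean", "params:lr"], ["model"]),
    ("evaluate", ["model", "clean"], ["metrics"]),
]
edges = memory_dataset_edges(nodes, catalog_names={"raw", "model", "metrics"})
print(edges)
```

In a distributed setup such as Argo, these edges are the ones that would either need a persisted dataset or require the connected nodes to run in the same step.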
