
Updated last month

Catalog and Pipeline Definitions


Can anyone suggest the best way to access:

  1. Catalog definition
  2. Pipeline definition

Before the pipeline runs, and ideally outside the normal kedro run life cycle?

I'm trying to accomplish two very different things with this:

  1. Implicitly work out which nodes depend on each other through memory datasets, so that memory datasets can be used when a Kedro pipeline runs as a distributed Argo pipeline
  2. Generate documentation as a Mermaid diagram that I can store in a README file, similar to Kedro-Viz (but with some subtle key features)


For 2., we've been thinking about that for a long time, but there's nothing very solid yet. An early prototype you might want to have a look at is https://github.com/AlpAribal/kedro-inspect/
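
To make the Mermaid idea concrete, here is a minimal sketch in plain Python. It is not taken from kedro-inspect or Kedro-Viz. NodeSpec and to_mermaid are hypothetical names used only for illustration, and the sketch assumes each node can be reduced to a name plus lists of input and output dataset names (in a real project these would presumably be read from the nodes of the Kedro pipeline object).

```python
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """Hypothetical stand-in for a pipeline node: a name plus dataset names."""

    name: str
    inputs: list[str]
    outputs: list[str]


def to_mermaid(nodes: list[NodeSpec]) -> str:
    """Render nodes and datasets as a Mermaid flowchart (left to right)."""
    ids: dict[tuple[str, str], str] = {}

    def ident(kind: str, label: str) -> str:
        # Assign a short, stable Mermaid id the first time a label is seen.
        return ids.setdefault((kind, label), f"{kind}{len(ids)}")

    node_lines, edge_lines = [], []
    for node in nodes:
        node_id = ident("n", node.name)
        node_lines.append(f'    {node_id}["{node.name}"]')
        for dataset in node.inputs:
            edge_lines.append(f"    {ident('d', dataset)} --> {node_id}")
        for dataset in node.outputs:
            edge_lines.append(f"    {node_id} --> {ident('d', dataset)}")

    # Datasets are drawn with Mermaid's cylinder shape to set them apart.
    dataset_lines = [
        f'    {mermaid_id}[("{label}")]'
        for (kind, label), mermaid_id in ids.items()
        if kind == "d"
    ]
    return "\n".join(["flowchart LR", *node_lines, *dataset_lines, *edge_lines])


# Made-up example pipeline: raw -> clean -> cleaned -> train -> model
example_nodes = [
    NodeSpec("clean", ["raw"], ["cleaned"]),
    NodeSpec("train", ["cleaned"], ["model"]),
]
diagram = to_mermaid(example_nodes)
print(diagram)
```

The returned string can be pasted into a mermaid code block in a README. Labels containing double quotes would need escaping, which this sketch does not handle.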

Hmm, for the pipeline I have something like this that generates an ASCII rendering of the pipeline:

https://github.com/noklam/kedro-example/blob/master/ascii_hook%2Fsrc%2Fascii_hook%2Fdagascii.py

Not sure if it still runs, since I created this a few years ago, but it should not take much to edit.

If you want to figure out which datasets are memory datasets, you can use kedro catalog create, which fills in every dataset missing from the catalog as a memory dataset.
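
The same idea can be sketched without the CLI. This is a rough illustration in plain Python, assuming that any dataset a pipeline uses but the catalog does not declare ends up as a memory dataset, and that parameter inputs (names starting with params:, or the name parameters) should be ignored. NodeSpec, implicit_memory_datasets and memory_links are hypothetical names, not Kedro APIs, and the node and catalog names are made up.

```python
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """Hypothetical stand-in for a pipeline node: a name plus dataset names."""

    name: str
    inputs: list[str]
    outputs: list[str]


def is_parameter(dataset: str) -> bool:
    # Assumption: parameters follow Kedro's "params:" / "parameters" naming.
    return dataset == "parameters" or dataset.startswith("params:")


def implicit_memory_datasets(nodes: list[NodeSpec], catalog_names: set[str]) -> set[str]:
    """Datasets used by the nodes but not declared in the catalog."""
    used = {ds for node in nodes for ds in (*node.inputs, *node.outputs)}
    return {ds for ds in used if ds not in catalog_names and not is_parameter(ds)}


def memory_links(nodes: list[NodeSpec], catalog_names: set[str]) -> list[tuple[str, str, str]]:
    """Return (producer, consumer, dataset) for nodes coupled by a memory dataset."""
    memory = implicit_memory_datasets(nodes, catalog_names)
    producers = {ds: node.name for node in nodes for ds in node.outputs}
    return [
        (producers[ds], node.name, ds)
        for node in nodes
        for ds in node.inputs
        if ds in memory and ds in producers
    ]


# Made-up example: only "raw" and "model" are declared in the catalog.
example_nodes = [
    NodeSpec("clean", ["raw"], ["cleaned"]),
    NodeSpec("make_features", ["cleaned", "params:window"], ["features"]),
    NodeSpec("train", ["features"], ["model"]),
]
links = memory_links(example_nodes, {"raw", "model"})
print(links)
```

Each returned triple marks a pair of nodes that would either have to run in the same Argo step or have the shared dataset persisted somewhere both steps can reach.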

If you want to do something different, the easiest way is probably to take that logic and modify it into a new CLI command or a new hook.
