Matthias Roels
Joined October 21, 2024

I’m working on a big project that is about to enter its next phase. We use Kedro, and everything lives in a single large Kedro project. To give you an idea of the size: we have more than 500 catalog entries and more than 500 nodes spread across different Kedro pipelines (we disabled the default pipeline that sums all pipelines because it is too large to use). I know the general guideline is to split a project into several smaller ones once it becomes too big, but I need some advice and opinions on this. I’ll explain more details in the comments. Thanks!

8 comments

After switching to OmegaConfLoader, it seems ISO-formatted dates (e.g. 2025-01-08) are no longer automatically converted to date objects. I thought an easy fix would be a custom resolver that does the conversion for me, but then I got an error stating that a date object is not a valid primitive type when creating OmegaConf objects. The cause seems to be that, when soft-merging params, the resolved dict objects are converted back into OmegaConf objects, which triggers the error. Is this a bug?
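
One way to sidestep the error described above, offered as an assumption rather than a confirmed fix, is to leave the dates as plain strings in the config and convert them only once the parameters are ordinary Python dicts (for example inside a node or a hook). The helper name `convert_iso_dates` and the sample parameters are hypothetical.

```python
import re
from datetime import date
from typing import Any

# Only strings shaped exactly like YYYY-MM-DD are treated as dates.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def convert_iso_dates(params: Any) -> Any:
    """Recursively replace ISO-formatted date strings with date objects."""
    if isinstance(params, dict):
        return {key: convert_iso_dates(value) for key, value in params.items()}
    if isinstance(params, list):
        return [convert_iso_dates(value) for value in params]
    if isinstance(params, str) and ISO_DATE.fullmatch(params):
        try:
            return date.fromisoformat(params)
        except ValueError:
            # Looks like a date but is not a valid one (e.g. month 13).
            return params
    return params


# Hypothetical parameters, as they might look after the config loader ran.
raw_params = {"run": {"start_date": "2025-01-08", "name": "weekly"}, "lags": [1, 7]}
params = convert_iso_dates(raw_params)
```

Because the conversion happens after loading, no date object is ever handed back to OmegaConf, so the merge step only sees strings.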

9 comments

Regarding https://github.com/kedro-org/kedro/issues/4322: I am upgrading a big project from Kedro 0.18.13 to the latest version. While doing so, I am also removing a custom ConfigLoader because I want to use OmegaConf. However, I see performance issues here too compared to the custom implementation we had. I did some debugging (using logging in my hooks) and found the following:

  • The project has 1500 catalog entries, and most filepaths combine info from globals (bucket, prefix, data version, …)
  • With Kedro 0.18, I was able to load the project in a notebook in around 25 sec
  • In the new version, it takes 100 sec
  • Most of the load time is spent after my after_context_created hooks have run (potentially when creating the catalog?)

I would like to see what I can do to improve load times or, at least, figure out for sure what’s causing the slowdown. Any help would be appreciated (I cannot give access to the full project, but I will share any info I can).
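
The hook-based logging mentioned above could look roughly like the sketch below. It measures the time between context creation and catalog creation. The class name `LoadTimingHooks` and its attributes are hypothetical; `after_context_created` and `after_catalog_created` are Kedro hook names, and the fallback decorator only exists so the sketch also runs without Kedro installed.

```python
import logging
import time

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can run outside a Kedro environment.
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class LoadTimingHooks:
    """Hypothetical hooks that log how long catalog creation takes."""

    def __init__(self) -> None:
        self._context_created_at = None
        self.catalog_seconds = None

    @hook_impl
    def after_context_created(self, context) -> None:
        # Start the clock once the context exists.
        self._context_created_at = time.perf_counter()
        logger.info("Context created")

    @hook_impl
    def after_catalog_created(self, catalog) -> None:
        # Stop the clock once the catalog has been built.
        if self._context_created_at is None:
            return
        self.catalog_seconds = time.perf_counter() - self._context_created_at
        logger.info("Catalog created %.2f s after context", self.catalog_seconds)
```

In a real project the class would be registered in settings.py, for example with `HOOKS = (LoadTimingHooks(),)`. If most of the 100 sec shows up in `catalog_seconds`, that would support the suspicion that catalog creation is the slow step.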

9 comments

Question about configuration: how deeply can you nest the files in your config environment? Is it possible to have something like the following file structure?

conf/
  base/
    crm/
      prm/
        parameters.yaml
        catalog.yaml
      feat/
        …
And still allow OmegaConf to read all files?
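
A quick way to reason about this is to check which files a recursive glob pattern picks up. The sketch below recreates the layout above in a temporary directory and matches it with patterns that, as far as I know, mirror the OmegaConfLoader defaults for `catalog` and `parameters`; treat the pattern lists as an assumption and check them against your Kedro version. It uses `pathlib` globbing as a stand-in for the loader's own file matching, and `find_config_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed to mirror the default config_patterns; verify for your version.
PATTERNS = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
}


def find_config_files(env_dir: Path, patterns: list[str]) -> list[str]:
    """Return the files under env_dir that match any of the glob patterns."""
    found = set()
    for pattern in patterns:
        for path in env_dir.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(env_dir).as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate conf/base/crm/prm/ with the two files from the question.
    base = Path(tmp) / "conf" / "base"
    nested = base / "crm" / "prm"
    nested.mkdir(parents=True)
    (nested / "parameters.yaml").write_text("alpha: 1\n")
    (nested / "catalog.yaml").write_text("{}\n")

    catalog_files = find_config_files(base, PATTERNS["catalog"])
    parameter_files = find_config_files(base, PATTERNS["parameters"])
```

The `**/catalog*` and `**/parameters*` patterns are the ones that reach into nested folders, so the depth of the nesting should not matter as long as the file names keep their prefix.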

1 comment