Regarding https://github.com/kedro-org/kedro/issues/4322, I am working on upgrading a big project from kedro 0.18.13 to the latest version. While doing so, I am also removing a custom ConfigLoader as I want to use OmegaConf. However, I do see some performance issues here too compared to the custom implementation we had. Did some debugging (using logging in my hooks) and found the following:
Hi Matthias, thanks for sharing your observations. We did some more in-depth analysis as well: https://github.com/kedro-org/kedro/issues/3893
The verdict is that most of the slowness comes from the OmegaConf side, but we are working on improving what we can on our side.
Would be great to hear your ideas if you have any!
On top of what was said, are you able to run line_profiler + cProfile reports on your code and let us know where the hotspots are?
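For reference, a cProfile report of the kind requested above could be produced roughly like this. This is only a minimal sketch using the standard library: `load_catalog_config` and `profile_call` are hypothetical names, and `load_catalog_config` is a stand-in for whatever call triggers the config loading in your project, not a kedro function.

```python
import cProfile
import io
import pstats


def load_catalog_config(n_entries: int = 1500) -> dict:
    # Hypothetical stand-in for the real config loading call in your project.
    return {
        f"dataset_{i}": {"type": "pandas.CSVDataset", "filepath": f"data/{i}.csv"}
        for i in range(n_entries)
    }


def profile_call(func, *args, top: int = 10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by cumulative time puts the slowest call paths at the top.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


config, report = profile_call(load_catalog_config)
print(report)
```

The functions at the top of the report are the hotspots. line_profiler can then be pointed at those functions for a per-line breakdown.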
I did a deep dive on what’s making loading the catalog slow for me. I only load from the base env which already contains 63 catalog files (with 1500 entries each). It seems the bottleneck is in the return statement when loading and merging configs. More specifically, the to_container call that resolves the config.
Thanks, this seems to match our initial investigation on https://github.com/kedro-org/kedro/issues/3893
Our intended solution is still "Reduce the time spent [...] on OmegaConf.to_container"
I can already say it scales linearly with the number of entries. For me, it’s 44ms per catalog entry (x1500 -> roughly 60sec)
Two small changes I immediately see are
_
and then do the resolving
What could be a huge improvement, at least in my case, is to keep using OmegaConf objects in the rest of the kedro project (as opposed to dicts). This will probably be a major backend change, but you would then postpone to_container calls as long as possible (and in our case skip many of them, as we only use a portion of the catalog on every kedro run)
@Yolan Honoré-Rougé proposed exactly this at https://github.com/kedro-org/kedro/issues/2973
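To illustrate the idea of postponing conversion until an entry is actually used, here is a rough sketch in plain Python. It is not how kedro or OmegaConf implement anything: `LazyEntries` and `expensive_convert` are hypothetical names, and the converter is a cheap stand-in for the costly step (with OmegaConf it could be something like `lambda node: OmegaConf.to_container(node, resolve=True)`).

```python
from collections.abc import Mapping
from typing import Any, Callable, Iterator


class LazyEntries(Mapping):
    """Mapping that converts each raw entry only on first access, then caches it."""

    def __init__(self, raw: Mapping, convert: Callable[[Any], Any]) -> None:
        self._raw = raw
        self._convert = convert
        self._cache: dict = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._cache:
            # The costly conversion happens here, once per entry that is used.
            self._cache[key] = self._convert(self._raw[key])
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._raw)

    def __len__(self) -> int:
        return len(self._raw)


calls = []


def expensive_convert(entry: Mapping) -> dict:
    # Hypothetical stand-in for the expensive resolve-and-convert step.
    calls.append(entry["filepath"])
    return dict(entry)


raw = {
    f"dataset_{i}": {"type": "pandas.CSVDataset", "filepath": f"data/{i}.csv"}
    for i in range(1500)
}
catalog = LazyEntries(raw, expensive_convert)

# A run that touches only two datasets pays for only two conversions.
used = [catalog["dataset_3"], catalog["dataset_7"], catalog["dataset_3"]]
print(f"{len(calls)} of {len(catalog)} entries converted")
```

With this shape, the conversion cost grows with the number of entries a run actually touches rather than with the size of the whole catalog.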