Hey all! I'm working on tooling around running Kedro pipelines in our (pre-existing) Prefect deployment. I've been following the lead of the example from the docs and things were going pretty smoothly until I came around to logging. Logging in Prefect is a little finicky, but what I'd like to do is route the Kedro logs through to the Prefect loggers and handlers. Happy to go into more detail about what I've tried, but figured I'd first ask if anyone has experience here? Is there some other way to handle exposing Kedro logs in the Prefect UI (which is ultimately my goal)?
Prior to using Kedro (currently working on the first "real" pilot), our pattern would be to grab the logger via Prefect's get_logger methods and then inject that down into our pipeline code.
more context: I've implemented two custom datasets and it's the logs from that code that I want to expose via the prefect logger
the closest I've gotten was defining a custom handler for the kedro logger, and having that code instantiate the prefect run logger and pass the logs through, but this wasn't capturing all of the logs from the datasets.
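for reference, the handler looked roughly like this (a sketch, not my exact code; the class name is made up, and it imports Prefect lazily so the module loads even where Prefect isn't importable):

```python
import logging


class PrefectForwardingHandler(logging.Handler):
    """Sketch of a handler that forwards Kedro log records to the Prefect run logger."""

    def emit(self, record: logging.LogRecord) -> None:
        try:
            # imported lazily so attaching the handler doesn't require Prefect at import time
            from prefect import get_run_logger

            get_run_logger().log(record.levelno, self.format(record))
        except Exception:
            # no active flow/task run (MissingContextError) or Prefect unavailable:
            # the record is silently dropped, which can hide useful warnings
            pass


# attach it to the root kedro logger so dataset logs propagate through it
logging.getLogger("kedro").addHandler(PrefectForwardingHandler())
```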
i sort of suspect the issue was conflicting logging configuration between kedro and prefect, but haven't gone down that rabbit hole yet
or maybe something wrong with how I configured this, but it felt like too deep a rabbit hole to go down before asking y'all for help 😅
I also didn't pursue that further because I switched to trying to get Prefect to pick up the kedro logs using https://docs.prefect.io/v3/develop/logging#include-logs-from-other-libraries but I haven't been able to get that to work either (no matter how I try to configure this I don't see the kedro logs in the UI)
I found the same thing last time I tried running Kedro in Prefect. I tweaked the logging a bit, does this help? https://github.com/astrojuanlu/workshop-from-zero-to-mlops/blob/main/conf/logging_prefect.yml
I see, this is much simpler than what I did, just tell the kedro logger to use the prefect handler. it isn't resolving the issue, but it does give a clue:
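so the idea is a logging config along these lines (a sketch of the approach, not the exact contents of the linked file; the prefect.logging.handlers.APILogHandler path is an assumption based on recent Prefect versions):

```yaml
version: 1
disable_existing_loggers: false
handlers:
  prefect:
    class: prefect.logging.handlers.APILogHandler
loggers:
  kedro:
    level: INFO
    handlers: [prefect]
```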
[02/20/25 16:34:16] WARNING /usr/local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:270: UserWarning: Logger 'kedro.framework.project' attempted to send logs to the API without a flow run id. The API log handler can only send logs within flow run contexts unless the flow run id is manually provided.
my handler was eating these warnings :facepalming: .... I think this means that the logs I'm looking for are being generated outside a prefect flow/task
in order to get the logs to the UI (which is not where the actual tasks are run) the logs are sent through an API
and in order for that to work properly the prefect loggers need to be used so that the appropriate metadata (task run id, etc...) can be attached
but after digging deeper, something is happening down inside Kedro that is OUTSIDE the Prefect context
hard to say exactly what without knowing/learning more about the implementation details of Kedro
it looks like the actual execution of the node implementations is not inheriting the prefect context
but as far as I can tell there are places in kedro (the load and save methods of a Dataset being an example) where there's no practical way of getting logs to the prefect UI, because the execution of that code is happening outside the prefect flow/task context and you can't call get_run_logger
It may be possible to manually track the necessary metadata (task run ids, etc...) and then manually inject that before calling the prefect api log handler
but i am very hesitant to go down that road because it feels like it would be very brittle (ie, if prefect changes internal implementations of the logging api, this code will break)
in our particular case we have things set up so that all logs from any container in our cluster get shipped to logzio, so I think we'll have to live with a world where we have to go there to see all the logs (as opposed to the prefect UI), which is not ideal, but fine
relevant thread https://linen.prefect.io/t/18810089/how-should-i-handle-logging-in-a-prefect-compatible-way-in-a
i think the tldr for this thread is that:
1. it appears kedro is not preserving context when doing threading or something similar
2. maybe this could be a nice contribution? attempt to, in general, preserve any active context through to all parts of the pipeline execution (assuming this is even possible/feasible)
(I say I think because I would still consider everything I've said in this thread conjecture)
i'm going to dig a little deeper and see if this really does come down to how kedro is handling Python Context variables.
please keep us posted with whatever you find 🙏 I have no idea if Kedro is doing anything funky here
i sort of suspect it's more like kedro isn't looking for and passing through context variables rather than kedro doing something funky
if that's the case, it's a totally reasonable decision to not worry about it without some specific need for it
confirmed that Prefect is using context variables to store the flow/task context information https://github.com/PrefectHQ/prefect/blob/main/src/prefect/context.py#L129
well, at this point this is more like a github issue than a slack thread, so i'm going to collect notes and if it seems like something worth considering i'll open an issue
ok one more post to close the loop.... I think this may be a fast and good enhancement after all
it looks like Kedro is using concurrent futures to actually run things https://github.com/kedro-org/kedro/blob/main/kedro/runner/runner.py#L243
which means adding a context = contextvars.copy_context() and passing that into submit may do the trick
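to illustrate why copying the context matters: contextvars set in the submitting thread aren't visible inside a ThreadPoolExecutor worker unless the task runs inside a copied context (pure-stdlib sketch, standing in for what Prefect/Kedro would do):

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# stands in for the context variable Prefect uses to track the active run
run_id: contextvars.ContextVar = contextvars.ContextVar("run_id", default=None)


def current_run_id():
    return run_id.get()


run_id.set("flow-run-123")

with ThreadPoolExecutor(max_workers=1) as pool:
    # a plain submit: the worker thread starts with a fresh, empty context
    print(pool.submit(current_run_id).result())  # None

    # copy the caller's context and run the task inside it
    ctx = contextvars.copy_context()
    print(pool.submit(ctx.run, current_run_id).result())  # flow-run-123
```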
There are some changes around this, so be careful about which version of Kedro you are using; this was added only in the most recent release (or the one before)