Running Kedro Pipelines in a Prefect Deployment

Hey all! I'm working on tooling around running Kedro pipelines in our (pre-existing) Prefect deployment. I've been following the lead of the example from the docs and things were going pretty smoothly until I came around to logging. Logging in Prefect is a little finicky, but what I'd like to do is route the Kedro logs through to the Prefect loggers and handlers. Happy to go into more detail about what I've tried, but figured I'd first ask if anyone has experience here? Is there some other way to expose Kedro logs in the Prefect UI (which is ultimately my goal)?


Prior to using Kedro (currently working on the first "real" pilot), our pattern would be to grab the logger via Prefect's get_logger methods and then inject that down into our pipeline code.
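Roughly, the injection pattern looks like this (a minimal sketch with a stdlib logger standing in for Prefect's get_run_logger(); the function names and pipeline shape are illustrative, not our actual code):

```python
import logging

def run_pipeline(records: list[int], logger: logging.Logger) -> int:
    """Hypothetical pipeline step that receives its logger explicitly."""
    logger.info("processing %d records", len(records))
    return sum(records)

# Inside a real Prefect flow this would be: logger = get_run_logger()
logger = logging.getLogger("demo.flow")
total = run_pipeline([1, 2, 3], logger)
```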

But, this doesn't really seem like what we'd want with Kedro (at least I don't think).

Am I thinking about this right?

more context: I've implemented two custom datasets and it's the logs from that code that I want to expose via the prefect logger

the closest I've gotten was defining a custom handler for the kedro logger, and having that code instantiate the prefect run logger and pass the logs through, but this wasn't capturing all of the logs from the datasets.
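For the record, the custom-handler approach looked roughly like this (a minimal stdlib sketch; the real version resolved Prefect's run logger inside emit(), which is exactly where it fell over):

```python
import logging

class ForwardingHandler(logging.Handler):
    """Re-route records from one logger hierarchy to another.

    Sketch of the approach described above. A real version would look up
    Prefect's run logger in emit() instead of holding a fixed target.
    """
    def __init__(self, target: logging.Logger):
        super().__init__()
        self.target = target

    def emit(self, record: logging.LogRecord) -> None:
        # Dispatch the record through the target logger's handlers.
        self.target.handle(record)

# Wire it up: anything logged under "kedro" also reaches "prefect.demo".
target = logging.getLogger("prefect.demo")
logging.getLogger("kedro").addHandler(ForwardingHandler(target))
```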

i sort of suspect the issue was conflicting logging configuration between kedro and prefect, but haven't gone down that rabbit hole yet

or maybe something wrong with how I configured this, but it felt like too deep a rabbit hole to go down before asking y'all for help 😅

I also didn't pursue that further because I switched to trying to get Prefect to pick up the kedro logs using https://docs.prefect.io/v3/develop/logging#include-logs-from-other-libraries but I haven't been able to get that to work either (no matter how I try to configure this I don't see the kedro logs in the UI)

I found the same thing last time I tried running Kedro in Prefect. I tweaked the logging a bit, does this help? https://github.com/astrojuanlu/workshop-from-zero-to-mlops/blob/main/conf/logging_prefect.yml
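For anyone skimming later, the gist of that file is to attach Prefect's API log handler to the kedro logger, something like (class path as used in the linked config; double-check it against your Prefect version):

```yaml
version: 1
disable_existing_loggers: false

handlers:
  prefect:
    class: prefect.logging.handlers.APILogHandler

loggers:
  kedro:
    level: INFO
    handlers: [prefect]
```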

I see, this is much simpler than what I did: just tell the kedro logger to use the prefect handler. It isn't resolving the issue, but it does give a clue:

[02/20/25 16:34:16] WARNING  /usr/local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:270: UserWarning: Logger 'kedro.framework.project' attempted to send logs to the API without a flow run id. The API log handler can only send logs within flow run contexts unless the flow run id is manually provided. (warnings.py:110)
my handler was eating these warnings :facepalming: .... I think this means that the logs that I'm looking for are being generated outside a prefect flow/task

well, it definitely means this

i'm just surprised by it

yeah I don’t know why logging in Prefect is so special

Probably due to the distributed nature? not so sure tho

yeah I think that's right, but it does feel a bit over engineered

in order to get the logs to the UI (which is not where the actual tasks are run) the logs are sent through an API

and in order for that to work properly the prefect loggers need to be used so that the appropriate metadata (task run id, etc...) can be attached

but after digging deeper, something is happening down inside Kedro that is OUTSIDE the Prefect context

hard to say exactly what without knowing/learning more about the implementation details of Kedro

it looks like the actual execution of the node implementations is not inheriting the prefect context

i assume because of async things or threading things or something

but as far as I can tell there are places in kedro (the load and save methods of a Dataset being an example) where there's no practical way of getting logs to the prefect UI

because the execution of that code is happening outside the prefect flow/task context and you can't call get_run_logger

It may be possible to manually track the necessary metadata (task run ids, etc...) and then manually inject that before calling the prefect api log handler

but i am very hesitant to go down that road because it feels like it would be very brittle (ie, if prefect changes internal implementations of the logging api, this code will break)

in our particular case we have things set up so that all logs from any container in our cluster get shipped to logzio, so I think we'll have to live with a world where we have to go there to see all the logs (as opposed to the prefect UI), which is not ideal but fine

i think the tldr for this thread is that:
1. it appears kedro is not preserving context when doing threading or something similar
2. maybe this could be a nice contribution? attempt to, in general, preserve any active context through to all parts of the pipeline execution (assuming this is even possible/feasible)

what does "context" mean in this... ehm... context? 😬

I think I mean "context" in the python context management sense

but a very fair question 😆

(I say I think because I would still consider everything I've said in this thread conjecture)

and I also don't REALLY know the details of what kedro is doing

lol. or what prefect is really doing i guess

ie, it may actually be some prefect specific notion of context

i'm going to dig a little deeper and see if this really does come down to how kedro is handling Python Context variables.

please keep us posted with whatever you find 🙏 I have no idea if Kedro is doing anything funky here

i sort of suspect it's more like kedro isn't looking for and passing through context variables rather than kedro doing something funky

if that's the case, it's a totally reasonable decision to not worry about it without some specific need for it

which this may be

this = handling things like prefect logging

confirmed that Prefect is using context variables to store the flow/task context information https://github.com/PrefectHQ/prefect/blob/main/src/prefect/context.py#L129

well, at this point this is more like a github issue than a slack thread, so i'm going to collect notes and if it seems like something worth considering i'll open an issue

ok one more post to close the loop.... I think this may be a fast and good enhancement after all

it looks like Kedro is using concurrent.futures to actually run things https://github.com/kedro-org/kedro/blob/main/kedro/runner/runner.py#L243

which means adding a context = contextvars.copy_context() and passing that into submit may do the trick

going to try this and if it works i'll open an issue
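The idea above can be checked with a self-contained stdlib snippet (a ContextVar standing in for Prefect's flow-run context): values set in the main thread are invisible inside a ThreadPoolExecutor worker unless the copied context is passed through submit.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the per-run context Prefect stores in a ContextVar.
flow_run_id = contextvars.ContextVar("flow_run_id", default=None)

def read_ctx():
    return flow_run_id.get()

flow_run_id.set("abc-123")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Worker threads start with a fresh top-level context: the value is lost.
    lost = pool.submit(read_ctx).result()
    # Copy the caller's context and run the task inside that copy instead.
    ctx = contextvars.copy_context()
    kept = pool.submit(ctx.run, read_ctx).result()

# lost is None; kept is "abc-123"
```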

There have been some changes around this, so be careful about which version of Kedro you're using. This was added only in the most recent release (or the one before).
