
Parallel execution triggering new MLflow runs in Kedro pipeline

Hey all, I'm running into a curious situation: when running a Kedro pipeline on Databricks and saving the results to MLflow (through the kedro-mlflow plugin), occasionally some parallel code will trigger a new run in the experiment. The biggest example is running hyperparameter optimization with Optuna using n_jobs=-1 for parallel execution: out of 100 trials, maybe ~4 will randomly trigger a new MLflow run inside the experiment (the other trials run normally without creating new runs).

This is driving me nuts. Any guess on possible causes for it?
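
For context, the tuning node looks roughly like this (a heavily simplified sketch, not my actual code; the model, metric, and search space are just placeholders):

import optuna
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def tune_hyperparameters(X, y):
    """Kedro node: run an Optuna study with parallel trials."""

    def objective(trial):
        model = RandomForestRegressor(
            n_estimators=trial.suggest_int("n_estimators", 50, 500),
            max_depth=trial.suggest_int("max_depth", 2, 16),
        )
        # Each trial fits models; a handful of these fits end up as
        # brand-new MLflow runs instead of staying inside the kedro-mlflow run.
        return cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error").mean()

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100, n_jobs=-1)  # parallel trials
    return study.best_params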


Found it! Databricks enables autologging, and all the parallel stuff must be causing a desync at some point. Possibly an MLflow bug? Either way, it just needs to be disabled with a hook.
Might be worth putting something like this hook in the Databricks starter:

import mlflow
from kedro.framework.hooks import hook_impl

class DisableMLFlowAutoLogger:
    @hook_impl(tryfirst=True)
    def after_context_created(self, context) -> None:
        # Turn off Databricks' default autologging before the pipeline runs
        mlflow.autolog(disable=True)
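
Then register it in settings.py like any other hook (the package name below is just a placeholder, adjust the import path to wherever you put the class):

# src/my_project/settings.py
from my_project.hooks import DisableMLFlowAutoLogger

HOOKS = (DisableMLFlowAutoLogger(),)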

Is this autologging enabled by default? From the docs it seems to be something you need to enable:

https://mlflow.org/docs/latest/tracking/autolog.html

It's enabled by default on Databricks (see https://docs.databricks.com/en/mlflow/databricks-autologging.html).

Since most people run Kedro on Databricks through notebooks, this conflict is likely to come up.

Interesting, any idea why this only triggers randomly on a subset of the parallel trials and not all of them?

No idea. My guess is some desync or race condition in how MLflow handles passing the run parameters across a cluster; I doubt the problem comes from Kedro.

Can you open an issue in kedro-mlflow with your proposed solution (a link to this conversation is enough)? I'm inclined to add it by default in the plugin.
