Hey all, I'm running into a curious situation: when running a Kedro pipeline on Databricks and saving the results to MLflow (through the kedro_mlflow plugin), occasionally some parallel code will trigger a new run in the experiment. The clearest example is hyperparameter optimization with Optuna: when using n_jobs=-1 for parallel execution, maybe ~4 out of 100 trials will randomly start a new MLflow run inside the experiment, while the other trials run normally without creating new runs.
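For context, this is roughly the shape of the node that triggers it (the model and search space below are just placeholders, only the Optuna / n_jobs setup matches my actual pipeline):

```python
import optuna
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def tune_model(X, y):
    def objective(trial):
        # Hypothetical search space; stands in for the real model's parameters.
        alpha = trial.suggest_float("alpha", 1e-4, 10.0, log=True)
        return cross_val_score(Ridge(alpha=alpha), X, y, cv=3).mean()

    study = optuna.create_study(direction="maximize")
    # n_jobs=-1 runs trials in parallel threads; a handful of the 100 trials
    # end up starting their own MLflow runs instead of logging into the run
    # managed by kedro_mlflow.
    study.optimize(objective, n_trials=100, n_jobs=-1)
    return study.best_params
```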
This is driving me nuts. Any guesses on possible causes?
Found it! Databricks enables autologging, and all the parallel work must be causing a desync at some point. Possibly an MLflow bug? Either way, it just needs to be disabled with a hook.
Might be worth putting something like this hook in the Databricks starter:
```python
import mlflow
from kedro.framework.hooks import hook_impl


class DisableMLFlowAutoLogger:
    @hook_impl(tryfirst=True)
    def after_context_created(self, context) -> None:
        # Turn off Databricks' default MLflow autologging so parallel code
        # (e.g. Optuna trials) doesn't spawn extra runs in the experiment.
        mlflow.autolog(disable=True)
```
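And to pick it up, register it in the project's settings.py, something like this (the module path is just a placeholder for wherever the hook class lives):

```python
# settings.py
from my_project.hooks import DisableMLFlowAutoLogger  # placeholder import path

HOOKS = (DisableMLFlowAutoLogger(),)
```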
Is this autologging enabled by default? From the docs it seems to be something you need to enable:
https://mlflow.org/docs/latest/tracking/autolog.html
It's enabled by default on Databricks (see https://docs.databricks.com/en/mlflow/databricks-autologging.html ).
Since most people run Kedro on Databricks through notebooks, this conflict is likely to show up for them.
Interesting. Any idea why this only triggers randomly on a subset of the parallel trials and not all of them?
No idea, but my guess is a desync or race condition in how MLflow passes the run parameters across the cluster. I doubt the problem comes from Kedro.
Can you open an issue in kedro-mlflow with your proposed solution (a link to this conversation is enough)? I'm inclined to add it to the plugin by default.