Hey all, I'm running into a curious situation: when running a Kedro pipeline on Databricks and saving the results to MLflow (through the kedro_mlflow plugin), occasionally some parallel code will trigger a new run in the experiment. The clearest example is hyperparameter optimization with Optuna: when using n_jobs=-1 for parallel execution, maybe ~4 out of 100 trials will randomly start a new MLflow run inside the experiment, while the other trials run normally without creating new runs.
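For context, this is roughly the shape of the node that triggers it (the model and search space below are just placeholders, only the Optuna / n_jobs setup matches my actual pipeline):

```python
import optuna
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def tune_model(X, y):
    def objective(trial):
        # Hypothetical search space; stands in for the real model's parameters.
        alpha = trial.suggest_float("alpha", 1e-4, 10.0, log=True)
        return cross_val_score(Ridge(alpha=alpha), X, y, cv=3).mean()

    study = optuna.create_study(direction="maximize")
    # n_jobs=-1 runs trials in parallel threads; a handful of the 100 trials
    # end up starting their own MLflow runs instead of logging into the run
    # managed by kedro_mlflow.
    study.optimize(objective, n_trials=100, n_jobs=-1)
    return study.best_params
```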
This is driving me nuts. Any guesses on possible causes?
Found it! Databricks enables autologging, and all the parallel work must be causing a desync at some point. Possibly an MLflow bug? Either way, it just needs to be disabled with a hook.
Might be worth putting something like this hook in the Databricks starter:
```python
import mlflow
from kedro.framework.hooks import hook_impl


class DisableMLFlowAutoLogger:
    @hook_impl(tryfirst=True)
    def after_context_created(self, context) -> None:
        # Turn off Databricks' default MLflow autologging so parallel code
        # (e.g. Optuna trials) doesn't spawn extra runs in the experiment.
        mlflow.autolog(disable=True)
```
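And to pick it up, register it in the project's settings.py, something like this (the module path is just a placeholder for wherever the hook class lives):

```python
# settings.py
from my_project.hooks import DisableMLFlowAutoLogger  # placeholder import path

HOOKS = (DisableMLFlowAutoLogger(),)
```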
Is this autologging enabled by default? From the docs it seems to be something you need to enable:
https://mlflow.org/docs/latest/tracking/autolog.html
It's enabled by default on Databricks (see https://docs.databricks.com/en/mlflow/databricks-autologging.html ).
Since most people run Kedro on Databricks through notebooks, this conflict is likely to show up for them.
Interesting. Any idea why this only triggers randomly on a subset of the parallel trials and not all of them?
No idea, but my guess is a desync or race condition in how MLflow passes the run parameters across the cluster. I doubt the problem comes from Kedro.
Can you open an issue in kedro-mlflow with your proposed solution (a link to this conversation is enough)? I'm inclined to add it to the plugin by default.