Hello, guys, I noticed that there is no support for the log_table method in kedro-mlflow. So I wonder what the right way would be to log additional data from a node, something that is not yet supported by the plugin?
Right now I just do something like this at the end of the node function:

mlflow.log_table(data_for_table, output_filename)

But I am concerned, as I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run id from inside the node with mlflow.active_run() (it returns None all the time). I want to use the Evaluation tab in the UI to manually compare some outputs of different runs.
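For reference, a minimal sketch of that pattern inside a node (the node, its column names, and the artifact file name are made up for illustration):

import mlflow
import pandas as pd

def evaluate_answers(predictions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical kedro node that logs part of its output as an MLflow table."""
    data_for_table = predictions[["question", "answer"]]
    # This relies on MLflow resolving the target run implicitly, since
    # mlflow.active_run() returns None inside the node, which is exactly
    # the concern described above.
    mlflow.log_table(data=data_for_table, artifact_file="eval_results.json")
    return data_for_table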
You can just return your table at the end of the node, and use a MlflowArtifactDataset combined with a CSVDataset in your catalog
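A rough sketch of that suggestion using the Python API rather than the YAML catalog (the dataset name and filepath are made up, and the dataset=dict(...) form is my reading of the kedro-mlflow docs, so treat it as an assumption):

from kedro_datasets.pandas import CSVDataset
from kedro_mlflow.io.artifacts import MlflowArtifactDataset

# Wraps a plain CSVDataset so that, on save, the written file is also
# uploaded as an artifact of the current MLflow run.
comparison_table = MlflowArtifactDataset(
    dataset=dict(type=CSVDataset, filepath="data/08_reporting/comparison_table.csv"),
)

The node would then simply return the DataFrame and declare this dataset as its output, instead of calling mlflow directly.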
It won't work. I mean it will log the artifact for sure, but it will not be accessible in the Evaluation tab.
As far as I understand, it should be logged via the mlflow.log_table method to appear in the datasets available for the Evaluation tab.
But you are right, there is no support for log_table right now. Please open an issue in the repo and I'll try to add it: https://github.com/Galileo-Galilei/kedro-mlflow
Even if you use a JSON Dataset instead of a CSV one?
I tried pandas.JSONDataset (because I have the data in a DataFrame) with MlflowArtifactDataset, and it produced some stringified JSON as a result, so it was not available in the Evaluation tab either. Could you recommend which of the JSON datasets to try?

Actually, maybe the JSON wasn't stringified. It might have had a different format, because MLflow uses something like:
{ "columns": list[column_names], "data": list[list[values]] }whereas pandas converts a DataFrame into this format:
{ "[column_name]": list[values] }I can't check right now, but I'm almost sure this was the problem. So, one way to address it would be to manually convert a DataFrame into MLflowâs JSON format and then save it as you advised.
I think it should. If I remember correctly, df.to_json() has an orient= argument to specify how the conversion should be done.
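For illustration, the orientations differ roughly like this ("split" looks closest to the layout quoted above, though it also adds an "index" key; the exact output may vary with the pandas version):

import pandas as pd

df = pd.DataFrame({"question": ["q1", "q2"], "answer": ["a1", "a2"]})

# Default orientation: {column -> {index -> value}}
print(df.to_json())
# {"question":{"0":"q1","1":"q2"},"answer":{"0":"a1","1":"a2"}}

# "split" orientation: columns, index and row data listed separately
print(df.to_json(orient="split"))
# {"columns":["question","answer"],"index":[0,1],"data":[["q1","a1"],["q2","a2"]]}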
@Yolan Honoré-Rougé FYI, I've created a feature request https://github.com/Galileo-Galilei/kedro-mlflow/issues/634
How about using a hook with the mlflow library? That's what I do at the moment.
You will have access to the current run, which the MlflowHook from kedro-mlflow instantiates, via mlflow.active_run, and you are able to retrieve the node outputs with the after_node_run method kedro provides.
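A minimal sketch of that approach (the hook class, the node name, and the artifact file name are hypothetical; it assumes the kedro-mlflow MlflowHook has already started a run by the time after_node_run fires):

import mlflow
import pandas as pd
from kedro.framework.hooks import hook_impl


class LogTableHook:
    # Hypothetical mapping of node names to the artifact file to log
    TABLES = {"evaluate_answers": "eval_results.json"}

    @hook_impl
    def after_node_run(self, node, outputs):
        artifact_file = self.TABLES.get(node.name)
        if artifact_file is None:
            return
        for value in outputs.values():
            if isinstance(value, pd.DataFrame):
                # Logged against the run that kedro-mlflow already started
                mlflow.log_table(data=value, artifact_file=artifact_file)

The hook would then be registered in the HOOKS tuple in settings.py, like any other kedro hook.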
@Philipp Dahlke Yeah, thank you, I think it makes sense too.
I just thought that it is probably overkill for now, since the simple call to mlflow.log_table does the trick.
I just don't like this as a long-term solution. So if there is still no update in the plugin by the time I run into problems with it, I will probably use a hook or some other workaround.