Philipp Dahlke
Joined January 10, 2025

Hey Kedro community,
I'm currently working on a project where I'm trying to use kedro_mlflow to store kedro_datasets_experimental.netcdf datasets as artifacts. Unfortunately, I can't make it work.

The problem seems to be path related:

kedro.io.core.DatasetError: 
Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/…/data/07_model_output/D2-24-25/idata.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': w}).
'str' object has no attribute 'as_posix'
I investigated this to the best of my abilities, and it seems to be related to the initialization of NetCDFDataset. Most datasets inherit from AbstractVersionedDataset and call its __init__ with their _filepath as a str.
NetCDFDataset does not make this call, so the PurePosixPath is never created. I don't know whether this is the actual cause, but it is the point where other datasets have their path set. In the meantime, I thought it might be because mlflow can't track datasets that don't inherit from AbstractVersionedDataset, but the kedro-mlflow documentation says MlflowArtifactDataset is a wrapper for all AbstractDatasets.
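To illustrate the suspicion, the exception text can be reproduced without Kedro at all. This is only a minimal sketch of the str versus PurePosixPath difference, not the actual code path inside kedro_mlflow or NetCDFDataset; the path below is a made-up placeholder.

```python
from pathlib import PurePosixPath

# Hypothetical placeholder path, not the real project path.
raw_path = "data/07_model_output/example/idata.nc"

# A plain str has no as_posix() method, so this raises AttributeError.
try:
    raw_path.as_posix()
except AttributeError as err:
    message = str(err)

# Wrapping the same string in PurePosixPath makes as_posix() available.
wrapped = PurePosixPath(raw_path)
posix = wrapped.as_posix()

print(message)
print(posix)
```

If the wrapper calls as_posix() on the inner dataset's _filepath, a dataset that keeps _filepath as a str would fail in exactly this way.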

I tried setting self._filepath = PurePosixPath(filepath) myself in site-packages, but then I get a permission error on saving, and that's where my journey has to end. It would have been too good if this one-liner had fixed it^^
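The same one-line change could also live in a project-level subclass instead of an edited site-packages file. The sketch below uses a hypothetical stand-in class (StrPathDataset), not the real NetCDFDataset, and the class names and path are assumptions for illustration. Whether the real dataset tolerates a PurePosixPath in _filepath is not verified here, and this does not address the permission error.

```python
from pathlib import PurePosixPath


class StrPathDataset:
    """Hypothetical stand-in for a dataset that keeps its filepath as a plain str."""

    def __init__(self, filepath: str):
        self._filepath = filepath


class PosixPathDataset(StrPathDataset):
    """Hypothetical subclass that applies the one-line change outside site-packages."""

    def __init__(self, filepath: str):
        super().__init__(filepath)
        # Same change as the manual edit: wrap the str so as_posix() exists.
        self._filepath = PurePosixPath(self._filepath)


# Placeholder path for illustration only.
patched = PosixPathDataset("data/07_model_output/example/idata.nc")
patched_path = patched._filepath.as_posix()
print(patched_path)
```

A class like this could then be referenced from the catalog by its dotted path in the type field, in place of the library class.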
Thank you guys for your help

Here is some reduced code for what I'm trying to achieve.

catalog.yml
"{dataset}.idata":
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: kedro_datasets_experimental.netcdf.NetCDFDataset
    filepath: data/07_model_output/{dataset}/idata.nc
    save_args:
      mode: a
    load_args:
      decode_times: False
node.py
import arviz as az


def predict(model, x_data):
    # Run inference, then convert the result to a dataset for saving.
    idata = model.predict(x_data)
    return az.convert_to_dataset(idata)
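pipeline.py
from kedro.pipeline import node, pipeline

# `dataset` and `predict` are assumed to be defined elsewhere (omitted in this reduced example).
pipeline_inference = pipeline(
    [
        node(
            func=predict,
            inputs={
                "model": f"{dataset}.model",
                "x_data": f"{dataset}.x_data",
            },
            outputs=f"{dataset}.idata",
            name=f"{dataset}.predict_node",
            tags=["training"],
        ),
    ]
)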
