Philipp Dahlke

Appending Rows to a CSV File with Datacatalog

Hey guys, I'm having trouble appending to a CSV with the data catalog. My node returns a DataFrame with one row and multiple metric names as columns. It writes results.csv to the folder as expected, but it doesn't append the rows. In addition, a blank row is created after the first row (might this indicate the flaw?). When I debug step by step, both DataFrames get written to the CSV, but they overwrite each other.
Metric | Seed
--------|-------
1.0 | 42

results.update(
        {
            "seed": seed,
        }
    )
return pd.DataFrame.from_dict([results])

My catalog has the save_args mode set to "a":

"{engine}.{variant}.results":
  type: pandas.CSVDataset  # Underlying dataset type (CSV).
  filepath: data/08_reporting/{engine}/results.csv  # Path to the CSV file.
  save_args:
    mode: "a"  # Append mode for saving the CSV file.
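
For comparison, here is a minimal sketch of how appending behaves in plain pandas, outside the catalog. The helper name `append_row`, the temporary file location, and the metric values are made up for illustration, and the sketch does not show what `CSVDataset` does internally with `save_args`. If this version appends as expected, the overwrite is probably coming from how the dataset opens the file rather than from `to_csv` itself.

```python
import os
import tempfile

import pandas as pd


def append_row(path: str, row: dict) -> None:
    """Append one row to a CSV, writing the header only when the file is new."""
    write_header = not os.path.exists(path)
    pd.DataFrame([row]).to_csv(path, mode="a", header=write_header, index=False)


# Hypothetical file location and metric values, for illustration only.
csv_path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_row(csv_path, {"metric": 1.0, "seed": 42})
append_row(csv_path, {"metric": 0.5, "seed": 43})

# Read the file back to check that both rows are present under one header.
combined = pd.read_csv(csv_path)
print(combined)
```

Without the `header=write_header` guard, every call in append mode would write the column names again as an extra line.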

7 comments

Philipp Dahlke

Solved

Trouble Running Kedro From Docker Build

Hi guys,

I am having trouble running my Kedro project from a Docker build. I'm using MLflow and the kedro_mlflow.io.artifacts.MlflowArtifactDataset.

I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts try to access my local Windows path instead of the container's path. Do you guys know what additional settings I have to make? All my settings are pretty much vanilla. The mlflow_tracking_uri is set to null.

"{dataset}.team_lexicon":
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset  
  dataset:
    type: pandas.ParquetDataset  
    filepath: data/03_primary/{dataset}/team_lexicon.pq 
    metadata:
      kedro-viz:
        layer: primary  
        preview_args:
            nrows: 5

Traceback (most recent call last):
  
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
[Errno 13] Permission denied: '/C:'
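
One possible reading of `Permission denied: '/C:'` is that a Windows file URI ended up inside the container, for example through an `mlruns` folder or a tracking setting created on the host. That is an assumption and not something the traceback confirms. The sketch below uses a hypothetical artifact URI to show how such a value turns into a top-level `/C:` directory when it is interpreted on Linux.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical artifact URI, as it might be recorded by a run started on Windows.
artifact_uri = "file:///C:/Users/me/project/mlruns/0/abc123/artifacts"

# On Linux the URI path is taken literally, so "C:" becomes a directory name under "/".
local_path = PurePosixPath(urlparse(artifact_uri).path)
first_dir = PurePosixPath(*local_path.parts[:2])

print(local_path)
print(first_dir)
```

Creating a directory directly under `/` normally needs root, which would match the `Errno 13` for a non-root container user.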

3 comments

Philipp Dahlke

Kedro Mlflow NetCDF Dataset Path Issue

Hey Kedro community,
I'm currently working on a project trying to use kedro_mlflow to store kedro_datasets_experimental.netcdf datasets as artifacts. Unfortunately, I can't make it work.

The problem seems to be path related:

kedro.io.core.DatasetError: 
Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/…/data/07_model_output/D2-24-25/idata.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': w}).
'str' object has no attribute 'as_posix'

I tried to investigate it to the best of my abilities, and it seems to have to do with the initialization of NetCDFDataset. Most datasets inherit from AbstractVersionedDataset and call its __init__ with their _filepath as a str.
NetCDFDataset doesn't do this, so the PurePosixPath is never created. I don't know whether this is the actual problem, but it is the point where other datasets have their path set. In the meantime I thought it might be because MLflow isn't capable of tracking datasets which don't inherit from AbstractVersionedDataset, but the kedro-mlflow documentation says MlflowArtifactDataset is a wrapper for all AbstractDatasets.
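
The error message itself can be reproduced in isolation. This is only a sketch of the str versus PurePosixPath difference, using a hypothetical relative path modeled on the catalog entry; it does not touch NetCDFDataset or kedro_mlflow.

```python
from pathlib import PurePosixPath

# Hypothetical relative path, mirroring the shape of the catalog entry.
raw_filepath = "data/07_model_output/example/idata.nc"

# A plain string has no path methods, which matches the error message.
print(hasattr(raw_filepath, "as_posix"))

# Wrapping the string in PurePosixPath provides as_posix().
wrapped = PurePosixPath(raw_filepath)
print(wrapped.as_posix())
```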

I tried to set self._filepath = PurePosixPath(filepath) myself in site-packages, but I get a permission error on saving, and that's where my journey has to end. It would have been too good if this one-liner had fixed it ^^
Thank you guys for your help

Here is some reduced code for what I'm trying to achieve.

catalog.yml

"{dataset}.idata":
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: kedro_datasets_experimental.netcdf.NetCDFDataset
    filepath: data/07_model_output/{dataset}/idata.nc
    save_args:
      mode: a
    load_args:
      decode_times: False

node.py

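import arviz as az


def predict(model, x_data):
    # Run inference, then convert the result to a dataset that can be saved as NetCDF.
    idata = model.predict(x_data)
    return az.convert_to_dataset(idata)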

pipeline.py

pipeline_inference = pipeline(
    [
        node(
            func=predict,
            inputs={
                "model": f"{dataset}.model",
                "x_data": f"{dataset}.x_data",
            },
            outputs=f"{dataset}.idata",
            name=f"{dataset}.predict_node",
            tags=["training"],
        ),
    ]
)

4 comments
