Hey guys, I'm having trouble appending to a CSV with the DataCatalog. My node returns a DataFrame with one row and multiple metric names as columns. It writes results.csv to the folder as expected, but it doesn't append the rows. In addition, a blank row is created after the first row (might that indicate the flaw?). When I debug step by step, both DataFrames get written to the CSV, but they overwrite each other.
Metric | Seed
--------|-------
1.0 | 42
results.update(
    {
        "seed": seed,
    }
)
return pd.DataFrame.from_dict([results])
"{engine}.{variant}.results":
  type: pandas.CSVDataset  # Underlying dataset type (CSV).
  filepath: data/08_reporting/{engine}/results.csv  # Path to the CSV file.
  save_args:
    mode: "a"  # Append mode for saving the CSV file.
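For comparison, here is a minimal sketch of what appending one-row results looks like in plain pandas, outside of Kedro. It only illustrates the `mode="a"` behaviour of `DataFrame.to_csv`; it makes no claim about how `pandas.CSVDataset` handles `save_args` internally. The helper name `append_row`, the metric values, and the temporary path are assumptions made up for the illustration.

```python
from pathlib import Path
import tempfile

import pandas as pd


def append_row(results: dict, seed: int, path: Path) -> None:
    """Append a single-row DataFrame to a CSV (hypothetical helper)."""
    row = pd.DataFrame([{**results, "seed": seed}])
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write the header only when the file does not exist yet; otherwise
    # every appended row would be preceded by a repeated header line.
    row.to_csv(path, mode="a", header=not path.exists(), index=False)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "results.csv"
    append_row({"metric": 1.0}, seed=42, path=csv_path)
    append_row({"metric": 0.9}, seed=43, path=csv_path)
    combined = pd.read_csv(csv_path)  # both rows under a single header
```

If this works standalone while the catalog entry still overwrites the file, that would suggest the difference lies in how the dataset opens the file rather than in the DataFrame itself.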
Hi guys,
I am having trouble running my Kedro project from a Docker build. I'm using MLflow and the kedro_mlflow.io.artifacts.MlflowArtifactDataset.
I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts want to access my local Windows path instead of the container's path. Do you guys know what additional settings I have to make? All my settings are pretty much vanilla. The mlflow_tracking_uri is set to null.
"{dataset}.team_lexicon":
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: pandas.ParquetDataset
    filepath: data/03_primary/{dataset}/team_lexicon.pq
  metadata:
    kedro-viz:
      layer: primary
      preview_args:
        nrows: 5
Traceback (most recent call last):
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
[Errno 13] Permission denied: '/C:'
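One possible reading of the `'/C:'` in that error, which is a guess and not something confirmed in this thread: a Windows-style `file:///C:/...` location recorded on the host is being resolved inside the Linux container. The sketch below only shows how such a URI turns into a path starting with `/C:` on a POSIX system. The URI itself is a made-up example, not taken from the project.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical artifact location as it might be recorded on a Windows host.
windows_artifact_uri = "file:///C:/Users/me/project/mlruns/0/abc123/artifacts"

# On Linux the URI path is taken literally, so "C:" becomes a directory name.
local_path = PurePosixPath(urlparse(windows_artifact_uri).path)

# The first directory a process would have to create under the filesystem root.
first_dir = "/" + local_path.parts[1]

print(local_path)
print(first_dir)
```

If something like this is the cause, it would be worth checking whether any MLflow run metadata created on Windows ends up inside the image.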
Hey Kedro community,
I'm currently working on a project trying to use kedro_mlflow to store kedro_datasets_experimental.netcdf datasets as artifacts. Unfortunately, I can't make it work.
The problem seems to be path related:
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/…/data/07_model_output/D2-24-25/idata.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': w}).
'str' object has no attribute 'as_posix'

I tried to investigate it to the best of my abilities, and it seems to have to do with the initialization of NetCDFDataset. Most datasets inherit from AbstractVersionedDataset and call __init__ with their _filepath as a str. NetCDFDataset is missing that call, so the PurePosixPath is never created. I don't know whether this is the actual problem in the end, but it is the point where other datasets have their path set. In the meantime I thought it might be because MLflow isn't capable of tracking datasets which don't inherit from AbstractVersionedDataset, but the kedro-mlflow documentation says MlflowArtifactDataset is a wrapper for all AbstractDatasets.

I tried adding self._filepath = PurePosixPath(filepath) myself in site-packages, but I get a permission error on saving, and that's where my journey has to end. It would have been too good if this one-liner had done it ^^

"{dataset}.idata":
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: kedro_datasets_experimental.netcdf.NetCDFDataset
    filepath: data/07_model_output/{dataset}/idata.nc
    save_args:
      mode: a
    load_args:
      decode_times: False

node.py
import arviz as az


def predict(model, x_data):
    # Run inference and convert the result to a dataset for saving.
    idata = model.predict(x_data)
    return az.convert_to_dataset(idata)

pipeline.py
from kedro.pipeline import node, pipeline

pipeline_inference = pipeline(
    [
        node(
            func=predict,
            inputs={
                "model": f"{dataset}.model",
                "x_data": f"{dataset}.x_data",
            },
            outputs=f"{dataset}.idata",
            name=f"{dataset}.predict_node",
            tags=["training"],
        ),
    ]
)
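The `'str' object has no attribute 'as_posix'` message from the error above can be reproduced in isolation with the standard library. This is a minimal sketch and says nothing about where exactly NetCDFDataset or kedro-mlflow makes the call. It only shows that a plain string fails where a PurePosixPath succeeds, and the file path is reused from the error purely as an example value.

```python
from pathlib import PurePosixPath

filepath = "data/07_model_output/D2-24-25/idata.nc"  # plain str, as in the error

# Calling a path method on a str raises the AttributeError seen above.
try:
    filepath.as_posix()
    str_error = None
except AttributeError as exc:
    str_error = str(exc)

# Wrapping the same string in PurePosixPath makes the call valid.
wrapped = PurePosixPath(filepath)
posix = wrapped.as_posix()

print(str_error)
print(posix)
```

This matches the observation that the path is expected to be a path object by the time the wrapper uses it, whichever class is responsible for the conversion.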