Join the Kedro community

Updated 2 months ago

Troubleshooting geopandas.GenericDataset with Shapefiles

Hello everyone!
I'm having some trouble using geopandas.GenericDataset. Here is my catalog entry:

raw_line:
  type: geopandas.GenericDataset
  filepath: "data/01_raw/lines/lines.shp"
  file_format: file

I'm facing the error:
DatasetError: Failed while loading data from dataset GenericDataset(file_format=file,
filepath=C:/MyCodes/my_project/data/01_raw/lines/lines.shp, load_args={}, protocol=file, save_args={}).
Failed to open dataset (flags=68): /vsimem/6485f3632b634505a3cf8c07708393b2

It looks like there is an old issue related to fsspec + geopandas:
https://github.com/kedro-org/kedro/issues/695#issuecomment-973953139

My libs:
kedro==0.19.9
kedro-datasets==5.1.0
fiona==1.10.1
fsspec==2024.10.0
geopandas==1.0.1

Is anyone able to use geopandas.GenericDataset with .shp files?

7 comments

Using a .zip that contains all the files works, but I'd rather not do that because I'm reading files that are written by other software.

Hi Julio,
If I understood correctly, the issue may be due to the set of files associated with the .shp file. When using fsspec, these files need to be packaged together in a .zip. If that’s inconvenient and you don’t need fsspec, you could create a simpler custom version of GenericDataset without fsspec. What do you think?
For the main version of the dataset, I believe we should keep fsspec, even with these limitations.
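
To make "the set of files associated with the .shp file" concrete: a shapefile is really several sibling files with the same base name (typically .shp, .shx and .dbf, often .prj as well). The sketch below only lists those siblings for a local path; the function name shapefile_components is made up for illustration and is not part of Kedro or geopandas. The /vsimem path in the error suggests that only the bytes of the .shp itself were handed to GDAL, which would explain why the siblings could not be found, though that is an assumption rather than something confirmed in this thread.

```python
from pathlib import Path


def shapefile_components(shp_path):
    """Return the files next to a .shp that share its base name.

    Hypothetical helper for illustration only. It matches on Path.stem,
    so a file such as lines.shp.xml (stem "lines.shp") is not included.
    """
    shp = Path(shp_path)
    return sorted(
        p for p in shp.parent.iterdir()
        if p.is_file() and p.stem == shp.stem
    )


if __name__ == "__main__":
    # Path taken from the catalog entry in the question; it must exist locally.
    for component in shapefile_components("data/01_raw/lines/lines.shp"):
        print(component.name)
```

If the listing shows more than one file, all of them have to be reachable by the reader, which is what zipping them together achieves.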

Although "data/01_raw/lines/lines.shp" is a local path, right? 🤔 Does it work if you call geopandas.read_file("data/01_raw/lines/lines.shp")?

Hi, juanlu.
geopandas.read_file() works with the local path!
But my production files are saved in remote data storage.

Hi, Dmitry.
I still don't know if the issue is due to the set of files associated with the .shp file.
My production files are saved in remote data storage, but I can customize the dataset.

I mean the same as Juan Luis suggested - you can customise the dataset to remove fsspec if you’re only working with local files. Just modify the dataset to use geopandas.read_file() directly. However, you’ll need fsspec if your files are stored remotely.

One strange thing:
If I save a .shp dataset using geopandas.GenericDataset, the same dataset then fails to read the file back.
