Hello everyone!
I'm having some trouble using geopandas.GenericDataset. Here is my dataset:
raw_line:
  type: geopandas.GenericDataset
  filepath: "data/01_raw/lines/lines.shp"
  file_format: file
DatasetError: Failed while loading data from dataset GenericDataset(file_format=file, filepath=C:/MyCodes/my_project/data/01_raw/lines/lines.shp, load_args={}, protocol=file, save_args={}). Failed to open dataset (flags=68): /vsimem/6485f3632b634505a3cf8c07708393b2
kedro==0.19.9
kedro-datasets==5.1.0
fiona==1.10.1
fsspec==2024.10.0
geopandas==1.0.1
Using a .zip with all the files works, but I'd rather not do that because I'm reading files that are written by other software.
Hi Julio,
If I understood correctly, the issue may be due to the set of files associated with the .shp file. When using fsspec, these files need to be packaged together in a .zip. If that’s inconvenient and you don’t need fsspec, you could create a simpler custom version of GenericDataset without fsspec. What do you think?
For the main version of the dataset, I believe we should keep fsspec, even with these limitations.
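To make the "set of files" point concrete: a shapefile is not a single file. Next to the .shp there is normally a .shx index and a .dbf attribute table, and often a .prj and a .cpg. The sketch below only lists which of those companions exist beside a given .shp, so you can see what would be left behind if only the .shp bytes were handed to the reader. The helper name shapefile_members is made up for illustration, and whether this is really the cause of the error above has not been confirmed in this thread.

```python
from pathlib import Path

# Companion extensions commonly found next to a .shp file.
# .shx and .dbf are required by the shapefile format; .prj and .cpg are optional.
SIDECAR_SUFFIXES = (".shx", ".dbf", ".prj", ".cpg")


def shapefile_members(shp_path):
    """Return the .shp file and the companion files that exist beside it.

    Hypothetical helper for illustration; it is not part of kedro-datasets.
    """
    shp = Path(shp_path)
    candidates = [shp] + [shp.with_suffix(suffix) for suffix in SIDECAR_SUFFIXES]
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    # Path taken from the catalog entry in this thread.
    for member in shapefile_members("data/01_raw/lines/lines.shp"):
        print(member.name)
```

If more than one name is printed, the dataset on disk spans several files, which is consistent with the .zip workaround (all members travel together) succeeding.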
although "data/01_raw/lines/lines.shp" is a local path, right? 🤔 does it work if you do geopandas.read_file("data/01_raw/lines/lines.shp")?
Hi, juanlu.
geopandas.read_file() works on a local path!
But my production files are saved in remote data storage.
Hi, Dmitry.
I still don't know if the issue is due to the set of files associated with the .shp file.
My production files are saved in remote data storage, but I can actually customise the dataset.
I mean the same as Juan Luis suggested - you can customise the dataset to remove fsspec if you’re only working with local files. Just modify the dataset to use geopandas.read_file() directly. However, you’ll need fsspec if your files are stored remotely.
One strange point is:
If I save a .shp dataset using geopandas.GenericDataset, the same dataset is unable to read the file again.