Pip install kedro_datasets without installing dependencies

At a glance

The community member wants to install kedro_datasets solely to use the pandas.ExcelDataset class. Running pip install kedro_datasets also pulls in kedro as a dependency and fails with an OSError related to antlr4-python3-runtime.

The community members suggest declaring dependencies in one place, such as requirements.txt or pyproject.toml, rather than installing packages one by one. To install kedro_datasets with Excel support, the recommended command is pip install "kedro_datasets[pandas-exceldataset]".

The community members also clarify that kedro is a dependency of kedro-datasets, and there has been some discussion around the desire to install the Kedro Catalog without the entire Kedro framework.

George P.

Hello there! I am trying to install kedro_datasets in order to use the pandas.ExcelDataset class to automate an xlsx loading process (not interested in the rest of the pipeline atm).
However, when I run pip install kedro_datasets:
a) I see kedro being installed as a dependency, and
b) I get an OSError saying "no directory: /bin/pygrun" (a dependency of antlr4-python3-runtime, which is a package for text processing).
>> Is it possible to restrict the pip installation to just kedro_datasets, and (even stricter) to just a certain type (e.g. pandas or pandas.ExcelDataset)?
(Googling didn't help with either a or b)

6 comments

George P.

Running pip install omegaconf first and then pip install kedro_datasets resolved the installation error. kedro is still being installed, though (is that expected?)
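
One way to check whether kedro is a declared dependency, rather than something pulled in by accident, is to read the requirements from the installed package's metadata. The following is a minimal sketch using only the Python standard library. The helper names split_requirements and declared_requirements are hypothetical and are not part of pip or Kedro.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(requirement_strings):
    """Separate unconditional requirements from those gated behind an extra."""
    core, extras = [], []
    for req in requirement_strings:
        # Requirement strings follow PEP 508: anything after ';' is an
        # environment marker, and optional extras show up there.
        _, _, marker = req.partition(";")
        target = extras if "extra" in marker else core
        target.append(req.strip())
    return core, extras


def declared_requirements(dist_name):
    """Return (core, extras) for an installed distribution, or ([], []) if absent."""
    try:
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return [], []
    return split_requirements(reqs)


if __name__ == "__main__":
    core, extras = declared_requirements("kedro-datasets")
    print("Always installed:", core)
    print("Only with an extra:", extras)
```

Anything listed under "Always installed" is pulled in by a plain pip install, whichever dataset type you actually need.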

marrrcin

General thoughts:
It's best to specify all requirements in one place, such as requirements.txt or pyproject.toml, and then use a tool like pip-tools (pip-compile), Poetry, or uv to resolve dependencies and versions. That lets the resolver handle conflicts for you.

Installing packages one-by-one is a bad practice.

---

To address your specific question: pip install --no-deps <your-package> should install only <your-package>. Bearing in mind what I wrote above, I don't recommend that path.
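
Because --no-deps skips every dependency, anything that is missing only surfaces later as an import error. A quick check like the one below lists the modules that are absent from the current environment. This is a minimal sketch: missing_modules is a hypothetical helper, and the module names are assumptions about what an Excel-loading workflow would import.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed imports for loading .xlsx files through kedro-datasets.
    print(missing_modules(["kedro", "kedro_datasets", "pandas", "openpyxl"]))
```

An empty list means every named module can be found. It does not guarantee that the installed versions are compatible with each other, which is the part a resolver takes care of.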

marrrcin

To install kedro_datasets with Excel support, use:

pip install "kedro_datasets[pandas-exceldataset]"
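
Once that extra is installed, loading a workbook without the rest of a Kedro project could look roughly like this. It is a sketch under assumptions: it presumes a recent kedro-datasets release that exposes the class as kedro_datasets.pandas.ExcelDataset, and load_excel and the file path are hypothetical names chosen for illustration.

```python
def load_excel(filepath, sheet_name=0):
    """Load one worksheet from an .xlsx file into a pandas DataFrame."""
    # Imported lazily so this module can still be imported in environments
    # where kedro-datasets is not installed.
    from kedro_datasets.pandas import ExcelDataset

    dataset = ExcelDataset(
        filepath=filepath,
        load_args={"sheet_name": sheet_name},  # forwarded to pandas.read_excel
    )
    return dataset.load()


if __name__ == "__main__":
    # Hypothetical path; replace with a real workbook.
    df = load_excel("data/report.xlsx")
    print(df.head())
```

No catalog or pipeline is needed for this: the dataset object is created and loaded directly.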

Juan Luis Cano Rodríguez

And to clarify: yes, kedro is a dependency of kedro-datasets. There was some discussion about this exact topic in https://github.com/kedro-org/kedro/issues/2409

We've also been collecting evidence from users who want to install the Kedro Catalog without the rest of Kedro in https://github.com/kedro-org/kedro/issues/2741

May I ask, what motivates you to install kedro-datasets without kedro?

George P.

Belated reply, but thank you both for the tips and for the links! Interesting to read some of the historical conversations concerning choices when developing kedro.

George P.

While reading the discussion on the GitHub issue, I found myself resonating with this point: "wanted to share with colleagues the Dataset abstraction, without requiring the entire kedro installation/project". Having spent a few days on my problem though, I see how my next steps are essentially converging on a pipeline that kedro itself can help out with. :)
