
Updated 4 months ago

Pip install kedro_datasets without installing dependencies


Hello there! I am trying to install kedro_datasets in order to use the pandas.ExcelDataset class to automate an xlsx loading process (I'm not interested in the rest of the pipeline at the moment).
However, when I run pip install kedro_datasets:
a) I see kedro being installed as a dependency, and
b) I get an OSError saying "no directory: /bin/pygrun" (pygrun seems to be a script that comes with antlr4-python3-runtime, a package for text processing).
>> Is it possible to restrict the pip installation to just kedro_datasets, and (even stricter) to just a certain dataset type (e.g. pandas or pandas.ExcelDataset)?
(Googling didn't help with either a or b.)

6 comments

Running pip install omegaconf first and then pip install kedro_datasets resolved the installation error. kedro is still being installed, though (is that expected?).

General thoughts:
It's best to specify all requirements in one place, such as a requirements.txt or pyproject.toml, and then use a tool like pip-tools (pip-compile), Poetry, or uv to resolve dependencies and versions. That way the resolver can handle conflicts for you.

Installing packages one by one is bad practice.

---

To address your specific question: pip install --no-deps <package> should install only <package>. Bearing in mind what I wrote above, I don't recommend that path.

To install kedro_datasets with Excel support, you should go with:

pip install "kedro_datasets[pandas-exceldataset]"
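
With that extra installed, loading a workbook outside of any Kedro project might look roughly like the sketch below. The helper name `load_excel` and the file path are hypothetical, and the import is done lazily inside the function only so that the snippet can be defined before the extra is installed.

```python
from typing import Any, Optional


def load_excel(filepath: str, load_args: Optional[dict] = None) -> Any:
    """Load an .xlsx file through kedro_datasets' pandas.ExcelDataset."""
    # Imported lazily: needs `pip install "kedro_datasets[pandas-exceldataset]"`.
    from kedro_datasets.pandas import ExcelDataset

    # load_args are passed on to pandas.read_excel (e.g. {"sheet_name": "Sheet1"}).
    dataset = ExcelDataset(filepath=filepath, load_args=load_args)
    return dataset.load()


# Hypothetical usage; "reports/input.xlsx" is a placeholder path:
# df = load_excel("reports/input.xlsx")
```

No catalog or pipeline is needed for this; the dataset class is used on its own.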

And to clarify: yes, kedro is a dependency of kedro-datasets. There was some discussion about this exact topic in https://github.com/kedro-org/kedro/issues/2409

We've also been collecting evidence from users who want to install the Kedro Catalog without the rest of Kedro in https://github.com/kedro-org/kedro/issues/2741

May I ask what motivates you to install kedro-datasets without kedro?
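
If you want to check for yourself which requirements an installed package declares, and which of them only apply to an optional extra, a small sketch using the standard library's `importlib.metadata` could look like this. The helper names `declared_requirements` and `split_by_extra` are made up for illustration, and the split relies on the assumption that extra-only requirements carry an `extra ==` marker in the package metadata.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional, Tuple


def declared_requirements(dist_name: str) -> Optional[List[str]]:
    """Return the requirement strings a distribution declares, or None if it is not installed."""
    try:
        # requires() returns None when the distribution declares no requirements.
        return requires(dist_name) or []
    except PackageNotFoundError:
        return None


def split_by_extra(requirements: List[str]) -> Tuple[List[str], List[str]]:
    """Separate always-installed requirements from those tied to an optional extra."""
    core, optional = [], []
    for req in requirements:
        # Assumption: extra-only requirements carry an `extra == "..."` marker.
        (optional if "extra ==" in req else core).append(req)
    return core, optional


reqs = declared_requirements("kedro-datasets")
if reqs is None:
    print("kedro-datasets is not installed in this environment")
else:
    core, optional = split_by_extra(reqs)
    print("always installed:", core)
    print("only with an extra:", len(optional), "requirement(s)")
```

Anything listed under "always installed" is pulled in by a plain pip install, regardless of which extra you choose.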

Belated reply, but thank you both for the tips and for the links! It was interesting to read some of the historical conversations about the choices made while developing kedro.

While reading the discussion on the GitHub issue, I found myself resonating with this point: "wanted to share with colleagues the Dataset abstraction, without requiring the entire kedro installation/project". Having spent a few days on my problem, though, I see that my next steps are essentially converging on a pipeline, which kedro itself can help out with. :)
