Hello there! I am trying to install kedro_datasets in order to use the pandas.ExcelDataset class to automate an xlsx loading process (not interested in the rest of the pipeline atm).
However, when I run pip install kedro_datasets:
a) I see kedro being installed as a dependency, and
b) I get an OSError saying "no directory: /bin/pygrun" (pygrun is a script that ships with antlr4-python3-runtime, a package for text parsing).
>> Is it possible to restrict the pip installation to just kedro_datasets, and (even stricter) to just a certain type (e.g. pandas or pandas.ExcelDataset)?
(Googling didn't help with either a or b)
Running pip install omegaconf first, followed by pip install kedro_datasets, resolved the installation error. kedro is still being installed, though (is that expected?)
General thoughts:
It's best to have something like requirements.txt or pyproject.toml with all requirements specified in one place, and then use a tool like pip-tools (pip-compile), Poetry or uv to resolve dependencies and versions - it will allow you to handle conflicts.
Installing packages one by one is bad practice.
---
To address your specific question - pip install --no-deps <your> should install only <your>. Bearing in mind what I wrote above, I don't recommend that path.
To install kedro_datasets with Excel support, you should go with:
pip install "kedro_datasets[pandas-exceldataset]"
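Once that extra is installed, loading a workbook without any Kedro project could look roughly like the sketch below. The helper name load_excel and its sheet_name parameter are made up for illustration, and the example assumes a kedro-datasets release that exposes the class as ExcelDataset (older releases spelled it ExcelDataSet).

```python
from pathlib import Path


def load_excel(path, sheet_name=0):
    """Load one sheet of an .xlsx file via kedro-datasets (hypothetical helper)."""
    # Imported lazily so the helper can be defined even where
    # kedro-datasets is not installed yet.
    from kedro_datasets.pandas import ExcelDataset

    dataset = ExcelDataset(
        filepath=str(Path(path)),
        # load_args are forwarded to pandas.read_excel
        load_args={"sheet_name": sheet_name},
    )
    return dataset.load()
```

Calling load_excel("report.xlsx") (a placeholder file name) should hand back a pandas DataFrame for the first sheet; no catalog, pipeline or project scaffolding is involved.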
And to clarify: yes, kedro is a dependency of kedro-datasets. There was some discussion about this exact topic in https://github.com/kedro-org/kedro/issues/2409, and we've been collecting evidence from users who want to install the Kedro Catalog without the rest of Kedro in https://github.com/kedro-org/kedro/issues/2741.
May I ask what motivates you to install kedro-datasets without kedro?
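As a side note on kedro being a declared dependency: one way to check this locally is to read the installed package's metadata with the standard library. This is only a sketch; split_requirements is a hypothetical helper, and it assumes that requirements gated behind an extra carry an "extra" marker in the metadata, which is the usual convention.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(dist_name):
    """Return (core, extras_only) requirement strings declared by a distribution."""
    try:
        declared = requires(dist_name) or []
    except PackageNotFoundError:
        # Distribution is not installed in this environment.
        return [], []

    core, extras_only = [], []
    for req in declared:
        # Anything after ';' is an environment marker, e.g. extra == "pandas".
        _, _, marker = req.partition(";")
        if "extra" in marker:
            extras_only.append(req)
        else:
            core.append(req)
    return core, extras_only


if __name__ == "__main__":
    core, extras_only = split_requirements("kedro-datasets")
    print("always installed:", core)
    print("installed only via extras:", len(extras_only), "requirements")
```

Anything in the first list comes along with a plain pip install, which is why kedro shows up even when only a single dataset type is requested.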
Belated reply, but thank you both for the tips and for the links! Interesting to read some of the historical conversations concerning choices when developing kedro.
While reading the discussion on the GitHub issue, I found myself resonating with the point: "wanted to share with colleagues the Dataset abstraction, without requiring the entire kedro installation/project". Having spent a few days on my problem though, I see how my next steps are essentially converging on a pipeline which kedro itself can help out with. :)