Running kedro-viz on docker without installing the library

hey everyone,

Is there a way to run kedro-viz in Docker without actually installing the library? I am asking because I want to keep my env a bit clean, and I thought Docker would be nice for viz. Has anyone done that before?

hmm, but I would still have to run the build every time I update my command, no? I was asking for myself, I don't need to share with anyone else. I would then put this into my project's Docker Compose file so that I can run kedro viz in an isolated Docker image, not in my local env

I'm not sure if that's possible at the moment. Kedro-Viz works by reading the Kedro project and creating JSON endpoints, which the frontend uses for visualization. Even if you host the frontend in a Docker container, you'll still need Kedro-Viz as a library to convert the Kedro project into JSON files.

I can map my project files into the container so that's no problem.

One option that keeps your project clean: if your project is on GitHub, you could use https://github.com/kedro-org/publish-kedro-viz, which does the kedro-viz installation in GitHub CI and hosts your kedro-viz on GitHub Pages

I don't think there is an official way, but there's nothing to stop you from creating your own Docker image to run kedro-viz in a container.

To do that, you will need both the project's and kedro-viz's dependencies inside the image.

It's also completely fine to run kedro-viz in a separate virtual env. It's more or less the same idea as Docker; it depends on how much isolation you are looking for

thanks, GitHub Pages is also cool, might try that later. yes, I will possibly do that. I was just wondering if anyone had done that before, but yeah, I can write some Docker config files to do that

if you get a nice solution working please share 🙂 always keen to understand how best to do this

you may also be interested in kedro viz --lite, which was just shipped and builds the DAG through AST introspection without actually executing the project, so you can now run kedro-viz without any of the project's actual dependencies (other than Kedro) installed

Will do that, thanks 🙂

ideally, uvx --with kedro-viz kedro viz run --lite should work in any project. I just tested it.

that's bonkers

there's a blog post in there

also, since you were asking specifically about Docker:

$ cat Dockerfile
FROM python:3.9-slim
RUN pip install uv && uv pip install --system kedro-viz kedro
EXPOSE 4141
WORKDIR /app
ENTRYPOINT ["kedro", "viz", "run", "--lite", "--host", "0.0.0.0"]
$ docker build -t kedro-viz-lite .
...
$ docker run -p 4141:4141 -v ~/Projects/demo:/app kedro-viz-lite

this works just fine 🙂

Thanks for this. I've tried different combos and somehow I end up getting the following error:

File "/app/src/projx/models/text/base.py", line 34, in <module>
    class AnthropicAssistant(BaseMessage):
  File "/opt/conda/envs/py/lib/python3.10/dataclasses.py", line 1184, in dataclass
    return wrap(cls)
  File "/opt/conda/envs/py/lib/python3.10/dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "/opt/conda/envs/py/lib/python3.10/dataclasses.py", line 908, in _process_class
    for b in cls.__mro__[-1:0:-1]:
  File "/opt/conda/envs/py/lib/python3.10/unittest/mock.py", line 643, in __getattr__
    raise AttributeError("Mock object has no attribute %r" % name)
AttributeError: Mock object has no attribute '__mro__'
For some reason it leads to dataclasses and errors out there. The code works fine normally, so I am not sure why it does that. I thought it was a uv thing, but replicating the same env in the container also results in the same error. I'll have a look later

🙃 this is our fault for sure, kedro viz --lite uses unittest's Mock.
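
For anyone hitting the same traceback, here is a minimal sketch of how a mocked base class can trip up dataclasses. It assumes the mocked import behaves like a plain unittest.mock.MagicMock; BaseMessage and AnthropicAssistant are stand-ins taken from the traceback above, not the real project code, and this is not a claim about how kedro-viz builds its mocks internally.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock

# Assumption: a base class imported from a mocked module ends up as a MagicMock.
BaseMessage = MagicMock()


def define_assistant():
    # Subclassing a mock instance does not create a real class: Python uses the
    # mock's type as the metaclass, so the "class" is itself just another mock.
    @dataclass
    class AnthropicAssistant(BaseMessage):
        text: str = ""

    return AnthropicAssistant


try:
    define_assistant()
    error = None
except AttributeError as exc:
    # dataclasses walks cls.__mro__, which a mock does not provide.
    error = exc

print(repr(error))
```

The same class definition works once BaseMessage is a real class, which matches the observation that the code runs fine outside --lite.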

do you mind opening an issue on Kedro Viz about this?

Ahh, okay, I was super confused 😄 I will open one soon.

I tried without --lite and I'm still getting errors about my custom dataset definitions. I played with PYTHONPATH but no luck

Example: kedro.io.core.DatasetError: Class 'projx.models.audio.io.LargeModel' not found, is this a typo?

projx.models.audio.io.LargeModel
open a python terminal:
from projx.models.audio.io import LargeModel

What do you get?

that must be a separate error I'm sure. if python -c "from projx.models.audio.io import LargeModel" works but kedro run doesn't, then you have some problem with your installation

It was inside the Docker container, and it looks like some other deps were missing. When I ran that in Python I got "no module named elevenlabs", so installing that fixed it. I wonder why that error is not thrown in kedro tho 🤔

For some context on kedro viz --lite: it only mocks dependencies within your Kedro project. It does not mock any transitive dependencies. Regarding "I got the no module named elevenlabs so installing that fixed it. I wonder why that error is not thrown in kedro tho": do you mean kedro viz --lite did not raise an error, or kedro run?

Well, actually both of them work now. The missing dependency produced different error outputs in each case. Not sure why

I do have a different problem now 😄

  viz:
    image: projx:viz
    build:
      context: .
      target: viz
    entrypoint: [ "bash" ]
    command:
      - -c
      - "kedro viz run --host 0.0.0.0"
    ports:
      - 4141:4141
    volumes:
      - ./:/app
This fails with bash: line 1: kedro: command not found, but when I comment out the command section and then run kedro viz run inside the container, it works. Does anyone have a clue? I feel like I'm missing something super obvious here 😅

I guess the bash file is run in a different env than the container command run

Yeah, I'm also getting the error when I add this line to the Dockerfile: ENTRYPOINT ["kedro", "viz", "run", "--lite", "--host", "0.0.0.0"]. What's the best way to add kedro to the container's bin path?

command:
    - -c
    - "source ~/.bashrc && kedro viz run --host 0.0.0.0"
Check whether the above command works. If not, can you try installing kedro and kedro-viz globally while creating the image?

thanks, that works 😄

what's the best way to add kedro to the container's bin path?
You are not supposed to do that. Installing kedro-viz should automatically add it to the Python binary path already.

I think some python level stuff was on the bashrc so invoking that solved it. Now I see the kedro viz as expected. Thanks for the support 😄 🎉

Should we still open an issue about user-level code errors being hidden in Kedro? I feel this led to extra debugging sessions, whereas it should have been clear from the beginning that a dependency was missing. Somehow the error is being caught somewhere

I think some python level stuff was on the bashrc so invoking that solved it.
I think it's most likely the virtual env
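
A quick way to confirm this is to print what the process actually sees. The sketch below is a hypothetical diagnostic script (the name check_env.py is made up), assuming the image installs Kedro into a virtual or conda env that is only activated from ~/.bashrc. It uses only the standard library.

```python
import os
import shutil
import sys


def describe_environment(command: str = "kedro") -> dict:
    """Collect the facts that decide whether `command` can be found."""
    return {
        # Interpreter running this script; shows which env is active.
        "python": sys.executable,
        # Full path of the command, or None if it is not on PATH.
        "command_path": shutil.which(command),
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


if __name__ == "__main__":
    info = describe_environment()
    print("python:", info["python"])
    print("kedro :", info["command_path"] or "not found on PATH")
    for entry in info["path_entries"]:
        print("  PATH entry:", entry)
```

Running it once through the compose command (bash -c "python check_env.py") and once from an interactive shell inside the container should show whether the env's bin directory is missing from PATH in the first case.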

Feel free to open an issue, ideally with something we can reproduce.

No I meant the earlier stack traces where No Module named elevenlabs was not thrown in kedro

We have been fighting this a bit, trying to deal with two conflicting requirements:

  • Kedro dynamically imports datasets from different places, so it's tricky to figure out whether an ImportError is intentional or not
  • We want to surface the correct error

We partly fixed this before, so it should be able to tell you whether it's a missing dependency. Maybe there is more we can do on the Kedro side
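
To make the trade-off concrete, here is a rough sketch of one way to tell the two cases apart. This is not Kedro's actual implementation; load_class is a hypothetical helper, and the only real API it relies on is importlib.import_module plus the name attribute of ModuleNotFoundError.

```python
import importlib


def load_class(class_path: str) -> type:
    """Import `pkg.module.ClassName`, keeping missing dependencies visible."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise LookupError(f"Class '{class_path}' not found, is this a typo?")
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # The missing module is the requested one (or one of its parents):
        # the path itself is wrong, so "is this a typo?" is a fair message.
        if module_path == missing or module_path.startswith(missing + "."):
            raise LookupError(
                f"Class '{class_path}' not found, is this a typo?"
            ) from exc
        # Otherwise the module exists but something it imports is missing.
        raise ImportError(
            f"Could not import '{module_path}': missing dependency '{missing}'"
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise LookupError(
            f"Class '{class_path}' not found, is this a typo?"
        ) from exc
```

With something like this, a wrong dataset path still gets the typo hint, while a module that fails on one of its own imports reports the name of the missing package.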

The issue I had was this:

Traceback (most recent call last):
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/io/core.py", line 159, in from_config
    class_obj, config = parse_dataset_definition(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/io/core.py", line 501, in parse_dataset_definition
    raise DatasetError(
kedro.io.core.DatasetError: Class 'projx.models.audio.io.LargeModel' not found, is this a typo?
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that the package is installed in your current environment. You can do so by running `pip install kedro-datasets` or `pip install kedro-datasets[<dataset-group>]` to install `kedro-datasets` along with related dependencies for the specific dataset group.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/py/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/envs/py/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro_viz/server.py", line 122, in run_server
    load_and_populate_data(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro_viz/server.py", line 59, in load_and_populate_data
    catalog, pipelines, session_store, stats_dict = kedro_data_loader.load_data(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro_viz/integrations/kedro/data_loader.py", line 172, in load_data
    return _load_data_helper(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro_viz/integrations/kedro/data_loader.py", line 101, in _load_data_helper
    catalog = context.catalog
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/framework/context/context.py", line 190, in catalog
    return self._get_catalog()
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/framework/context/context.py", line 234, in _get_catalog
    catalog: DataCatalog = settings.DATA_CATALOG_CLASS.from_config(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/io/data_catalog.py", line 330, in from_config
    datasets[ds_name] = AbstractDataset.from_config(
  File "/opt/conda/envs/py/lib/python3.10/site-packages/kedro/io/core.py", line 163, in from_config
    raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'narrator#lam':
Class 'projx.models.audio.io.LargeModel' not found, is this a typo?
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that the package is installed in your current environment. You can do so by running `pip install kedro-datasets` or `pip install kedro-datasets[<dataset-group>]` to install `kedro-datasets` along with related dependencies for the specific dataset group.
So this is what I saw, and when I ran the import statement in Python as you suggested, I got this:

>>> from projx.models.audio.io import LargeModel
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/src/projx/models/audio/__init__.py", line 1, in <module>
    from .base import LAM
  File "/app/src/projx/models/audio/base.py", line 1, in <module>
    from elevenlabs.client import ElevenLabs
ModuleNotFoundError: No module named 'elevenlabs'
So I was sort of expecting Kedro to show this in the first place. If there is an open issue about it, I'm happy to comment there; otherwise I'll open a new one

sorry for the delay. I think we've improved this recently. what version of Kedro is this?

0.19.7 is what I am using.

I was under the impression that we had fixed this as part of https://github.com/kedro-org/kedro/issues/2943, but maybe this is yet another case we have to handle?

kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'ingestion.int_typed_companies':
No module named 'pandas'. Please see the documentation on how to install relevant dependencies for kedro_datasets.pandas.ParquetDataset:
https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#install-dependencies-related-to-the-data-catalog

This is what happens when I try to use pandas.CSVDataset with kedro-datasets installed but not pandas (pip uninstalled on purpose)

Hmm, could it be related to custom datasets created by the user? This example uses my custom dataset definition, so perhaps it doesn't work as expected there?

could be, but I am not 100% sure here. kedro-datasets is usually structured as a dataset module with an __init__.py and an implementation file:

  • some_dataset_module
  • __init__.py
  • some_dataset.py

We never import some_dataset directly, but usually through some_dataset_module, whose __init__.py does from .some_dataset import XYZDataset. We also have lazy loading implemented, which could be another reason why we are able to catch errors better for kedro-datasets

I have the following one:

dataset_name
- __init__.py -> from .base import LAM
- base.py -> Wrapper code around the API endpoint; defines LAM
- io.py -> Reads the Kedro config and creates the LAM instance.
So basically the module's __init__ would try to load the elevenlabs package, which is imported in base, and that's the error that was not caught.
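
The layout above can be reproduced in isolation. The sketch below builds a throwaway package (demo_audio and missing_dep_for_demo are made-up names standing in for the real package and for elevenlabs) and shows that importing the io submodule fails even though io.py itself imports nothing, because Python runs the package's __init__.py first.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_io_from_demo_package() -> str:
    """Return the name of the module whose absence broke `demo_audio.io`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_audio"
        pkg.mkdir()
        # Same shape as the layout described above.
        (pkg / "__init__.py").write_text("from .base import LAM\n")
        (pkg / "base.py").write_text(
            "import missing_dep_for_demo\n\nclass LAM:\n    pass\n"
        )
        (pkg / "io.py").write_text("class LargeModel:\n    pass\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_audio.io")
        except ModuleNotFoundError as exc:
            # exc.name is the dependency that is missing, not `demo_audio.io`.
            return exc.name or ""
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_audio"]:
                del sys.modules[name]
    return ""


if __name__ == "__main__":
    print("import failed because of:", import_io_from_demo_package())
```

The name carried by the exception is the useful signal here: it points at the dependency rather than at the dataset class that the catalog asked for.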

For context, I import LAM from the dataset package to do type declarations for nodes, hence the separate io and base files.
