Galen Seilis
Joined September 19, 2024

This is an open question to anyone who has experience using Kedro. How do you decide what should go into a single Kedro project and what should be split into multiple Kedro projects?

As a point of comparison, here is my thinking so far. These are not universal rules, just tendencies I have noticed.

  • A Kedro project may be coupled to a business project, in which case all artifacts pertaining to that project are defined and produced within it.
  • For hobby projects I tend to keep them as small as possible, targeting a small set of questions to answer (half a dozen or fewer).

How do you think about the scope of a Kedro project?

5 comments

I am looking at this example: https://docs.kedro.org/en/stable/development/commands_reference.html#customise-or-override-project-specific-kedro-commands

"""Command line tools for manipulating a Kedro project.
Intended to be invoked via `kedro`."""
import click
from kedro.framework.cli.project import (
    ASYNC_ARG_HELP,
    CONFIG_FILE_HELP,
    CONF_SOURCE_HELP,
    FROM_INPUTS_HELP,
    FROM_NODES_HELP,
    LOAD_VERSION_HELP,
    NODE_ARG_HELP,
    PARAMS_ARG_HELP,
    PIPELINE_ARG_HELP,
    RUNNER_ARG_HELP,
    TAG_ARG_HELP,
    TO_NODES_HELP,
    TO_OUTPUTS_HELP,
)
from kedro.framework.cli.utils import (
    CONTEXT_SETTINGS,
    _config_file_callback,
    _split_params,
    _split_load_versions,
    env_option,
    split_string,
    split_node_names,
)
from kedro.framework.session import KedroSession
from kedro.utils import load_obj


@click.group(context_settings=CONTEXT_SETTINGS, name=__file__)
def cli():
    """Command line tools for manipulating a Kedro project."""


@cli.command()
@click.option(
    "--from-inputs", type=str, default="", help=FROM_INPUTS_HELP, callback=split_string
)
@click.option(
    "--to-outputs", type=str, default="", help=TO_OUTPUTS_HELP, callback=split_string
)
@click.option(
    "--from-nodes", type=str, default="", help=FROM_NODES_HELP, callback=split_node_names
)
@click.option(
    "--to-nodes", type=str, default="", help=TO_NODES_HELP, callback=split_node_names
)
@click.option("--nodes", "-n", "node_names", type=str, multiple=True, help=NODE_ARG_HELP)
@click.option(
    "--runner", "-r", type=str, default=None, multiple=False, help=RUNNER_ARG_HELP
)
@click.option("--async", "is_async", is_flag=True, multiple=False, help=ASYNC_ARG_HELP)
@env_option
@click.option("--tags", "-t", type=str, multiple=True, help=TAG_ARG_HELP)
@click.option(
    "--load-versions",
    "-lv",
    type=str,
    multiple=True,
    help=LOAD_VERSION_HELP,
    callback=_split_load_versions,
)
@click.option("--pipeline", "-p", type=str, default=None, help=PIPELINE_ARG_HELP)
@click.option(
    "--config",
    "-c",
    type=click.Path(exists=True, dir_okay=False, resolve_path=True),
    help=CONFIG_FILE_HELP,
    callback=_config_file_callback,
)
@click.option(
    "--conf-source",
    type=click.Path(exists=True, file_okay=False, resolve_path=True),
    help=CONF_SOURCE_HELP,
)
@click.option(
    "--params",
    type=click.UNPROCESSED,
    default="",
    help=PARAMS_ARG_HELP,
    callback=_split_params,
)
def run(
    tags,
    env,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_versions,
    pipeline,
    config,
    conf_source,
    params,
):
    """Run the pipeline."""

    runner = load_obj(runner or "SequentialRunner", "kedro.runner")
    tags = tuple(tags)
    node_names = tuple(node_names)

    with KedroSession.create(
        env=env, conf_source=conf_source, extra_params=params
    ) as session:
        session.run(
            tags=tags,
            runner=runner(is_async=is_async),
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_versions,
            pipeline_name=pipeline,
        )

Generally, Python users are expected to stay away from names with a single leading underscore, yet this example in the docs uses them. When functions like _config_file_callback appear in user-facing documentation examples, it becomes less clear what is intended for end users.

Are such methods / functions supposed to be part of the public API?

3 comments

In this example: https://docs.kedro.org/en/stable/extend_kedro/plugins.html#project-context

I see that _get_project_metadata is imported but never called. Is it relevant to this example?

from pathlib import Path

from kedro.framework.startup import _get_project_metadata
from kedro.framework.session import KedroSession


project_path = Path.cwd()
session = KedroSession.create(project_path=project_path)
context = session.load_context()

Am I right to assume that project_path would get defined elsewhere?

1 comment

How does the Kedro dev team think about delineating what components belong to the public API vs being internal-use only?

I see single leading underscores _<foo> are used, which I assume means they belong to the private API.

'Sometimes' I see __all__ is used. Are things in that list safe to assume as part of the public API?

If a variable (function/method/class/etc) does not have a leading underscore, and is not in an __all__ list, does that mean it is safe to assume it is also part of the public API?
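
To make the two conventions in question concrete, here is a minimal sketch of how they are usually read in Python generally. The helper public_names and the in-memory demo module are hypothetical and only for illustration. This shows the language convention, not a statement of Kedro's policy.

```python
import types


def public_names(module):
    """Return the names a module appears to expose publicly.

    Convention only: prefer ``__all__`` when the module defines it,
    otherwise fall back to every attribute without a leading underscore.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Hypothetical module built in memory purely for illustration.
demo = types.ModuleType("demo")
demo.run = lambda: "ran"
demo._helper = lambda: "internal"
demo.VERSION = "0.1"

# No __all__ yet: every non-underscore name counts as public.
print(public_names(demo))

# With __all__ defined: only the names it lists count as public.
demo.__all__ = ["run"]
print(public_names(demo))
```

Neither rule is enforced by the interpreter. `__all__` only controls what `from module import *` picks up, so whether an unlisted, non-underscore name is supported is still a decision for the library maintainers.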

4 comments

I see that kedro-lsp is on PyPI, but I guess the repository was deleted on GitHub. Has anybody replaced any of that functionality for Neovim?
https://pypi.org/project/kedro-lsp/

12 comments

Anybody know how to integrate Kedro with Microsoft Fabric (MSF)? It would be nice to pull in data from it into a local Kedro project. AFAIK the SemPy package only works from within MSF.

7 comments

I have a question about the memory dataset's default copy method. I noticed that if the data is a pandas DataFrame or a NumPy array, a copy rather than assignment (i.e. passing a reference) is used by default. I'm wondering what the rationale for that is. Passing a reference is often cheaper in terms of runtime than making either a shallow or deep copy. Why is assignment not the default?

https://docs.kedro.org/en/stable/_modules/kedro/io/memory_dataset.html#MemoryDataset
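
To illustrate the trade-off being asked about, here is a minimal standard-library sketch of what happens when a consumer mutates data handed over by assignment, shallow copy, or deep copy. The load helper and the mode names are hypothetical stand-ins and are not Kedro's implementation. A plain dict is used so the example runs without pandas or NumPy, whose own copy semantics differ in detail.

```python
import copy


def load(stored, copy_mode):
    """Hand stored data to a consumer using one of three strategies.

    Hypothetical helper for illustration; not Kedro's implementation.
    """
    if copy_mode == "assign":
        return stored                 # same object, no copying cost
    if copy_mode == "copy":
        return copy.copy(stored)      # new outer container, shared contents
    if copy_mode == "deepcopy":
        return copy.deepcopy(stored)  # fully independent object graph
    raise ValueError(f"unknown copy_mode: {copy_mode!r}")


def mutating_node(data):
    """A consumer that modifies its input in place."""
    data["values"].append(99)
    data["touched"] = True


# Run the same mutating consumer against each strategy and keep
# the state of the stored object afterwards.
results = {}
for mode in ("assign", "copy", "deepcopy"):
    stored = {"values": [1, 2, 3]}
    mutating_node(load(stored, mode))
    results[mode] = stored

for mode, stored in results.items():
    print(mode, stored)
```

With assignment, the in-place change made by one consumer is visible to everything else that later reads the stored object. Copying trades runtime and memory for isolation between consumers.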

8 comments

I am getting a warning when I run pytest on a dummy project:

PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_configure_node(self, node):

..\venv\lib\site-packages\pytest_cov\plugin.py:265
\venv\lib\site-packages\pytest_cov\plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_testnodedown(self, node, error):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Is this something I should do anything about, or will it be addressed in a future version of Kedro?

6 comments

I am running into what looks like an environment variable or config issue: the module for the project cannot be found. At first I thought it affected just one project, but it seems to be something broader.

On my system even creating a fresh project gives the same error.

  1. Create a venv and activate it.
  2. Install kedro (pip install kedro), which currently installs 0.19.9.
  3. Initialize a new kedro project.
  4. Run pytest (which gives a module not found error for the very project I just created).

Troubleshooting advice would be appreciated.
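
As a starting point for narrowing this down, here is a small diagnostic sketch that can be run from the project root inside the activated venv. It assumes the project package lives under a src/ directory. The package name my_project and the helper names are placeholders, so substitute your own.

```python
import importlib.util
import sys
from pathlib import Path


def is_importable(package_name):
    """Return True if Python can locate the package on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None


def diagnose(package_name, project_root):
    """Collect facts that help explain a ModuleNotFoundError under pytest."""
    src_dir = Path(project_root) / "src"
    resolved_sys_path = [str(Path(p).resolve()) for p in sys.path if p]
    return {
        "interpreter": sys.executable,  # should point inside the venv
        "importable": is_importable(package_name),
        "src_dir_exists": src_dir.is_dir(),
        "src_on_sys_path": str(src_dir.resolve()) in resolved_sys_path,
        "package_dir_exists": (src_dir / package_name).is_dir(),
    }


if __name__ == "__main__":
    # "my_project" is a placeholder for the package name of the new project.
    for key, value in diagnose("my_project", Path.cwd()).items():
        print(f"{key}: {value}")
```

If the package directory exists but importable is False, the package is likely neither installed into the venv nor on the import path. For src-layout projects in general, common remedies are an editable install (pip install -e .) or pytest's pythonpath setting. Whether either applies here depends on the generated project's configuration.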

19 comments

Is there a recommended way to run type checkers (e.g. MyPy) on Kedro projects?

2 comments