Galen Seilis

Kedro project structure considerations

This is an open question to anyone who has experience using Kedro. How do you decide what should go into a single Kedro project and what should be split into multiple Kedro projects?

For comparison, here is my thinking so far. These are not universal rules, but rather tendencies I have noticed.

A Kedro project may be coupled to a business project, in which case all artifacts pertaining to that project are defined and produced within it.
For hobby projects I tend to keep them as small as possible, targeting a small set of questions to answer (half a dozen or fewer).

How do you think about the scope of a Kedro project?

5 comments

Galen Seilis

Combining SQLModel and pydantic-kedro

Anyone tried out combining SQLModel and pydantic-kedro?

2 comments

Galen Seilis

Customise Or Override Project-specific Kedro Commands

I am looking at this example: https://docs.kedro.org/en/stable/development/commands_reference.html#customise-or-override-project-specific-kedro-commands

"""Command line tools for manipulating a Kedro project.
Intended to be invoked via `kedro`."""
import click
from kedro.framework.cli.project import (
    ASYNC_ARG_HELP,
    CONFIG_FILE_HELP,
    CONF_SOURCE_HELP,
    FROM_INPUTS_HELP,
    FROM_NODES_HELP,
    LOAD_VERSION_HELP,
    NODE_ARG_HELP,
    PARAMS_ARG_HELP,
    PIPELINE_ARG_HELP,
    RUNNER_ARG_HELP,
    TAG_ARG_HELP,
    TO_NODES_HELP,
    TO_OUTPUTS_HELP,
)
from kedro.framework.cli.utils import (
    CONTEXT_SETTINGS,
    _config_file_callback,
    _split_params,
    _split_load_versions,
    env_option,
    split_string,
    split_node_names,
)
from kedro.framework.session import KedroSession
from kedro.utils import load_obj


@click.group(context_settings=CONTEXT_SETTINGS, name=__file__)
def cli():
    """Command line tools for manipulating a Kedro project."""


@cli.command()
@click.option(
    "--from-inputs", type=str, default="", help=FROM_INPUTS_HELP, callback=split_string
)
@click.option(
    "--to-outputs", type=str, default="", help=TO_OUTPUTS_HELP, callback=split_string
)
@click.option(
    "--from-nodes", type=str, default="", help=FROM_NODES_HELP, callback=split_node_names
)
@click.option(
    "--to-nodes", type=str, default="", help=TO_NODES_HELP, callback=split_node_names
)
@click.option("--nodes", "-n", "node_names", type=str, multiple=True, help=NODE_ARG_HELP)
@click.option(
    "--runner", "-r", type=str, default=None, multiple=False, help=RUNNER_ARG_HELP
)
@click.option("--async", "is_async", is_flag=True, multiple=False, help=ASYNC_ARG_HELP)
@env_option
@click.option("--tags", "-t", type=str, multiple=True, help=TAG_ARG_HELP)
@click.option(
    "--load-versions",
    "-lv",
    type=str,
    multiple=True,
    help=LOAD_VERSION_HELP,
    callback=_split_load_versions,
)
@click.option("--pipeline", "-p", type=str, default=None, help=PIPELINE_ARG_HELP)
@click.option(
    "--config",
    "-c",
    type=click.Path(exists=True, dir_okay=False, resolve_path=True),
    help=CONFIG_FILE_HELP,
    callback=_config_file_callback,
)
@click.option(
    "--conf-source",
    type=click.Path(exists=True, file_okay=False, resolve_path=True),
    help=CONF_SOURCE_HELP,
)
@click.option(
    "--params",
    type=click.UNPROCESSED,
    default="",
    help=PARAMS_ARG_HELP,
    callback=_split_params,
)
def run(
    tags,
    env,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_versions,
    pipeline,
    config,
    conf_source,
    params,
):
    """Run the pipeline."""

    runner = load_obj(runner or "SequentialRunner", "kedro.runner")
    tags = tuple(tags)
    node_names = tuple(node_names)

    with KedroSession.create(
        env=env, conf_source=conf_source, extra_params=params
    ) as session:
        session.run(
            tags=tags,
            runner=runner(is_async=is_async),
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_versions,
            pipeline_name=pipeline,
        )

Generally in Python, users are supposed to stay away from names prefixed with a single underscore, yet this example in the docs uses them. When functions like _config_file_callback appear in user-facing documentation examples, it becomes less clear what is intended for end users.

Are such methods / functions supposed to be part of the public API?

3 comments

Galen Seilis

Project context

In this example: https://docs.kedro.org/en/stable/extend_kedro/plugins.html#project-context

I see that _get_project_metadata is imported but never called. Is it relevant to this example?

from pathlib import Path

from kedro.framework.startup import _get_project_metadata
from kedro.framework.session import KedroSession


project_path = Path.cwd()
session = KedroSession.create(project_path=project_path)
context = session.load_context()

Am I right in assuming that project_path would get defined elsewhere?

1 comment

Galen Seilis

Kedro's approach to public and internal APIs

How does the Kedro dev team think about delineating what components belong to the public API vs being internal-use only?

I see that single leading underscores (_<foo>) are used, which I assume marks names as belonging to the private API.

Sometimes I see that __all__ is used. Is it safe to assume that the names in that list are part of the public API?

If a name (function/method/class/etc.) does not have a leading underscore and is not listed in __all__, is it safe to assume it is also part of the public API?
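To make the three cases in this question concrete, here is a minimal sketch of what the Python language itself does with them. It says nothing about Kedro's own policy; the module `demo` and the functions `load`, `helper`, and `_internal` are hypothetical names made up for illustration.

```python
import types

# Build a throwaway module in memory; every name in it is hypothetical.
source = '''
__all__ = ["load"]

def load():
    return "no underscore, listed in __all__"

def helper():
    return "no underscore, not listed in __all__"

def _internal():
    return "leading underscore"
'''
demo = types.ModuleType("demo")
exec(source, demo.__dict__)


def star_import_names(module):
    """Names that `from module import *` would bind."""
    if hasattr(module, "__all__"):
        return sorted(module.__all__)
    # Without __all__, star-import takes every name lacking a leading underscore.
    return sorted(name for name in vars(module) if not name.startswith("_"))


exported = star_import_names(demo)

# Callables with no leading underscore that __all__ leaves out: the ambiguous case.
unlisted = sorted(
    name
    for name, value in vars(demo).items()
    if callable(value) and not name.startswith("_") and name not in demo.__all__
)

print("star-import exports:", exported)
print("unmarked but unlisted:", unlisted)
```

All three functions remain reachable by attribute access (including `demo._internal()`), so the interpreter enforces nothing here. Whether a name like `helper` counts as public is a matter of project convention, which is why the question needs an answer from the maintainers.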

4 comments

Galen Seilis

Kedro-lsp functionality replacement for Neovim

I see that kedro-lsp is on PyPI, but I guess the repository was deleted on GitHub. Has anybody replaced any of that functionality for Neovim?
https://pypi.org/project/kedro-lsp/

12 comments

Galen Seilis

Integrating Kedro with Microsoft Fabric (MSF)

Does anybody know how to integrate Kedro with Microsoft Fabric (MSF)? It would be nice to pull data from it into a local Kedro project. AFAIK, the SemPy package only works from within MSF.

7 comments

Galen Seilis

Default memory dataset copy method prioritizes accuracy over efficiency

I have a question about the memory dataset's default copy method. I noticed that if the data is a pandas DataFrame or a NumPy array, a copy is made by default rather than an assignment (i.e. a reference). I'm wondering what the rationale for that is. Making a reference is often cheaper at runtime than making either a shallow or a deep copy. Why is assignment not the default?

https://docs.kedro.org/en/stable/_modules/kedro/io/memory_dataset.html#MemoryDataset
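For readers weighing the trade-off, here is a minimal standard-library sketch of how assignment, shallow copy, and deep copy differ once the receiver mutates the data. The function `hand_over` is hypothetical and is not Kedro code; the mode names are chosen to mirror what I understand to be the memory dataset's copy modes, which is an assumption to check against the linked source.

```python
import copy


def hand_over(data, copy_mode):
    """Return data the way an in-memory store might (hypothetical helper)."""
    if copy_mode == "assign":
        return data  # same object, no copying cost
    if copy_mode == "copy":
        return copy.copy(data)  # new container, shared nested objects
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)  # fully independent object graph
    raise ValueError(f"unknown copy_mode: {copy_mode!r}")


results = {}
for mode in ("assign", "copy", "deepcopy"):
    original = {"rows": [1, 2, 3]}
    received = hand_over(original, mode)
    received["rows"].append(99)  # nested mutation by the receiver
    received["extra"] = True  # top-level mutation by the receiver
    results[mode] = original  # what the store still holds afterwards

for mode, stored in results.items():
    print(f"{mode:>8}: {stored}")
```

With assignment, both mutations leak back into the stored object; with a shallow copy, only the nested one does; with a deep copy, neither does. That isolation is the usual argument for paying the copying cost, although whether it is the rationale behind Kedro's default is exactly what the post asks.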

8 comments

Galen Seilis

Pytest warning: Deprecation warning for CovPlugin.pytest_configure_node

I am getting a warning when I run pytest on a dummy project:

PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_configure_node(self, node):

..\venv\lib\site-packages\pytest_cov\plugin.py:265
\venv\lib\site-packages\pytest_cov\plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_testnodedown(self, node, error):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Is this something I should do anything about, or will it be addressed in a future version of Kedro?

6 comments

Galen Seilis

Kedro project initialization fails due to missing module

I am getting some kind of environment variable or config issue. The module for the project cannot be found. At first I thought it was just one project, but it seems to be something broader.

On my system even creating a fresh project gives the same error.

1. Create a venv and activate it.
2. Install Kedro (pip install kedro), which currently gives 0.19.9.
3. Initialize a new Kedro project.
4. Run pytest (which gives a module-not-found error for the very project I just created).

Troubleshooting advice would be appreciated.
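One quick way to narrow this kind of error down is to ask the interpreter directly whether it can locate the package. The sketch below assumes the default template layout, with the package under a `src` directory, and is meant to be run from the project root. The function `diagnose_import` and the package name `my_project` are hypothetical placeholders.

```python
import importlib.util
import sys
from pathlib import Path


def diagnose_import(package_name, src_dir="src"):
    """Report whether a top-level package is importable and where it lives."""
    spec = importlib.util.find_spec(package_name)
    src_path = Path.cwd() / src_dir
    return {
        # Can the current interpreter resolve the package at all?
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
        # Does the expected source directory exist under the working directory?
        "found_under_src": (src_path / package_name).is_dir(),
        "src_on_sys_path": str(src_path) in sys.path,
        # Confirms which environment is actually running the check.
        "interpreter": sys.executable,
    }


report = diagnose_import("my_project")  # hypothetical package name
for key, value in report.items():
    print(f"{key}: {value}")
```

If the package is found under `src` but is not importable, the project has probably not been installed into the active environment and `src` is not on the import path; if the interpreter path is unexpected, the venv may not be the one in use.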

19 comments

Galen Seilis

Running type checkers on Kedro projects

Is there a recommended way to run type checkers (e.g. MyPy) on Kedro projects?

2 comments
