This is an open question to anyone who has experience using Kedro: how do you decide what should go into a single Kedro project versus what should be split into multiple Kedro projects?
As a point of comparison, here is my thinking so far. These are not universal rules, but rather tendencies I have noticed.
Anyone tried out combining SQLModel and pydantic-kedro?
I am looking at this example: https://docs.kedro.org/en/stable/development/commands_reference.html#customise-or-override-project-specific-kedro-commands
"""Command line tools for manipulating a Kedro project. Intended to be invoked via `kedro`.""" import click from kedro.framework.cli.project import ( ASYNC_ARG_HELP, CONFIG_FILE_HELP, CONF_SOURCE_HELP, FROM_INPUTS_HELP, FROM_NODES_HELP, LOAD_VERSION_HELP, NODE_ARG_HELP, PARAMS_ARG_HELP, PIPELINE_ARG_HELP, RUNNER_ARG_HELP, TAG_ARG_HELP, TO_NODES_HELP, TO_OUTPUTS_HELP, ) from kedro.framework.cli.utils import ( CONTEXT_SETTINGS, _config_file_callback, _split_params, _split_load_versions, env_option, split_string, split_node_names, ) from kedro.framework.session import KedroSession from kedro.utils import load_obj @click.group(context_settings=CONTEXT_SETTINGS, name=__file__) def cli(): """Command line tools for manipulating a Kedro project.""" @cli.command() @click.option( "--from-inputs", type=str, default="", help=FROM_INPUTS_HELP, callback=split_string ) @click.option( "--to-outputs", type=str, default="", help=TO_OUTPUTS_HELP, callback=split_string ) @click.option( "--from-nodes", type=str, default="", help=FROM_NODES_HELP, callback=split_node_names ) @click.option( "--to-nodes", type=str, default="", help=TO_NODES_HELP, callback=split_node_names ) @click.option("--nodes", "-n", "node_names", type=str, multiple=True, help=NODE_ARG_HELP) @click.option( "--runner", "-r", type=str, default=None, multiple=False, help=RUNNER_ARG_HELP ) @click.option("--async", "is_async", is_flag=True, multiple=False, help=ASYNC_ARG_HELP) @env_option @click.option("--tags", "-t", type=str, multiple=True, help=TAG_ARG_HELP) @click.option( "--load-versions", "-lv", type=str, multiple=True, help=LOAD_VERSION_HELP, callback=_split_load_versions, ) @click.option("--pipeline", "-p", type=str, default=None, help=PIPELINE_ARG_HELP) @click.option( "--config", "-c", type=click.Path(exists=True, dir_okay=False, resolve_path=True), help=CONFIG_FILE_HELP, callback=_config_file_callback, ) @click.option( "--conf-source", type=click.Path(exists=True, file_okay=False, resolve_path=True), 
help=CONF_SOURCE_HELP, ) @click.option( "--params", type=click.UNPROCESSED, default="", help=PARAMS_ARG_HELP, callback=_split_params, ) def run( tags, env, runner, is_async, node_names, to_nodes, from_nodes, from_inputs, to_outputs, load_versions, pipeline, config, conf_source, params, ): """Run the pipeline.""" runner = load_obj(runner or "SequentialRunner", "kedro.runner") tags = tuple(tags) node_names = tuple(node_names) with KedroSession.create( env=env, conf_source=conf_source, extra_params=params ) as session: session.run( tags=tags, runner=runner(is_async=is_async), node_names=node_names, from_nodes=from_nodes, to_nodes=to_nodes, from_inputs=from_inputs, to_outputs=to_outputs, load_versions=load_versions, pipeline_name=pipeline, )
When underscore-prefixed names such as _config_file_callback appear in user documentation for constructing examples, it becomes less clear what is intended for end users.
In this example: https://docs.kedro.org/en/stable/extend_kedro/plugins.html#project-context
I see that _get_project_metadata does not get called. Is it relevant to this example?
from pathlib import Path
from kedro.framework.startup import _get_project_metadata
from kedro.framework.session import KedroSession

project_path = Path.cwd()
session = KedroSession.create(project_path=project_path)
context = session.load_context()
project_path would get defined elsewhere?
How does the Kedro dev team think about delineating which components belong to the public API versus being for internal use only?
I see single leading underscores (_<foo>) are used, which I assume means those names belong to the private API.
Sometimes I see __all__ is used. Are the names in that list safe to treat as part of the public API?
If a name (function/method/class/etc.) does not have a leading underscore and is not in an __all__ list, is it safe to assume it is also part of the public API?
I see that kedro-lsp is on PyPI, but I guess the repository was deleted on GitHub. Has anybody replaced any of that functionality for Neovim?
https://pypi.org/project/kedro-lsp/
Does anybody know how to integrate Kedro with Microsoft Fabric (MSF)? It would be nice to pull data from it into a local Kedro project. AFAIK the SemPy package only works from within MSF.
I have a question about the memory dataset's default copy method. I noticed that if the data is a pandas DataFrame or a NumPy array, a copy is made by default rather than an assignment (i.e. a reference). What is the rationale for that? Making a reference is often cheaper at runtime than making either a shallow or a deep copy. Why is assignment not the default?
https://docs.kedro.org/en/stable/_modules/kedro/io/memory_dataset.html#MemoryDataset
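To make the trade-off concrete, here is a minimal sketch of the three behaviours using only the standard library. The helper store_and_load is hypothetical and merely imitates the idea of a copy mode; it is not Kedro's implementation. As far as I can tell from the linked source, MemoryDataset accepts a copy_mode argument, so assignment can be requested explicitly where the extra copy is too expensive.

```python
import copy


def store_and_load(data, copy_mode):
    """Hypothetical stand-in for an in-memory dataset with a copy mode."""
    if copy_mode == "assign":
        return data                 # same object, no copying cost
    if copy_mode == "copy":
        return copy.copy(data)      # new outer container, shared inner objects
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)  # fully independent object
    raise ValueError(f"unknown copy_mode: {copy_mode}")


original = {"values": [1, 2, 3]}

assigned = store_and_load(original, "assign")
shallow = store_and_load(original, "copy")
deep = store_and_load(original, "deepcopy")

# Simulate a downstream node that mutates its input in place.
assigned["values"].append(4)

print(original["values"])  # [1, 2, 3, 4]: the mutation leaked back
print(shallow["values"])   # [1, 2, 3, 4]: the inner list is still shared
print(deep["values"])      # [1, 2, 3]: unaffected
```

With assignment, one node mutating its input in place changes what every other consumer of that dataset sees, which is presumably the kind of bug a copying default is meant to guard against.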
I am getting a warning when I run pytest on a dummy project:
PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(optionalhook=True) decorator instead to configure the hooks.
See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
  def pytest_configure_node(self, node):

..\venv\lib\site-packages\pytest_cov\plugin.py:265
\venv\lib\site-packages\pytest_cov\plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(optionalhook=True) decorator instead to configure the hooks.
See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
  def pytest_testnodedown(self, node, error):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
I am getting some kind of environment variable or config issue: the module for the project cannot be found. At first I thought it affected just one project, but it seems to be something broader.
On my system even creating a fresh project gives the same error.
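One way to narrow this kind of problem down is to ask the interpreter directly whether it can see the package, and from where. This is a generic Python diagnostic rather than anything Kedro-specific, and the package name my_project is a hypothetical placeholder to be replaced with the real project package. It should be run with the same interpreter that runs kedro.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


package_name = "my_project"  # hypothetical; replace with the real package name

print("interpreter:", sys.executable)
print("found at:", locate_package(package_name))
print("search path:")
for entry in sys.path:
    print("  ", entry)
```

If the package is reported as not found, comparing the printed interpreter and search path against the virtual environment the project was installed into may show whether the wrong environment is active.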
Is there a recommended way to run type checkers (e.g. mypy) on Kedro projects?