In Kedro pipeline tests, what's the best way to mock the underlying nodes? We use pytest.
Hello Team!
So it's been a few months since we started using Kedro, and it's time to deploy some of the pipelines we have created.
We need to choose an orchestrator, but this is not our field of expertise, so I wanted to ask for some help. We would like something simple to set up and use collaboratively. My company also requires that it be free (at least for now), our cloud provider is AWS, and we already use MLflow. Here are the alternatives we found:
- Prefect (open-source, seems nice to use, kedro support, but free tier imposes limitations)
- Flyte (free?, open-source, seems nice to use, no kedro support)
- MLRun (free and open-source, no kedro support? seems nice to use but a bit more than an orchestrator, requires python 3.9)
- Kubeflow Pipelines (free and open-source, kedro plugin, and others seem to think it is complex to setup and maintain)
- Airflow (free and open-source, kedro plugin)
- Sagemaker (Amazon, kedro plugin, personally dislike its UI and how other AWS services are organized around it)
What would you recommend? What should we consider to make such a decision?
Thanks for your help :)
Hi, all. I have a question about how nodes/pipelines read their input datasets. Taking the catalog configuration at the following link as an example, I assume the Kedro pipeline reads data from the CSV file stored in Amazon S3 when you specify inputs=["cars"] in the node configuration. If there are multiple different nodes that take "cars" as an input dataset, does the Kedro pipeline serve that dataset from memory, or does it read from Amazon S3 every time a node needs it?
https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-multiple-datasets-with-similar-configuration-using-yaml-anchors
And if it does re-read the same dataset from the data source every time it runs the various nodes, is it possible to keep the dataset in memory after the first read (from the Amazon S3 CSV file in this case) and reuse it from memory, so that you don't need to read from the data source multiple times, possibly shortening processing time?
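For reference, a persisted dataset is normally re-loaded for each node that consumes it; Kedro ships a CachedDataset wrapper that keeps the underlying data in memory after the first load. A hypothetical catalog entry (the filepath is assumed; the class is named CachedDataset in recent Kedro versions, CachedDataSet in older ones):

```yaml
cars:
  type: CachedDataset
  dataset:
    type: pandas.CSVDataset
    filepath: s3://my_bucket/data/cars.csv
```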
Kedro + GetInData Folks! :kedro:
I am following this repo to submit a kedro pyspark job to dataproc serverless: https://github.com/getindata/kedro-pyspark-dataproc-demo
On submitting the job
```
gcloud dataproc batches submit pyspark file:///home/kedro/src/entrypoint.py \
    --project my-project \
    --region=europe-central2 \
    --container-image=europe-central2-docker.pkg.dev/my-project/kedro-dataproc-demo/kedro-dataproc-iris:latest \
    --service-account dataproc-worker@my-project.iam.gserviceaccount.com \
    --properties spark.app.name="kedro-pyspark-iris",spark.dynamicAllocation.minExecutors=2,spark.dynamicAllocation.maxExecutors=2 \
    -- \
    run
```
Entry point script contains the following:
```python
import os

from kedro.framework import cli

os.chdir("/home/kedro")
cli.main()
```
I am getting the following error:
```
[10/15/24 17:30:21] INFO  Loading data from 'example_iris_data' (SparkDataSet)...  data_catalog.py:343
[10/15/24 17:30:22] WARNING  There are 3 nodes that have not run. You can resume the pipeline run by adding the following argument to your previous command:  runner.py:178

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/kedro/io/core.py", line 186, in load
    return self._load()
  File "/usr/local/lib/python3.9/site-packages/kedro/extras/datasets/spark/spark_dataset.py", line 380, in _load
    read_obj = self._get_spark().read
    # locals: load_path = 'gs://aa-dev-crm-users/abhishek/misc/iris.csv'
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 1706, in read
    return DataFrameReader(self)
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 70, in __init__
    self._jreader = spark._jsparkSession.read()
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/lib/spark/python/pyspark/errors/exceptions/captured.py", line 185, in deco
    raise converted from None
IllegalArgumentException: The value of property spark.app.name must not be null
```
Almost 100% sure that this error is not due to any mis-specification in my Dockerfile or requirements, because it works perfectly if I change the entrypoint script to the following:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
df = spark.read.csv("gs://aa-dev-crm-users/abhishek/misc/iris.csv", inferSchema=True, header=True)
print(df.show())
```
I have a question about the memory dataset's default copy method. I noticed that if the data is a pandas dataframe or a numpy array that copy rather than assignment (i.e. making a reference) is used by default. I'm wondering what the rationale for that is. Often making a reference is cheaper in terms of runtime than making either a shallow or deep copy. Why is assignment not the top priority default?
https://docs.kedro.org/en/stable/_modules/kedro/io/memory_dataset.html#MemoryDataset
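For context on the trade-off in question: with plain assignment, a node that mutates its input in place silently changes what every other consumer of that dataset sees, which is presumably the rationale for copying by default. A plain-Python illustration (lists instead of DataFrames, to keep it standalone):

```python
data = [1, 2, 3]
ref = data             # assignment: both names point at the same object
snapshot = list(data)  # copy: an independent object

ref.append(4)          # a "downstream" in-place mutation
print(data)      # [1, 2, 3, 4] -- the shared original changed too
print(snapshot)  # [1, 2, 3]    -- the copy is unaffected
```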
Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories.
After a lot of headache and debugging, it seems impossible to use both, because credentials are passed to the AzureMLAssetDataset through a hook (after_catalog_created); the issue is that if you use dataset_patterns (i.e. declare your dataset as "{name}.csv" or something similar), the hook is called, but the patterned dataset is not instantiated yet.
After that, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter hasn't been called yet (as it is only called in the after_catalog_created hook). At first glance, it looks like a kedro-azureml issue, since AzureMLAssetDataset._load() can be called without the setter having been called when the dataset comes from a factory pattern. But it might also be a Kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't quite see it in the docs on hooks either.
Hey Everyone
I am getting the errors below while the pipeline is trying to push some data to S3. Any heads-up?

ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request
The above exception was the direct cause of the following exception:
DatasetError: Failed while saving data to data set CSVDataset(filepath=ml-datawarehouse/warehouse/extraction/doc_table_insert.csv, load_args={},
protocol=s3, save_args={'index': False}, version=Version(load=None, save='2024-10-15T15.35.46.341Z')).
[Errno 22] Bad Request
Hi all,
When running uv run kedro run, the node in blue gets run before the nodes upstream of it, even though those are inputs for the blue node (it basically unions two datasets back together). I would not expect this behavior, as I thought the entire pipeline is executed as a DAG. Am I wrong in this assumption? I have the following pipelines: ingestion, data_prep, feature, model_input, modeling, and reporting.
Hi everyone
I have been exploring Ibis for some time. I just wanted to understand whether there is a better way to write the code below in a more optimised fashion:
```python
import ibis

con = ibis.connect(POSTGRES_CONNECTION_STRING)
training_meta_table: ir.Table = con.table("training_metadata")

filters = {
    "customer_ids": [59],
    "queue_names": ["General Lit - Misclassifications", "MoveDocs-MR"],
    "start_date": "2024-09-5 00:00:00",
    "end_date": "2024-09-11 00:00:00",
    "doc_types": [],
    "fields": ["patientFirstName", "patientLastName", "Service Date", "Doctor"],
}

field_conditions = (
    training_meta_table.fields_present.contains(filters["fields"][0])
    | training_meta_table.fields_present.contains(filters["fields"][1])
    | training_meta_table.fields_present.contains(filters["fields"][2])
    | training_meta_table.fields_present.contains(filters["fields"][3])
)
```
So there are many OR conditions that we would like to join together dynamically to create one final condition based on the input filters.
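The hand-written chain of `|` operators above can be generalised with a reduce; a minimal sketch (demonstrated with plain booleans so it runs standalone, but the same reduce works on Ibis boolean expressions):

```python
import functools
import operator

def combine_any(conditions):
    """OR-reduce an arbitrary iterable of conditions into one expression."""
    return functools.reduce(operator.or_, conditions)

# With Ibis this would be, e.g. (matching the snippet above):
#   field_conditions = combine_any(
#       training_meta_table.fields_present.contains(f) for f in filters["fields"]
#   )
print(combine_any([False, True, False]))  # True
```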
Hi all!
I am working with a clustering pipeline that I regularly want to rerun to monitor cluster migrations. I am using SnowflakeTableDatasets to save data directly to the data warehouse. Now, since it is not possible to have the same input and output dataset in Kedro, I was wondering what would be best practice to rerun clustering and store to the same SnowparkTableDataset when storing on a different timestamp for example. Would appreciate your help here!
Hello Team,
I want to save a DataFrame back to a Snowpark Table dataset object, but I'm running into this error:
DatasetError: Failed while saving data to data set SnowparkTableDataset(...). 'DataFrame' object has no attribute 'write'
Code snippet in thread; please let me know if there is a way to do this 😄 Thanks so much!
Hey Kedroids! :kedro:
(Apologies in advance for the long message but would really really appreciate a good discussion on below from the kedro community! 🙂 )
I have a usecase of deploying kedro pipelines using VertexAI SDK.
- In the production system (web app), I want to be able to trigger a kedro pipeline (or multiple pipelines) with specified parameters (say from the UI).
- Let's say we have a API endpoint
https://my.web.app/api/v1/some-task
- Body includes parameters to trigger 1 or multiple kedro pipelines as a Vertex AI DAG
My VertexAI DAG has a combination of nodes (steps), and each node:
- May or may not be a kedro pipeline
- May be a pyspark workload running on dataproc or non spark workload running on a single compute VM
- May run a bigquery job
- May or may not run in a docker container
Let's take the example of submitting a kedro pipeline on Dataproc serverless running on a custom docker container using VertexAI SDK.
Questions:
1. Do you package the kedro code as part of the Docker container, or just the dependencies?
For example, I have seen this done a lot, which packages the kedro code as well:

```
RUN mkdir /usr/kedro
WORKDIR /usr/kedro/
COPY . .
```

which means copying the whole project, and then in src/entrypoint.py:

```python
import os

from kedro.framework import cli

os.chdir("/usr/kedro")
cli.main()
```
2. Do I need to package my kedro project as a wheel file and submit it with the job to Dataproc? If so, how have you seen that done with DataprocPySparkBatchOp?
3. How do you recommend to pass dynamic parameters to the kedro pipeline run?
As I understand it, cli.main() picks up sys.argv to infer the pipeline name and parameters, so one could run:

kedro run --pipeline <my_pipeline> --params=param_key1=value1,param_key2=2.0

Is there a better recommended way of doing this?
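On question 3, a small sketch of how the argv that cli.main() reads could be assembled programmatically (the helper name is mine, not a Kedro API; KedroSession.create(extra_params=...) is another option worth checking in the docs):

```python
def build_kedro_argv(pipeline: str, params: dict) -> list:
    """Assemble the arguments cli.main() would otherwise read from sys.argv."""
    joined = ",".join(f"{k}={v}" for k, v in params.items())
    return ["run", "--pipeline", pipeline, f"--params={joined}"]

print(build_kedro_argv("my_pipeline", {"param_key1": "value1", "param_key2": 2.0}))
# ['run', '--pipeline', 'my_pipeline', '--params=param_key1=value1,param_key2=2.0']
```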
Thanks a lot, and hoping for a good discussion! 🙂
Hey guys,
I've been experimenting with packaging a Kedro project using the kedro package command, and I am running into an issue.
First off, I am attempting to run it like this:
```python
from <my-package>.__main__ import main

main(["--tags", "<my-tags>", "--env", "base"])
```
Is this correct?
When I do try to run it like this, the following error is raised:

```
ImportError: cannot import name 'TypeAliasType' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)
File <command-3656540420037005>, line 2
      1 from <my-package>.__main__ import main
----> 2 main(
      3     ["--tags", "int_tms_hotel_reservations", "--env", "base"]
      4 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-30b382f4-147d-466f-a67b-6ce8dcc92265/lib/python3.10/site-packages/sqlalchemy/util/typing.py:56
     54 from typing_extensions import TypeGuard as TypeGuard  # 3.10
     55 from typing_extensions import Self as Self  # 3.11
---> 56 from typing_extensions import TypeAliasType as TypeAliasType  # 3.12
     58 _T = TypeVar("_T", bound=Any)
     59 _KT = TypeVar("_KT")
```

How can I overcome this? I tried upgrading the typing-extensions package without any luck; the version currently installed on my cluster is 4.12.2. I am running this project on Databricks, and I think it is best to avoid running the package using python -m .. — that is why I am looking for a Python option. I am using Kedro 0.19.4.

Hello everyone,
I am encountering some issues regarding the use of placeholders in the data catalog, and I was hoping you could shed some light on this.
I have the following pipeline:
```python
load_date = settings.LOAD_DATE_COMPARISON.get("current")
previous_load_date = settings.LOAD_DATE_COMPARISON.get("previous")


def create_pipeline(**kwargs) -> Pipeline:
    format_data_quality = pipeline(
        [
            node(
                func=compare_id,
                inputs=[
                    f"maestro_indicadores_{load_date}",
                    f"maestro_indicadores_{previous_load_date}",
                ],
                outputs=f"compare_id_{load_date}_{previous_load_date}",
                name="compare_id_node",
                tags="compare_id",
            ),
        ]
    )
    return format_data_quality
```

With the corresponding catalog entry for the output:

```yaml
compare_id_{load_date}_{previous_load_date}:
  type: json.JSONDataset
  filepath: reports/{load_date}/id_comparison/id_comparison_{load_date}_{previous_load_date}.json
```

The issue here is that whenever the value of load_date is something like 2024_07_01, it will generate a path like:
reports/2024/id_comparison/id_comparison_2024_07_01_2024_05_01.json
Note that the first placeholder is not being substituted with the intended value, while the others are.
This will only happen when the value of load_date contains underscores, not happening with dots or hyphens.
Why does this happen?
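Not a definitive answer, but the behaviour is consistent with non-greedy placeholder matching when patterns are resolved (Kedro's dataset factories use the `parse` library). A regex analogue reproduces the reported split:

```python
import re

# Non-greedy equivalent of the factory pattern
# "compare_id_{load_date}_{previous_load_date}": the first placeholder
# stops at the first underscore it can.
pattern = re.compile(r"compare_id_(?P<load_date>.+?)_(?P<previous_load_date>.+)$")
m = pattern.match("compare_id_2024_07_01_2024_05_01")
print(m.group("load_date"))           # 2024
print(m.group("previous_load_date"))  # 07_01_2024_05_01
```

This would also explain why dots or hyphens inside load_date don't trigger the problem: they can't be mistaken for the literal `_` separator in the pattern.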
Hi everyone!
Does it make sense to combine temporal.io with Kedro? Does anyone have any experience?
Thanks!
Hey everyone! Interested to know how you all manage your requirements.txt file to reproduce the same environment. What tools do you prefer for keeping the requirements.txt file updated?
Hi everyone. Using hooks, I've succeeded in showing the execution time of each node. However, I also want to know how long the whole process takes, from loading data, through executing nodes, to saving data to the Databricks catalog.
So in the attached image, I want to know the time difference between "INFO Completed 1 out of tasks" and "INFO Loading data from 'params: ...'", not just the node execution time. I could of course get the time difference by calculating it manually, but because there are hundreds of nodes, that would take at least an hour, and it would be really helpful to see at a glance how long each task takes. Is there any way to do this? Is it possible using hooks?
https://kedro-org.slack.com/archives/C03RKP2LW64/p1728353683266369
Hi everyone!
I'm trying to run the following node in Kedro:

```python
def test(a):
    print(a)
    return 2 + 2


node(
    func=test,
    inputs=["params:parameter"],
    outputs="not_in_catalog",
    name="test_node",
),
```
test() is in nodes.py and the node in pipeline.py. When I run kedro run --nodes test_node, I get the following log:

```
(pamflow_kedro_env) s0nabio@hub:~/kedroPamflow$ kedro run --nodes test_node
[10/10/24 14:49:06] INFO  Using '/home/s0nabio/miniconda3/envs/pamflow_kedro_env/lib/python3.10/site-packages/kedro/framework/project/rich_logging.yml' as logging configuration. __init__.py:249
[10/10/24 14:49:07] INFO  Kedro project kedroPamflow session.py:327
Illegal instruction (core dumped)
```

I already ran Kedro in the same environment (Python 3.10.14) on a Windows machine and it worked. Now I'm trying to run it in a Linux VM, and that is when I get the error. The only libraries I have installed are:
```
birdnetlib==0.17.2
contextily==1.6.2
fsspec==2024.9.0
geopandas==1.0.1
kedro==0.19.8
kedro_datasets==4.1.0
librosa==0.10.2
matplotlib==3.6.2
numpy==1.23.5
pandas==2.2.3
pytest==8.3.3
PyYAML==6.0.2
scikit-maad==1.4.1
seaborn==0.13.2
statsmodels==0.14.4
tensorflow==2.17.0
```

If I run test() using Python directly in the terminal instead of through Kedro, I don't get the error. That's why I'm here: without any warning, and just trying to run the simplest Kedro node, I get the error.

Hi, I'm copying your question here. My team and I are using Kedro with Databricks without problems. Our sources are Databricks native tables, which can be handled with the specific ManagedTableDataset, see here. You can unit-test your nodes with a local Spark cluster without issue too.
Hello everyone
Just wanted to know whether there is a way to access the values of command-line arguments like --env in our Kedro pipeline source code.
Hi everyone! I have a couple of questions about Kedro:
- I'm using an external Java tool to convert XML to linked data in one of my nodes, and the tool produces an output, but it's created outside of the Python function. Right now, I'm using a dummy dataset as an output and then using that as an input for the next node to make Kedro Viz visualize the connection properly. However, this feels a bit clumsy. Is there a more elegant way to sequentially connect nodes in Kedro without requiring a dataset in between?
- I would like to use Kedro for a project that performs the ETL for multiple institutes. I'm planning to use namespaces since the ETL process is similar for most institutes. After running the individual pipelines, there is part of the ETL that can either be run with the output from a single institute or sometimes needs to be run with the outputs from all institutes together. Currently, with a pure Python approach, we output each institute's data into a shared directory and then run the shared part using the content of that directory. However, Kedro doesn't allow multiple nodes to output to the same dataset (folder in this case). How could I connect the shared pipeline with each institute's pipeline in this case?
Hi kedroids :kedro:
We have a usecase in which we are scheduling bigquery queries to run in a specific order using a kedro pipeline.
We use the bigquery client simply to trigger the SQL query on bigquery as follows:
```python
def trigger_query_on_bigquery(
    query: str,
):
    client = bigquery.Client()
    query_job = client.query_and_wait(query)
    return True
```
The kedro dag to schedule multiple queries in order looks as follows:
```python
def create_retail_data_primary_pipeline() -> Pipeline:
    nodes = [
        node(
            func=trigger_prm_customer_on_big_query,
            outputs="prm_customer@status",
        ),
        node(
            func=trigger_prm_transaction_detail_ecom_on_big_query,
            inputs=["prm_product_hierarchy@status"],
            outputs="prm_transaction_detail_ecom@status",
        ),
        node(
            func=trigger_prm_transaction_detail_retail_on_big_query,
            inputs=["prm_product_hierarchy@status"],
            outputs="prm_transaction_detail_retail@status",
        ),
        node(
            func=trigger_prm_transaction_detail_on_big_query,
            inputs=[
                "prm_transaction_detail_ecom@status",
                "prm_transaction_detail_retail@status",
                "prm_product_hierarchy@status",
                "prm_customer@status",
            ],
            outputs="prm_transaction_detail@status",
        ),
        node(
            func=trigger_prm_incident_on_big_query,
            outputs="prm_incident@status",
        ),
        node(
            func=trigger_prm_product_hierarchy_on_big_query,
            outputs="prm_product_hierarchy@status",
        ),
    ]
```
Since a node can't output the BigQuery dataframe itself, we output a transcoded entry with @status (which is just True), and then use the actual BigQuery spark.SparkDataset transcoded versions of these datasets in downstream pipelines to enforce the order. So I will use the prm_product_hierarchy@bigquery dataset in a downstream node just so that Kedro runs the node that triggers the BigQuery query first. Is there a better way to do this?
Hey everyone, I am trying to define the column dtypes of a CSV dataset, because some columns contain IDs that Kedro interprets as floats but which should be interpreted as strings instead. Setting

```yaml
load_args:
  dtype:
    user_id: str
save_args:
  dtype:
    user_id: str
```
does not seem to work for me. Appreciate your help!
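For what it's worth, pandas' read_csv accepts a dtype mapping while to_csv does not, so dtype belongs under load_args only, and the nesting matters. A hypothetical catalog entry (dataset name and filepath assumed):

```yaml
users:
  type: pandas.CSVDataset
  filepath: data/01_raw/users.csv
  load_args:
    dtype:
      user_id: str
```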
Hey Everyone
Interested to know which orchestration service you prefer for running Kedro in production environments, and how the experience has been so far.
Recently I have been trying to run kedro on kubeflow and have been facing multiple issues.
good morning all!
We are facing an error using global variable interpolation with the OmegaConfigLoader. The error occurs when launching a Jupyter notebook, e.g. with kedro jupyter lab.
the issue seems very similar/identical to the one signaled here https://kedro-org.slack.com/archives/C03RKP2LW64/p1726216824633969
the full error stack is below. The global var is located in conf\globals.yml
The issue also occurs for the location conf\base\globals.yml
Any help from the kedro team is very much appreciated
```
Traceback (most recent call last):
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\IPython\core\shellapp.py", line 322, in init_extensions
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\IPython\core\extensions.py", line 62, in load_extension
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\IPython\core\extensions.py", line 79, in _load_extension
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\IPython\core\extensions.py", line 129, in _call_load_ipython_extension
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\ipython\__init__.py", line 62, in load_ipython_extension
    reload_kedro()
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\ipython\__init__.py", line 123, in reload_kedro
    catalog = context.catalog
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\framework\context\context.py", line 187, in catalog
    return self._get_catalog()
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\framework\context\context.py", line 223, in _get_catalog
    conf_catalog = self.config_loader["catalog"]
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\config\omegaconf_config.py", line 201, in __getitem__
    base_config = self.load_and_merge_dir_config(
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\config\omegaconf_config.py", line 341, in load_and_merge_dir_config
    for k, v in OmegaConf.to_container(
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\omegaconf.py", line 573, in to_container
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\basecontainer.py", line 292, in _to_content
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\basecontainer.py", line 247, in get_node_value
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\basecontainer.py", line 292, in _to_content
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\basecontainer.py", line 244, in get_node_value
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 231, in _format_and_raise
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\_utils.py", line 899, in format_and_raise
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\_utils.py", line 797, in _raise
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\basecontainer.py", line 242, in get_node_value
    node = node._dereference_node()
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 246, in _dereference_node
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 277, in _dereference_node_impl
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 584, in _resolve_interpolation_from_parse_tree
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 764, in resolve_parse_tree
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\antlr4\tree\Tree.py", line 34, in visit
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar\gen\OmegaConfGrammarParser.py", line 206, in accept
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar_visitor.py", line 101, in visitConfigValue
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\antlr4\tree\Tree.py", line 34, in visit
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar\gen\OmegaConfGrammarParser.py", line 342, in accept
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar_visitor.py", line 301, in visitText
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar_visitor.py", line 389, in _unescape
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar_visitor.py", line 125, in visitInterpolation
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\antlr4\tree\Tree.py", line 34, in visit
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar\gen\OmegaConfGrammarParser.py", line 1041, in accept
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\grammar_visitor.py", line 179, in visitInterpolationResolver
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 750, in resolver_interpolation_callback
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\base.py", line 694, in _evaluate_custom_resolver
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\omegaconf\omegaconf.py", line 445, in resolver_wrapper
    ret = resolver(*args, **kwargs)
  File "C:\Users\IonutBarbu\miniconda3\envs\EIT-Epsilon\Lib\site-packages\kedro\config\omegaconf_config.py", line 384, in _get_globals_value
    raise InterpolationResolutionError(
omegaconf.errors.InterpolationResolutionError: Globals key 'model_to_use' not found and no default value provided.
    full_key: performance_metrics_best_model.filepath
    object_type=dict
```