Another question from my side.
I have a node which outputs a dictionary called train_test_dts
which I am saving as a pickle with the joblib backend.
When I then try to run my pipeline with the parallel-runner like this:
kedro run --pipeline feature_engineering --params env=dev,inference_dt=2025-01-05 --runner ParallelRunner

Then I am getting the following error:
AttributeError: The following datasets cannot be used with multiprocessing: ['train_test_dts'] In order to utilize multiprocessing you need to make sure all datasets are serialisable, i.e. datasets should not make use of lambda functions, nested functions, closures etc. If you are using custom decorators ensure they are correctly decorated using functools.wraps().
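For context, the catalog entry is defined roughly like this (a minimal sketch; the filepath is a placeholder, only the dataset name and the joblib backend come from my actual setup):

```yaml
# sketch of the dataset definition that triggers the error (filepath is a placeholder)
train_test_dts:
  type: pickle.PickleDataset
  filepath: data/05_model_input/train_test_dts.pkl
  backend: joblib
```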
Stupid question: the Kedro VSCode plugin does not work for me. After installing it and its dependencies, I still cannot click on the catalog items. Any standard solution for this?
Hey team, hope you are doing well. I have the following question (I already checked whether any previous question answers it, with no luck).
I have primary data paths such as:
"prm_customer_base": table_name: primary_${_environment}.prm_customer_base <<: *_conn
where _environment should be set depending on the --env that I am running with. So, I created a file under conf/dev/catalog_dev.yml which contains _environment: dev, or alternatively a conf/dev/globals.yml (+ removing the underscore then ofc). That seems to work, though I am not sure how to feel about having a globals.yml file for each environment, since I was thinking having multiple globals is defeating the point.
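For context, the globals variant I ended up with looks roughly like this (a sketch; the key name environment is my own choice, the ${globals:...} syntax assumes Kedro's OmegaConfigLoader globals resolver, and _conn is an anchor defined elsewhere in my catalog):

```yaml
# conf/dev/globals.yml (and likewise conf/prod/globals.yml with environment: prod)
environment: dev
```

```yaml
# conf/base/catalog.yml, referencing the global via the globals resolver
"prm_customer_base":
  table_name: primary_${globals:environment}.prm_customer_base
  <<: *_conn
```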