Hi, I'm testing after upgrading to 0.19.9 and I found what seems like a bug - after running the pipeline for the second time with a runner (like during test cases) the output is no longer saved (in catalog or returned as a value from pipeline). That wasn't the case in 0.19.8
Could you share an example? This sounds suspicious, which runner are you using? I only recall a minor change with ThreadRunner
This unfortunately results in making the packaged model servers with kedro-mlflow work only once, then they need a reboot. FYI
When will we start making things that actually last? One-use cutlery, one-use batteries, and now we get one-use servers >...<
I left a comment there. It's unclear to me why it breaks; I haven't been able to reproduce the error yet. I get a and b both {} when I run this on GitPod on 0.19.8 and 0.19.9.
Is this what your test looks like?
from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

def test_data_science_pipeline(caplog, dummy_data, dummy_parameters):
    pipeline = (
        create_ds_pipeline()
        .from_nodes("split_data_node")
        .to_nodes("evaluate_model_node")
    )
    catalog = DataCatalog()
    catalog.add_feed_dict(
        {
            "model_input_table": dummy_data,
            "params:model_options": dummy_parameters["model_options"],
        }
    )
    # Run the same pipeline twice against the same catalog and compare outputs.
    a = SequentialRunner().run(pipeline, catalog)
    b = SequentialRunner().run(pipeline, catalog)
    assert a == b
change the test and you’ll reproduce
pipeline = (
    create_ds_pipeline()
    .from_nodes("split_data_node")
    .to_nodes("train_model_node")
)
Ya ok, the issue describes using the test we have in the starter, and I cannot reproduce it with that.
I updated the comment there with the new test, I still think there is an issue with the memory dataset definition
pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['params:model_options', 'model_input_table']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs={'y_test', 'X_test', 'regressor'}

pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['X_test', 'params:model_options', 'model_input_table', 'X_train', 'regressor', 'y_test', 'y_train']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs=set()
On the second run free_outputs is empty, but I expect 'y_test', 'X_test', 'regressor' to be in memory_datasets, and they are not. That is why free_outputs is missing them at the end. I think the issue is with the shallow copy instead. Those free_outputs are initialised before the copy is made, and so they end up with an incorrect reference.
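To make the numbers in that log easier to follow, here is a minimal sketch in plain Python. The relation inside `compute_free_outputs` is an assumption inferred from the printed values, not a quote of Kedro's runner code, and the function name is made up for illustration. It does reproduce both runs of the log.

```python
def compute_free_outputs(pipeline_outputs, registered_ds, memory_datasets):
    # Assumed relation (inferred from the log, not copied from Kedro):
    # an output is "free" unless it is registered as a non-memory dataset.
    return pipeline_outputs - (set(registered_ds) - memory_datasets)


outputs = {"y_test", "X_test", "regressor"}
memory = {"model_input_table", "params:model_options"}

# First run: only the two fed inputs are registered.
first = compute_free_outputs(
    outputs, ["params:model_options", "model_input_table"], memory
)

# Second run: the outputs now count as registered, but not as memory datasets.
second = compute_free_outputs(
    outputs,
    ["X_test", "params:model_options", "model_input_table",
     "X_train", "regressor", "y_test", "y_train"],
    memory,
)

print(first == outputs)  # True
print(second)            # set()
```

Under this assumed relation, anything that shows up in registered_ds without also being in memory_datasets is silently dropped from the returned outputs, which matches the empty result on the second run.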
I don't understand the need for the shallow copy, but by moving all those free_outputs declarations after the shallow copy, I get the expected output.
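For anyone wondering how a shallow copy could let one run affect the next, here is a toy sketch. `ToyCatalog` is a hypothetical stand-in, not Kedro's DataCatalog, and the thread does not confirm that this is the actual mechanism in 0.19.9. It only shows the general Python behaviour: a shallow copy shares its nested containers with the original.

```python
import copy


class ToyCatalog:
    """Hypothetical stand-in for a catalog; NOT Kedro's DataCatalog."""

    def __init__(self):
        self.datasets = {}

    def shallow_copy(self):
        # copy.copy creates a new object, but the `datasets` dict is shared.
        return copy.copy(self)


original = ToyCatalog()
original.datasets["model_input_table"] = "dummy data"

run_copy = original.shallow_copy()
run_copy.datasets["regressor"] = "trained model"  # written during a "run"

# The write through the copy is visible on the original as well.
print("regressor" in original.datasets)        # True
print(run_copy.datasets is original.datasets)  # True
```

If state leaks like this, anything computed from the catalog before or after the copy can disagree between the first and the second run, so where the free_outputs calculation sits relative to the copy matters.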
(DataCatalog) at saving time, but it runs with the one in your environment at loading time. If there is a mismatch, the object does load, or behaves like the class as defined at loading time (e.g. here with the behaviour of the latest version of kedro).
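A small sketch of the save-time versus load-time point, using only the standard library. It assumes the packaged model is serialised with something pickle-like; the thread does not spell that out. The module `toy_lib` and its `Catalog` class are hypothetical and stand in for two installed versions of a library.

```python
import pickle
import sys
import types


def install_version(behaviour):
    """Install a hypothetical module `toy_lib` whose Catalog has this behaviour."""
    module = types.ModuleType("toy_lib")

    class Catalog:
        def __init__(self, names):
            self.names = names

        def describe(self):
            return f"{behaviour}: {sorted(self.names)}"

    # Make the class look like a top-level class of toy_lib so pickle can find it.
    Catalog.__module__ = "toy_lib"
    Catalog.__qualname__ = "Catalog"
    module.Catalog = Catalog
    sys.modules["toy_lib"] = module
    return module


# "Saving time": the environment has v1 installed.
v1 = install_version("v1")
payload = pickle.dumps(v1.Catalog({"a"}))

# "Loading time": the environment now has v2 installed.
install_version("v2")
restored = pickle.loads(payload)

print(restored.describe())  # v2: ['a']
```

Pickle stores the instance data plus a reference to the class by module and name, not the class code itself, so the restored object picks up whatever definition is importable when it is loaded.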