Hello Kedro community,
I am currently developing a project where I need to pass a dynamic number of catalog dataset entries as inputs to a node. The number of input datasets depends on the primary input dataset being used, specifically on the number of unique values in one field.
For instance, this node expects three inputs: a column name (fixed, not dynamic), the feature datasets, and the target datasets. The node collates all of these datasets into a single object, which is its output:
```python
feature_df_list = [
    f"{group_name_cleaned}.features_with_clusters"
    for group_name_cleaned in groups_cleaned
]
target_df_list = [
    f"{group_name_cleaned}.target_with_clusters"
    for group_name_cleaned in groups_cleaned
]
input_dict = {
    "target_col": "params:target_col",
    # NOTE: Kedro expects each value of a dict input to be a single dataset
    # name (str), so passing a list as a value, as below, is not supported.
    "group_list": feature_df_list,
    "target_clusters_with_features": target_df_list,
}
# (entry inside the pipeline([...]) node list)
node(
    func=collate_results,
    inputs=input_dict,
    outputs="run_collection",
),
```
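Since a dict input maps each function argument to one dataset name, one possible workaround (a sketch, not the only option) is to flatten every dataset into a single positional list and let the function take *args. Kedro passes list inputs to the function positionally, so the order of the list is what lets the function split features from targets again. build_positional_inputs is a hypothetical helper; the dataset-name patterns are the ones from the snippet above.

```python
def build_positional_inputs(groups_cleaned: list[str]) -> list[str]:
    """Flat, ordered list of dataset names for the collate node (hypothetical helper)."""
    feature_names = [f"{group}.features_with_clusters" for group in groups_cleaned]
    target_names = [f"{group}.target_with_clusters" for group in groups_cleaned]
    # Order matters: the parameter first, then all features, then all targets
    return ["params:target_col", *feature_names, *target_names]


def collate_results(target_col, *datasets):
    """Split the positional datasets back into their feature and target halves."""
    if len(datasets) % 2:
        raise ValueError("expected one target dataset per feature dataset")
    half = len(datasets) // 2
    return {
        "target_col": target_col,
        "group_list": list(datasets[:half]),
        "target_clusters_with_features": list(datasets[half:]),
    }
```

The node would then be declared as node(func=collate_results, inputs=build_positional_inputs(groups_cleaned), outputs="run_collection"). The trade-off is that group names are lost inside the function; a keyword-based input dict keeps them.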
I'm not sure how I would use that, Rashida. I have the parameters and catalog files set up; how would that help me pass dynamic inputs to a node? If you could share an example, that would be great.
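Not sure if this example is relevant in your case:
```python
from kedro.framework.project import settings
from kedro.pipeline import Pipeline, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    # `data_science_pipeline` is the base (un-namespaced) pipeline, defined elsewhere
    pipes = []
    for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items():
        for variant in variants:
            # one copy of the base pipeline per namespace/variant pair
            pipes.append(
                pipeline(
                    data_science_pipeline,
                    inputs={"model_input_table": f"{namespace}.model_input_table"},
                    namespace=f"{namespace}.{variant}",
                    tags=[variant, namespace],
                )
            )
    # Kedro pipelines support sum(), so this merges all the copies into one
    return sum(pipes)
```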
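To reference the outputs of those namespaced pipelines later, it helps to know how a namespace rewrites dataset names. The function below is a simplified model of the renaming rules, not Kedro's actual implementation: names given in the inputs mapping are replaced, params:x becomes params:<namespace>.x, the special "parameters" entry is left alone, and every other dataset is prefixed with the namespace. The dataset names regressor and params:model_options, and the "reviews" namespace, are assumptions for illustration.

```python
def namespaced_name(name: str, namespace: str, mapping: dict[str, str]) -> str:
    """Simplified model of how a namespaced pipeline renames a dataset."""
    if name in mapping:  # explicitly remapped, e.g. via inputs={...}
        return mapping[name]
    if name == "parameters":  # the whole-parameters entry is not prefixed
        return name
    if name.startswith("params:"):  # single parameters get the namespace after "params:"
        return f"params:{namespace}.{name[len('params:'):]}"
    return f"{namespace}.{name}"  # every other dataset is prefixed


# Hypothetical example: namespace "reviews.base" with model_input_table remapped
remap = {"model_input_table": "reviews.model_input_table"}
for ds in ["model_input_table", "regressor", "params:model_options"]:
    print(ds, "->", namespaced_name(ds, "reviews.base", remap))
```

So a model saved as regressor inside the base variant would be addressed from outside the namespace as reviews.base.regressor.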
You probably want a preprocessing pipeline that creates a dataset for each of your groups, and then use those datasets as inputs. Check out namespaces; they helped me a lot with this, as did the blog post mentioned by Rashida. I actually ended up implementing it.
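A minimal sketch of such a preprocessing step, assuming the group values are known when the pipeline is built (Kedro assembles the pipeline before any data is loaded, so they typically come from settings or parameters). A Kedro node may return a dict and map its keys to dataset names through an outputs dict; split_by_group and the names used in the note below are hypothetical, and rows are plain dicts instead of DataFrames to keep the sketch dependency-free.

```python
def split_by_group(records, field, groups):
    """Return {group: rows}; the keys must match the node's outputs dict."""
    # Pre-create every group so each declared output gets a value, even if empty
    split = {group: [] for group in groups}
    for row in records:
        group = row[field]
        if group not in split:
            raise ValueError(f"unexpected group {group!r}; update the group list")
        split[group].append(row)
    return split


rows = [{"g": "a", "x": 1}, {"g": "b", "x": 2}, {"g": "a", "x": 3}]
print(split_by_group(rows, "g", ["a", "b"]))
```

In a pipeline this could be wired as node(func=split_by_group, inputs=["primary_input", "params:group_field", "params:groups"], outputs={g: f"{g}.features" for g in groups}), where each per-group dataset then feeds its own namespaced pipeline.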
Thank you both for pointing me to namespaces. Extremely helpful!
I also want to create a node that collates output from all namespaces into one summary output. Is there a way to pass all outputs created by the dynamic namespaces to a single node which collates them?
For instance, the example Rashida shared has base, candidate1, and candidate2 namespaces, with a regressor model for each. I want to create one node that takes all three models (the number is dynamic) as inputs.
Hi @Vinayak Singh,
I haven't tried this myself, but in principle, the outputs of a node can serve as inputs to another node. If you define your outputs correctly in the DataCatalog, you should be able to reference them as inputs in a new node.
Maybe you can build your input dict beforehand by reading settings.DYNAMIC_PIPELINES_MAPPING.items()? That way you can populate your inputs with all the namespaces/variants in use and read them in the node function via kwargs.
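A minimal sketch of that suggestion, assuming the mapping holds {"reviews": ["base", "candidate1", "candidate2"]} and that each namespaced pipeline writes a dataset called regressor (both hypothetical). The input dict gets one unique keyword per namespaced dataset, and the collating function accepts them all through **kwargs. The collating node should be added outside the namespaced loop, so its inputs keep their fully qualified names.

```python
# Hypothetical mapping; in a real project this would be settings.DYNAMIC_PIPELINES_MAPPING
DYNAMIC_PIPELINES_MAPPING = {"reviews": ["base", "candidate1", "candidate2"]}


def build_collate_inputs(mapping: dict[str, list[str]], dataset: str = "regressor") -> dict[str, str]:
    """Map a unique keyword name to every namespaced dataset name."""
    return {
        f"{namespace}_{variant}": f"{namespace}.{variant}.{dataset}"
        for namespace, variants in mapping.items()
        for variant in variants
    }


def collate_models(**models):
    """Receive every namespaced model as a keyword argument and bundle them."""
    # Sorting the keys makes the summary order deterministic
    return {name: models[name] for name in sorted(models)}


print(build_collate_inputs(DYNAMIC_PIPELINES_MAPPING))
```

The node would then be node(func=collate_models, inputs=build_collate_inputs(settings.DYNAMIC_PIPELINES_MAPPING), outputs="model_summary"), with model_summary a hypothetical catalog entry. Adding a variant to the mapping automatically adds an input to this node, with no change to the function.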
Thank you both for your responses. Great suggestion, @Philipp Dahlke; I will try that.