
Hyperparameter Tuning Frameworks Within Kedro

Hi Team!

Anyone ever played with hyperparameter tuning frameworks within kedro? I have found several scattered pieces of info related to this topic, but no complete solutions. Ultimately, I think what I would like to set up is a way to have multiple nodes running at the same time and all contributing to the same tuning experiment.

I would prefer using optuna and this is the way I would go about it based on what I have found online:

  1. Create a node that creates an Optuna study
  2. Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the Optuna study, and if using kedro-mlflow, each trial can be logged as its own nested run.
  3. Create a final node that processes the results of all tuning nodes

Does this sound reasonable to you? Has anyone produced such a kedro workflow already? I would love to see what it looks like.

I am also wondering:
  • I am thinking of creating an OptunaStudyDataset for the Optuna study. Has anyone attempted this already?
  • For creating N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?

Thanks!

8 comments

For now, the semi-official approach is the blog post you mentioned. How was that process, by the way? Any pros and cons you saw?

I think some folks have tried to use Optuna w/ Kedro in the past

Do you mean it is semi-official because there's not yet an official approach? Is there any discussion I could follow?

I have not tried implementing it yet, for now it seems reasonable to me but I am asking because I am trying to understand the pros and cons.

Once I get to it, happy to give some feedback (and maybe even some simple code example).

Hey, I created a setup for this some time ago, where I use an Optuna study dataset and a YAML configuration loader so you can set all the trial parameters in your conf. If you’d like, we can discuss?

Hi @Hugo Evers! Yes, that would be super nice, thank you!

@juanlu I just tried the dynamic pipeline setup.

It's actually very similar to what I have been doing so far, except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides. (BTW, the _overrides keys appear when you run kedro catalog list.)

I feel it looks much neater. Is there any drawback doing it that way?

Let me give you an example:

Blog post parameter file:

study_params:
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor:
  _overrides:
    study_name: price_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  candidate1:
    _overrides:
      study_name: price_predictor_candidate1
    study_params: ${merge:${..study_params},${._overrides}}

  candidate2:
    _overrides:
      study_name: price_predictor_candidate2
    study_params: ${merge:${..study_params},${._overrides}}

  candidate3:
    _overrides:
      study_name: price_predictor_candidate3
    study_params: ${merge:${..study_params},${._overrides}}

reviews_predictor:
  _overrides:
    study_name: reviews_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  test1:
    _overrides:
      study_name: reviews_predictor_test1
    study_params: ${merge:${..study_params},${._overrides}}

Using the native YAML inheritance:

study_params: &base_study_params
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor: 
  base: 
    study_params: &price_predictor_base_study_params
      <<: *base_study_params
      study_name: price_predictor_base

  candidate1:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate1

  candidate2:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate2

  candidate3:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate3

reviews_predictor:
  base: 
    study_params: &reviews_predictor_base_study_params
      <<: *base_study_params
      study_name: reviews_predictor_base

  test1:
    study_params:
      <<: *reviews_predictor_base_study_params
      study_name: reviews_predictor_test1

Happy to hear your thoughts on this!

It's actually very similar to what I have been doing so far except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides.

I do prefer the YAML merge keys version actually 😄 @marrrcin any thoughts?
