
Hyperparameter Tuning Frameworks Within Kedro

Hi Team!

Anyone ever played with hyperparameter tuning frameworks within kedro? I have found several scattered pieces of info related to this topic, but no complete solutions. Ultimately, I think what I would like to set up is a way to have multiple nodes running at the same time and all contributing to the same tuning experiment.

I would prefer using optuna and this is the way I would go about it based on what I have found online:

  1. Create a node that creates an Optuna study
  2. Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the Optuna study, and if using kedro-mlflow, each trial can be logged as its own nested run.
  3. Create a final node that processes the results of all tuning nodes

Does this sound reasonable to you? Has anyone produced such a kedro workflow already? I would love to see what it looks like.

I am also wondering:
  • I am thinking of creating an OptunaStudyDataset for the Optuna study. Has anyone attempted this already?
  • For creating N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?

Thanks!

8 comments

For now, the semi-official approach is the blog post you mentioned. How was that process, by the way? Any pros and cons you saw?

I think some folks have tried to use Optuna w/ Kedro in the past

Do you mean it is semi-official because there's not yet an official approach? Is there any discussion I could follow?

I have not tried implementing it yet, for now it seems reasonable to me but I am asking because I am trying to understand the pros and cons.

Once I get to it, happy to give some feedback (and maybe even some simple code example).

Hey, I created a setup for this some time ago, where I use an Optuna study dataset and a YAML configuration loader so you can set all the trial parameters in your conf. If you’d like, we can discuss?

Hi @Hugo Evers! Yes, that would be super nice, thank you!

@juanlu I just tried the dynamic pipeline setup.

It's actually very similar to what I have been doing so far, except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides. (BTW, the _overrides keys appear when you run kedro catalog list.)

I feel it looks much neater. Is there any drawback doing it that way?

Let me give you an example:

Blog post parameter file:

study_params:
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor:
  _overrides:
    study_name: price_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  candidate1:
    _overrides:
      study_name: price_predictor_candidate1
    study_params: ${merge:${..study_params},${._overrides}}

  candidate2:
    _overrides:
      study_name: price_predictor_candidate2
    study_params: ${merge:${..study_params},${._overrides}}

  candidate3:
    _overrides:
      study_name: price_predictor_candidate3
    study_params: ${merge:${..study_params},${._overrides}}

reviews_predictor:
  _overrides:
    study_name: reviews_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  test1:
    _overrides:
      study_name: reviews_predictor_test1
    study_params: ${merge:${..study_params},${._overrides}}

Using the native YAML inheritance:

study_params: &base_study_params
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor: 
  base: 
    study_params: &price_predictor_base_study_params
      <<: *base_study_params
      study_name: price_predictor_base

  candidate1:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate1

  candidate2:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate2

  candidate3:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate3

reviews_predictor:
  base: 
    study_params: &reviews_predictor_base_study_params
      <<: *base_study_params
      study_name: reviews_predictor_base

  test1:
    study_params:
      <<: *reviews_predictor_base_study_params
      study_name: reviews_predictor_test1

Happy to hear your thoughts on this!

It's actually very similar to what I have been doing so far except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides.

I do prefer the YAML merge keys version actually 😄 @marrrcin any thoughts?
