Guillaume Tauzin

Hyperparameter Tuning Frameworks Within Kedro

Hi Team!

Has anyone ever played with hyperparameter tuning frameworks within Kedro? I have found several scattered pieces of information on this topic, but no complete solution. Ultimately, what I would like to set up is a way to have multiple nodes running at the same time, all contributing to the same tuning experiment.

I would prefer to use Optuna, and based on what I have found online, this is how I would go about it:

Create a node that creates an Optuna study.
Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the Optuna study and, if using kedro-mlflow, each hyperparameter trial can be logged into its own nested run.
Create a final node that processes the results of all tuning nodes.

Does this sound reasonable to you? Has anyone produced such a Kedro workflow already? I would love to see what it looks like.

I am also wondering:

I am thinking of creating an OptunaStudyDataset for the Optuna study. Has anyone attempted this already?
For creating the N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?

Thanks!


Guillaume Tauzin

Solved

Accessing Factory Datasets from a DataCatalog/KedroDataCatalog Instance

Hi team!

Is there any way to resolve factory datasets and access them from a DataCatalog/KedroDataCatalog instance?

I notice that using the CLI to list datasets with kedro catalog list automatically resolves them (for a given pipeline; see this bit of code), while calling catalog.list() in a Kedro Jupyter notebook only lists non-factory datasets (and parameters). Do the two return different outputs by design, or is this a bug?
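One way to see why the two outputs can differ: a factory entry is only a pattern, so a concrete dataset exists only once some name is matched against it. The CLI gets those names from a pipeline, whereas a bare catalog has no names to match. The stdlib-only sketch below mimics that resolution step; resolve_factory_datasets and its helpers are hypothetical and approximate Kedro's matching (which relies on the parse library) rather than reproducing it.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '{namespace}.model_input' into a regex with named groups."""
    regex = ""
    for part in re.split(r"(\{[^{}]+\})", pattern):
        if part.startswith("{") and part.endswith("}"):
            regex += f"(?P<{part[1:-1]}>.+?)"  # placeholder -> named group
        else:
            regex += re.escape(part)  # literal text must match exactly
    return re.compile(regex)


def _specificity(pattern: str) -> int:
    """Count literal characters outside the placeholders."""
    return len(re.sub(r"\{[^{}]*\}", "", pattern))


def resolve_factory_datasets(
    dataset_names: set[str], patterns: list[str], explicit: set[str]
) -> dict[str, tuple[str, dict[str, str]]]:
    """Map each non-explicit dataset name to the pattern that would create it."""
    # Most specific pattern first, then more placeholders, then alphabetical.
    ordered = sorted(patterns, key=lambda p: (-_specificity(p), -p.count("{"), p))
    resolved = {}
    for name in sorted(dataset_names):
        if name in explicit:
            continue  # already a regular catalog entry
        for pattern in ordered:
            match = _pattern_to_regex(pattern).fullmatch(name)
            if match:
                resolved[name] = (pattern, match.groupdict())
                break
    return resolved
```

In a real project the concrete names would come from a pipeline (for example Pipeline.datasets() in Kedro 0.19), which is exactly the information a notebook's catalog alone does not have.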

Thanks!


Guillaume Tauzin

Choosing a Simple and Free Orchestrator for Kedro Pipelines on AWS

Hello Team!
So it's been a few months since we started using Kedro, and it's time to deploy some of the pipelines we have created.
We need to choose an orchestrator, but this is not our field of expertise, so I wanted to ask for some help. We would like something simple to set up and use collaboratively. Also, my company requires it to be free (at least for now), our cloud provider is AWS, and we already use MLflow. Here are the alternatives we found:

Prefect (open-source, seems nice to use, Kedro support, but the free tier imposes limitations)
Flyte (free?, open-source, seems nice to use, no Kedro support)
MLRun (free and open-source, no Kedro support?, seems nice to use but is a bit more than an orchestrator, requires Python 3.9)
Kubeflow Pipelines (free and open-source, Kedro plugin, though others seem to find it complex to set up and maintain)
Airflow (free and open-source, Kedro plugin)
SageMaker (Amazon, Kedro plugin; I personally dislike its UI and how other AWS services are organized around it)

What would you recommend? What should we consider to make such a decision?

Thanks for your help :)


Guillaume Tauzin

Accessing hook methods and their signatures

Hello team!
Where can I find a list of all available hook methods and their signatures? I checked the docs, but apologies if I somehow missed it.
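Assuming Kedro's hook specifications are declared with pluggy's HookspecMarker under the "kedro" project name (an assumption about the current implementation), each spec function carries a kedro_spec attribute, so the specs module can be introspected to list every hook with its signature. list_hook_specs is a hypothetical helper written for this purpose, not part of Kedro.

```python
import inspect
from types import ModuleType


def list_hook_specs(module: ModuleType, marker_attr: str = "kedro_spec") -> dict[str, str]:
    """Map 'SpecClass.method' to the method's signature for every tagged hook spec."""
    specs: dict[str, str] = {}
    for cls_name, cls in inspect.getmembers(module, inspect.isclass):
        if cls.__module__ != module.__name__:
            continue  # skip classes that the module merely imports
        for meth_name, func in inspect.getmembers(cls, inspect.isfunction):
            # pluggy's HookspecMarker sets "<project_name>_spec" on each spec function
            if hasattr(func, marker_attr):
                specs[f"{cls_name}.{meth_name}"] = str(inspect.signature(func))
    return specs
```

For example, after import kedro.framework.hooks.specs, calling list_hook_specs(kedro.framework.hooks.specs) should return entries keyed like "NodeSpecs.before_node_run", each mapped to its full parameter list.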
Many thanks!
