Date: 2020-01-13
Senior data scientist building experiment tracking tools for ML projects at https://neptune.ai
Wondering which library you should choose for hyperparameter optimization?
Been using Hyperopt for a while and feel like changing?
Just heard about Optuna and you want to see how it works?
Good!
In this article I will:
show you an example of using Optuna and Hyperopt on a real problem,
compare Optuna vs Hyperopt on API, documentation, functionality, and more,
give you my overall score and recommendation on which hyperparameter optimization library you should use.
Let’s do it.
Evaluation criteria
Ease of use and API
Options, methods, and hyper(hyperparameters)
Documentation
Visualizations
Speed and Parallelization
Experimental Results
Ease of use and API
In this section I want to run a basic hyperparameter tuning script with each library and see how natural and easy to use its API is.
Optuna
You define your search space and objective in one function.
Moreover, you sample the hyperparameters from the trial object. Because of that, the parameter space is defined at execution time. For those of you who like PyTorch because of this imperative approach, Optuna will feel natural.
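To make the define-by-run idea concrete, here is a minimal sketch. `ToyTrial` is a hypothetical stand-in written for this example: it is not Optuna code and it only samples at random. It borrows the method names `suggest_float`, `suggest_int`, and `suggest_categorical` from Optuna's trial object so that `objective` reads the way an Optuna objective would. The parameter names, ranges, and the placeholder score are all assumptions.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial: samples at random and records each request."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}

    def suggest_float(self, name, low, high):
        self.params[name] = self._rng.uniform(low, high)
        return self.params[name]

    def suggest_int(self, name, low, high):
        self.params[name] = self._rng.randint(low, high)
        return self.params[name]

    def suggest_categorical(self, name, choices):
        self.params[name] = self._rng.choice(choices)
        return self.params[name]


def objective(trial):
    # The search space is declared while the function runs.
    booster = trial.suggest_categorical("booster", ["gbdt", "dart"])
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.5)
    trial.suggest_int("num_leaves", 2, 256)
    if booster == "dart":
        # This parameter only exists when the "dart" branch is taken.
        trial.suggest_float("drop_rate", 0.0, 0.5)
    # Placeholder score (assumption): a real objective would train a model here.
    return 1.0 - abs(learning_rate - 0.1)


trials = [ToyTrial(seed) for seed in range(20)]
scores = [objective(trial) for trial in trials]
```

Because ordinary Python control flow decides which parameters get sampled, `drop_rate` shows up only in the trials that picked `dart`.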
Then, you create the study object and optimize it. What is great is that you can choose whether you want to maximize or minimize your objective. That is useful when optimizing a metric like AUC because you don’t have to change the sign of the objective before training and then convert best results after training to get a positive score.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
That is it.
Everything you may want to know about the optimization is available in the study object.
What I love about Optuna is that I get to define how I want to sample my search space on the fly, which gives me a lot of flexibility. The ability to choose the direction of optimization is also pretty nice.
If you want to see the full code example you can scroll down to the example script.
10 / 10
Hyperopt
You start by defining your parameter search space:
By combining hp.choice with other sampling methods we can have conditional spaces. This is useful when you are optimizing hyperparameters for a machine learning pipeline that involves preprocessing, feature engineering and model training.
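A rough sketch of how a declarative, conditional space can work is below. The tuple mini-language (`"uniform"`, `"choice"`) and the `sample` function are made up for this illustration and are not Hyperopt's API; the point is only that the whole space is written down as data up front, and a choice between nested dictionaries gives you the conditional branches.

```python
import random

# Hypothetical mini-language for a declarative search space (not Hyperopt's API):
#   ("uniform", low, high) -> float drawn uniformly from [low, high]
#   ("choice", [options])  -> one option picked at random; options may be nested spaces
search_space = {
    "scaler": ("choice", ["none", "standard"]),
    "model": ("choice", [
        {"type": "knn", "n_neighbors": ("choice", [3, 5, 11])},
        {"type": "svm", "C": ("uniform", 0.1, 10.0)},
    ]),
}


def sample(space, rng):
    """Recursively draw one concrete configuration from the declared space."""
    if isinstance(space, dict):
        return {key: sample(value, rng) for key, value in space.items()}
    if isinstance(space, tuple) and space[0] == "uniform":
        return rng.uniform(space[1], space[2])
    if isinstance(space, tuple) and space[0] == "choice":
        return sample(rng.choice(space[1]), rng)
    return space  # a plain constant such as "knn" or 5


rng = random.Random(42)
configs = [sample(search_space, rng) for _ in range(10)]
```

Each sampled configuration contains either `n_neighbors` or `C`, never both, depending on which model branch was chosen.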
I have to say I like them both. I can define nested search spaces easily and I have a lot of sampling options for all the parameter types. Optuna has an imperative parameter definition, which gives more flexibility while Hyperopt has more parameter sampling options.
Search Space: Optuna = Hyperopt
Optimization methods
Both Optuna and Hyperopt are using the same optimization methods under the hood. They have:
rand.suggest (Hyperopt) and samplers.random.RandomSampler (Optuna)
Your standard random search over the parameters.
tpe.suggest (Hyperopt) and samplers.tpe.sampler.TPESampler (Optuna)
Tree of Parzen Estimators (TPE). The idea behind this method is similar to what was explained in the previous blog post about Scikit Optimize. We use a cheap surrogate model to estimate the performance of the expensive objective function on a set of parameters.
The difference between the methods used in Scikit Optimize and Tree of Parzen Estimators (TPE) is that instead of estimating the actual performance (point estimation) we want to estimate the density in the tails. We want to be able to tell whether a run will be good (right tail) or bad (left tail).
I like the following explanation taken from the AutoML_Book by amazing folks over at AutoML.org Freiburg.
Instead of modeling the probability p(y|λ) of observations y given the configurations λ, the Tree Parzen Estimator models density functions p(λ|y < α) and p(λ|y ≥ α). Given a percentile α (usually set to 15%), the observations are divided in good observations and bad observations and simple 1-d Parzen windows are used to model the two distributions.
By using p(λ|y < α) and p(λ|y ≥ α) you can estimate the expected improvement of a parameter configuration over previous best.
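For intuition, here is a toy one-dimensional sketch of that idea. It is not the implementation used by Optuna or Hyperopt. It assumes a loss to minimize (so "good" means the lowest 15% of losses), Gaussian Parzen windows with a fixed bandwidth, and candidates drawn around the good observations; the candidate with the highest ratio of good density to bad density is picked, which is the usual stand-in for maximizing expected improvement in TPE. All names and constants are assumptions.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Average of Gaussian kernels centred on past observations."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_suggest(history, low, high, rng, alpha=0.15, n_candidates=24, bandwidth=0.1):
    """history is a list of (x, loss) pairs; lower loss is better."""
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(alpha * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]   # models p(x | y < alpha)
    bad = [x for x, _ in ordered[n_good:]]    # models p(x | y >= alpha)
    # Draw candidates from the "good" density: pick a good point, add Gaussian noise.
    candidates = []
    for _ in range(n_candidates):
        x = rng.gauss(rng.choice(good), bandwidth)
        candidates.append(min(high, max(low, x)))
    # Keep the candidate that looks most "good" relative to "bad".
    return max(
        candidates,
        key=lambda x: parzen_density(x, good, bandwidth)
        / (parzen_density(x, bad, bandwidth) + 1e-12),
    )


def loss(x):
    # Toy objective (assumption) with its minimum at x = 0.7.
    return (x - 0.7) ** 2


rng = random.Random(0)
history = [(x, loss(x)) for x in (rng.uniform(0.0, 1.0) for _ in range(10))]
for _ in range(40):
    x = tpe_suggest(history, 0.0, 1.0, rng)
    history.append((x, loss(x)))
best_x, best_loss = min(history, key=lambda pair: pair[1])
```

The first 10 points are plain random search; after that, each new point is proposed from the region where good runs are dense and bad runs are sparse.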
Interestingly, both for Optuna and Hyperopt, there are no options to specify the α parameter in the optimizer.
Optuna
integration.SkoptSampler
Optuna lets you use samplers from Scikit-Optimize (skopt).
Skopt offers a bunch of tree-based methods to choose from for your surrogate model.
In order to use them you need to:
create a SkoptSampler instance specifying the parameters of the surrogate model and acquisition function in the skopt_kwargs argument,
pass the sampler instance to the optuna.create_study method
import optuna
from optuna.integration import SkoptSampler

# base_estimator picks the surrogate model: 'ET' (extra trees) or 'RF' (random forest)
sampler = SkoptSampler(skopt_kwargs={'base_estimator': 'ET',
                                     'n_random_starts': 10,
                                     'acq_func': 'EI',
                                     'acq_func_kwargs': {'xi': 0.02}})
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=100)
pruners.SuccessiveHalvingPruner
You can also use one of the multi-armed bandit methods called Asynchronous Successive Halving Algorithm (ASHA). If you are interested in the details please read the paper, but the general idea is to:
run a bunch of parameter configurations for some time
prune (half of) the least promising runs
run the remaining parameter configurations for some more time
prune (half of) the least promising runs again
stop when only one configuration is left
By doing so, the search can focus on the more promising runs. However, the static allocation of the budgets to configurations is a problem in practice (which a newer approach called HyperBand solves).
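The loop described above can be sketched in a few lines. Note that this is the plain, synchronous successive halving scheme, not the asynchronous ASHA variant and not Optuna's pruner. The toy learning curve in `partial_score`, the doubling budget, and all names are assumptions made for the example.

```python
import math
import random


def partial_score(config, budget):
    """Toy learning curve (assumption): approaches config['final_score'] as budget grows."""
    return config["final_score"] * (1.0 - math.exp(-0.5 * budget))


def successive_halving(configs, min_budget=1, keep_fraction=0.5):
    """Repeatedly train survivors a bit longer and drop the worst half."""
    survivors = list(configs)
    budget = min_budget
    rounds = []  # (budget used, number of configurations kept) per round
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: partial_score(c, budget), reverse=True)
        n_keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:n_keep]
        rounds.append((budget, n_keep))
        budget *= 2  # survivors get twice the budget in the next round
    return survivors[0], rounds


rng = random.Random(0)
configs = [{"id": i, "final_score": rng.uniform(0.7, 0.9)} for i in range(16)]
winner, rounds = successive_halving(configs)
```

With 16 starting configurations, the rounds keep 8, 4, 2, and finally 1 configuration. In this toy setup every configuration learns at the same rate, so the early ranking is always right; with real learning curves a slow starter can be pruned too early, which is the budget allocation problem mentioned above.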
It is very easy to use ASHA in Optuna. Just pass a SuccessiveHalvingPruner to .create_study() and you are good to go:
import optuna
from optuna.pruners import SuccessiveHalvingPruner

study = optuna.create_study(pruner=SuccessiveHalvingPruner())
study.optimize(objective, n_trials=100)
If you are optimizing hyperparameters in a distributed fashion you can load MongoTrials() object that connects to MongoDB. More about running distributed hyperparameter optimization with Hyperopt in the Speed and Parallelization section.
10 / 10
Both make it easy and get the job done.
Persisting and restarting: Optuna = Hyperopt
Run Pruning
Not all hyperparameter configurations are created equal. For some of them, you can tell very quickly that they will not produce high scores. Ideally, you would like to stop those runs as soon as possible and try different parameters instead.
Optuna gives you an option to do that with Pruning Callbacks. Many machine learning frameworks are supported:
When you are a user of a library or a framework it is absolutely crucial to find the information you need when you need it. This is where documentation/support channels come into the picture and they can make or break a library.
Let’s see how Optuna and Hyperopt compare on that.
Optuna
It is really good.
There is a proper webpage that explains all the basic concepts and shows you where to find more information.
There is an API Reference in which every function has a beautiful docstring. To give you an idea, imagine having charts inside your docstrings so that you can better understand what is happening inside your function. Check out the BaseSampler if you don’t believe me.
It is also important to mention that the supporting team from Preferred Networks really takes care of this project. They respond to GitHub issues, and a community is growing around it with great feature ideas and PRs coming in. Check out the GitHub project issues section to see what is going on there.
10 / 10
Hyperopt
It was recently updated and now it is quite alright.
You can distribute your computation over a cluster of machines. Good, step-by-step instructions can be found in this blog post by Tanay Agrawal but in a nutshell, you need to:
Start a server with MongoDB on it which will consume results from your worker training scripts and send out the next parameter set to try,
In your training script, instead of Trials() create a MongoTrials() object pointing to the database server you have started in the previous step,
Move your objective function to a separate objective.py script and rename it to function,
Compile your Python training script,
Run hyperopt-mongo-worker
Though it gets the job done, it doesn’t feel quite perfect. You need to do some juggling around the objective function, and starting MongoDB could have been provided in the CLI to make things easier.
It is also important to mention that integration with Spark via the SparkTrials object was recently added. There is a step-by-step guide to help you get started, and you can even use the spark-installation script to make things easier.
best = hyperopt.fmin(fn=objective,
                     space=search_space,
                     algo=hyperopt.tpe.suggest,
                     max_evals=64,
                     trials=hyperopt.SparkTrials())
Works exactly the way you would expect it to work.
Nice and simple!
9 / 10
Both libraries support distributed training, which is great. However, Optuna does a slightly better job, with a simpler, more user-friendly interface.
Speed and Parallelization: Optuna > Hyperopt
Experimental results*
Just to be clear, these are the results on just one example problem with one run per library/configuration, and they do not guarantee generalization. To run a proper benchmark, you would run it multiple times on various datasets.
That being said, as a practitioner, I would hope to see some improvements over the random search for each problem. Otherwise, why bother with an HPO library?
Ok, so as an example let’s tweak the hyperparameters of a LightGBM model on a tabular, binary classification problem. If you want to use the same dataset as I did you should:
download it from Kaggle
use the first 10000 rows from the train.csv file
To make the training quick I fixed the number of boosting rounds to 300 with a 30 round early stopping.
import lightgbm as lgb
from sklearn.model_selection import train_test_split

NUM_BOOST_ROUND = 300
EARLY_STOPPING_ROUNDS = 30

def train_evaluate(X, y, params):
    # Hold out 20% of the rows for validation
    X_train, X_valid, y_train, y_valid = train_test_split(X, y,
                                                          test_size=0.2,
                                                          random_state=1234)
    train_data = lgb.Dataset(X_train, label=y_train)
    valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)
    # Assumes params sets 'metric': 'auc', otherwise the lookup below fails.
    # On LightGBM >= 4.0 pass callbacks=[lgb.early_stopping(EARLY_STOPPING_ROUNDS)]
    # instead of the early_stopping_rounds argument.
    model = lgb.train(params, train_data,
                      num_boost_round=NUM_BOOST_ROUND,
                      early_stopping_rounds=EARLY_STOPPING_ROUNDS,
                      valid_sets=[valid_data],
                      valid_names=['valid'])
    score = model.best_score['valid']['auc']
    return score
All the training and evaluation logic is put inside the train_evaluate function. We can treat it as a black box that takes the data and hyperparameter set and produces the AUC evaluation score.
Note:
You can actually turn every script that takes parameters as inputs and outputs a score into such a train_evaluate function. Once that is done you can treat it as a black box and tune your parameters.
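Once you have such a black box, the random search baseline that the results are compared against is just a loop around it. Here is a minimal sketch; `train_evaluate_stub` is a hypothetical stand-in that returns a made-up score so the example runs without data, and the parameter names and ranges are assumptions.

```python
import random


def train_evaluate_stub(params):
    """Hypothetical stand-in for train_evaluate(X, y, params): returns a fake score."""
    return (0.85
            - (params["learning_rate"] - 0.1) ** 2
            - 0.0001 * abs(params["num_leaves"] - 64))


def sample_params(rng):
    """Draw one random hyperparameter set (ranges are assumptions)."""
    return {
        "learning_rate": rng.uniform(0.01, 0.5),
        "num_leaves": rng.randint(2, 256),
    }


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random parameter sets and keep the best one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        params = sample_params(rng)
        results.append((evaluate(params), params))
    best_score, best_params = max(results, key=lambda result: result[0])
    return best_score, best_params, results


best_score, best_params, results = random_search(train_evaluate_stub, n_trials=100)
```

Swapping the stub for a function that closes over your data and calls the real train_evaluate gives you the baseline any smarter search method has to beat.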
Both Optuna and Hyperopt improved over the random search which is good.
TPE implementation from Optuna was slightly better than Hyperopt’s Adaptive TPE but not by much. On the other hand, when running hyperparameter optimization, those small improvements are exactly what you are going for.
What is interesting is that the TPE implementations from Hyperopt and Optuna give different results on this problem. Maybe the cutoff point between good and bad parameter configurations λ is chosen differently, or the sampling methods have defaults that work better for this particular problem.
Moreover, using pruning decreased training time by 4x. I could run 400 searches in the time it takes to run 100 without pruning. On the flip side, using pruning got a lower score. It may be different for your problem, but it is important to consider that when deciding whether to use pruning or not.
For this section, I assigned points based on the improvements over the random search strategy.
Hyperopt got (0.850 - 0.844) * 1000 = 6
Optuna got (0.854 - 0.844) * 1000 = 10
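The point values work out to one point per 0.001 of AUC gained over random search. A small sketch of that arithmetic follows; the function name is made up for this example, and rounding is used to absorb floating-point noise.

```python
def points_over_random(score, random_search_score):
    """One point per 0.001 of AUC gained over the random search baseline."""
    return round((score - random_search_score) * 1000)


RANDOM_SEARCH_AUC = 0.844
hyperopt_points = points_over_random(0.850, RANDOM_SEARCH_AUC)
optuna_points = points_over_random(0.854, RANDOM_SEARCH_AUC)
```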
Experimental results: Optuna > Hyperopt
Conclusions
Let’s take a look at the overall scores:
Even if you look at it generously and consider only the features that both libraries share, Optuna is a better framework.
It is on par or slightly better on all criteria and:
it has better documentation
it has a way better visualization suite
it has some features like pruning, callbacks, and exception handling that Hyperopt doesn’t support
After doing all this research I am convinced that Optuna is a great library for hyperparameter optimization.
Moreover, I think that you should strongly consider switching from Hyperopt if you were using that in the past.