Eduardo C. Garrido Merchán

Summary

The article discusses the application of Bayesian optimization through the BOTorch library, detailing a transition from Spearmint, a previously used library, to BOTorch for more extensive Bayesian optimization scenarios.

Abstract

The author of the article introduces BOTorch, a modern and versatile Python library for Bayesian optimization, as a superior alternative to the now-abandoned Spearmint library. BOTorch is highlighted for its model-agnostic approach, allowing the use of various predictive distributions, and its inclusion of advanced features such as preferential Bayesian optimization and TuRBO methodologies. The article provides a hands-on tutorial on how to implement Bayesian optimization with BOTorch, including code snippets for defining an objective function, setting up experiments to compare Bayesian optimization with random search, and plotting results. The author emphasizes the importance of reproducibility in machine learning experiments and opts for clarity over computational efficiency in the presented code. The tutorial concludes with a demonstration of the superior performance of Bayesian optimization over random search, encouraging readers to apply the provided code to their optimization problems.

Opinions

  • The author expresses a clear preference for BOTorch over Spearmint, citing BOTorch's extensive features and active community support.
  • There is an opinion that Bayesian optimization should be accessible to a wider range of scenarios beyond simple hyper-cube optimization.
  • The author values the reproducibility of experiments and ensures this by setting seeds in the code examples.
  • Clarity is prioritized over optimization in the provided code for educational purposes, acknowledging that this may not be the most computationally efficient approach.
  • The author is confident in the effectiveness of Bayesian optimization, as evidenced by the comparison with random search and the encouragement for readers to test the code on their own problems.

Hands On Bayesian Optimization with BOTorch (BO series III)

BOTorch Logo

First, if you still do not know what Bayesian optimization is, I recommend visiting my profile and reading the two previous articles of the BO series. This is part III, so I will assume that you already know what Bayesian optimization is.

Introduction and context

The thing is that I have recently changed my Bayesian optimization library from Spearmint to BOTorch. I am currently researching how to make Bayesian optimization applicable to a wider set of scenarios than the simple one of optimizing some variables in a hyper-cube. Spearmint was a cool library full of good stuff, such as warp transformations for non-stationary functions, sampling from the posterior distribution of the Gaussian process hyper-parameters, and cool acquisition functions such as Predictive Entropy Search (by the way, my Predictive Entropy Search applied to several objectives and constraints is also coded in Spearmint; check out my GitHub profile). However, Spearmint has been abandoned by the community and the maintainers, so I switched to BOTorch, and wow, BOTorch is amazing.

BOTorch

BOTorch is a Python 3 library that performs Bayesian optimization in a model-agnostic way: it is enough to have a model that outputs a predictive distribution over the design space (basically, the space defined by the ranges of the input variables). From now on, I will suppose that this model is a Gaussian process, but it could also be the empirical distribution of a random forest or a Bayesian neural network. It has all the features of Spearmint and much more: it includes preferential Bayesian optimization and cool methodologies such as TuRBO. If you are interested, I will speak about them in a following article of the Bayesian optimization series. You can download BOTorch here:

https://botorch.org/
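If you work in a standard Python 3 environment, it can typically be installed with pip (pip install botorch), which also pulls in PyTorch and GPyTorch as dependencies.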

Hands On: How to perform Bayesian optimization with BOTorch.

I will now share some Python code that I have written with the help of the awesome BOTorch tutorials. My purpose here is to help the beginner identify the main parts of Bayesian optimization under the syntax of BOTorch. Of course, the code can be optimized, but I have sacrificed efficiency for clarity. (So please, software and data engineers, forgive me :-)).

The problem

I am going to optimize a trivial analytic function, assuming that in practice its analytical expression would be unknown. It is defined by the following expression:

import numpy as np
import torch

def obj_fun(X_train):  # Objective function: evaluate every input point and add a bit of noise.
    return torch.tensor([np.sin(x[0]) / (np.cos(x[1]) + np.sinh(x[2]))
                         + torch.rand(1) / 10.0 for x in X_train])

For those of you who do not know PyTorch, it is basically Meta's TensorFlow. And if you know NumPy, the tensor used here is quite similar to the NumPy array. Basically, the input of this function is a set of points belonging to the input space, and the output is those points evaluated by the objective function in the target space, which is what we need to optimize. We add a bit of noise to the expression to make things more interesting. The function is trivial to optimize, and its global maximum of about 1.8 is large enough with respect to the added noise, as we will see further on.
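As a quick sanity check (a hypothetical snippet, not part of the optimization code itself), you can evaluate the function on a few random three-dimensional points and inspect the shapes:

# Hypothetical sanity check: evaluate the objective on 5 random points of the 3-dimensional unit hyper-cube.
X_demo = torch.rand(5, 3)
Y_demo = obj_fun(X_demo)
print(X_demo.shape, Y_demo.shape)  # Expected: torch.Size([5, 3]) torch.Size([5])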

The experiment launcher

I want to compare the performance of Bayesian optimization with that of random search. Just to be sure that random search does not win as a result of luck, I will repeat the experiment 3 times with different seeds. The first 20 points will be chosen randomly for both methods (hence they are identical, as the random seed is the same). I can configure experiments like this with the following code.

if __name__ == '__main__':
    total_exps = 3            # Repetitions of the experiment, each with a different seed.
    initial_design_size = 20  # Initial points sampled at random, shared by both methods.
    budget = 35               # Iterations performed after the initial design.
    n_methods = 2
    total_its = initial_design_size + budget
    results = torch.ones((n_methods, total_its, total_exps))
    for exp in range(total_exps):
        results[0, :, exp] = perform_BO_experiment(exp, initial_design_size, budget).reshape((total_its))
        results[1, :, exp] = perform_random_experiment(exp, initial_design_size, budget).reshape((total_its))
        print(exp)
    plot_results(initial_design_size + budget, results)

The last line calls a method, commented further below, that simply plots all the results. I have not cared about processors or computational costs here; the ideal setting would launch every experiment in parallel, but I run them sequentially for clarity. I store all the results in a 3-dimensional tensor whose first dimension is the method, the second the results, and the third the id of the experiment. Easy stuff with Python. From now on, I will focus on the BO experiment.

The Bayesian optimization loop

The first thing is to initialize the seeds to ensure reproducibility of the experiment, a property that any machine learning experiment must guarantee in order to be reliable. I first evaluate the randomly sampled initial points in the objective function and then enter a loop that runs until the budget of iterations that I can afford is exhausted. Every iteration calls the Bayesian optimization procedure, which takes the previously evaluated points as input and returns those points plus a new input point (the Bayesian optimization suggestion) together with the evaluation of that point in the objective function. At the end of the loop I return the evaluations of the points, hoping that they are good!!

import random

def perform_BO_experiment(seed: int, initial_design_size: int, budget: int) -> torch.Tensor:
    # Fix the seeds so that the experiment is reproducible.
    random.seed(seed)
    torch.random.manual_seed(seed)
    X, Y = get_initial_results(initial_design_size)
    for i in range(budget):
        X, Y = perform_BO_iteration(X, Y)
    return Y
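The helpers get_initial_results and perform_random_experiment are not listed in this article. A minimal sketch of both, consistent with the rest of the code and assuming the three-dimensional unit hyper-cube of the objective function above and a random-search baseline that simply samples points uniformly, could be:

def get_initial_results(initial_design_size, input_dim=3):
    # Sample the initial design uniformly at random in the [0, 1]^input_dim hyper-cube.
    X = torch.rand(initial_design_size, input_dim)
    Y = obj_fun(X).reshape(-1, 1)  # SingleTaskGP expects Y with shape (n, 1).
    return X, Y

def perform_random_experiment(seed, initial_design_size, budget):
    # Random-search baseline: every new point is sampled uniformly at random.
    random.seed(seed)
    torch.random.manual_seed(seed)
    X, Y = get_initial_results(initial_design_size)
    for i in range(budget):
        new_X = torch.rand(1, X.shape[1])
        new_y = obj_fun(new_X)
        X = torch.cat((X, new_X), 0)
        Y = torch.cat((Y, new_y.reshape(1, 1)), 0)
    return Y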

The Bayesian optimization iteration

Now things get beautiful, let us perform some Bayesian optimization magic! Recall that Bayesian optimization has two main components: the probabilistic surrogate model and the acquisition function. The probabilistic surrogate model, in this case a Gaussian process, computes a predictive distribution of the value of the objective function at every point of the input space. The acquisition function then trades off exploration and exploitation, and its maximum is the point that optimizes this criterion.

The first three lines of this function fit a Gaussian process to the previous observations. Then, we initialize the upper confidence bound (UCB) acquisition function, which is computed from the predictive distribution of the Gaussian process model. We then set the bounds of the variables that define the input-space hyper-cube where we perform the optimization and, finally, optimize the acquisition function. The number of restarts increases the probability of finding a good local optimum of the acquisition function. The new_X value returned by the optimization is the input-space point where the best local optimum of the acquisition function is found, and acq_value is the value of the acquisition function at that point. In other words, new_X is the suggestion made by the Bayesian optimization method to be evaluated. We hence evaluate that point in the objective function, obtaining new_y, the value of the suggested point. We finish the method by appending the results to the dataset of observations and returning it.

from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def perform_BO_iteration(X, Y):
    # Fit a Gaussian process to the observations gathered so far.
    gp = SingleTaskGP(X, Y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    # Build the UCB acquisition function and optimize it inside the unit hyper-cube.
    UCB = UpperConfidenceBound(gp, beta=0.1)
    bounds = torch.stack([torch.zeros(X.shape[1]), torch.ones(X.shape[1])])
    new_X, acq_value = optimize_acqf(
        UCB, bounds=bounds, q=1, num_restarts=5, raw_samples=20,
    )
    # Evaluate the suggested point and append it to the dataset of observations.
    new_y = obj_fun(new_X)
    X = torch.cat((X, new_X), 0)
    Y = torch.cat((Y, new_y.reshape(1, 1)), 0)
    return X, Y

Plotting the results

The previous function is called iteratively by the loop and all the results are stored in a torch tensor. Finally, we use a function to plot the results, that is, the behavior shown by Bayesian optimization compared with the performance of random search. To do so, I simply used matplotlib (you can have a look at the thousands of web pages where matplotlib code is shared to learn how to do it). I prefer to plot the logarithm of the regret because, when you compare methods whose performance is very similar, it is easier to visualize their behavior using the logarithm than just plotting the value of the objective function. The lower the regret, evidently, the better. Here, as is obvious, random search does not work well, but this is expected. I obtain the following results:

Regret of the vanilla Bayesian optimization method compared with the performance shown by random search. The first iterations are the same, as the values are initialized randomly.
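For completeness, a minimal sketch of plot_results, assuming the regret is measured against the global maximum of roughly 1.8 mentioned earlier (this value is an assumption taken from the text, not computed by the code), could look like this:

import matplotlib.pyplot as plt

def plot_results(total_its, results, global_max=1.8):
    # results has shape (n_methods, total_its, total_exps): method 0 is BO, method 1 is random search.
    labels = ["Bayesian optimization", "Random search"]
    iterations = np.arange(total_its)
    for method in range(results.shape[0]):
        # Best value observed so far in every repetition of the experiment.
        best_so_far = np.maximum.accumulate(results[method].numpy(), axis=0)
        # Regret w.r.t. the (assumed) global maximum, clipped to avoid the log of negative values.
        regret = np.clip(global_max - best_so_far, 1e-6, None)
        plt.plot(iterations, np.log(regret).mean(axis=1), label=labels[method])
    plt.xlabel("Iteration")
    plt.ylabel("Log regret (mean over experiments)")
    plt.legend()
    plt.show()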

Cool!!! You can now test the code on your computer and begin using Bayesian optimization for your particular problem! I hope that you find this article useful and, until the next article of the BO series, have fun!!!

Artificial Intelligence
Machine Learning
Optimization
Mathematics
Python