Although the standard way of training neural networks is gradient descent and backpropagation, there are some other players in the game. One of them is evolutionary algorithms, such as genetic algorithms.
Evolutionary algorithms are mimicking biological evolution. When you are using a genetic algorithm, you need DNA that describes an instance and a fitness function that shows how close a given solution is to achieving the set aims.
In the first step, a population is created from various random DNAs and the algorithm calculates the fitness values of the instances. In the next step, the algorithm drops a given percent of the population based on the fitness function (weak instances die) and creates new instances by genetic operators. Typical genetic operators are mutation (where the algorithm randomly change some of the genes) and crossover (where the algorithm randomly exchanges gene sequences in the instances).
After the application of the operators, the algorithm calculates the new fitness values and repeats the previous steps. In the long term, the average fitness is increasing, and after a sufficient number of iterations, the algorithm finds the solution.
As you see, the evolutionary searching is mostly random and totally blind compared to the gradient descent which follows the gradient, but it has some good features. One of them is that these algorithms can also be used for non-differentiable functions. If you can describe your solutions with DNA and you can define a fitness function, then you can use them. Another good feature is that these algorithms don’t get stuck in the local minimums and they can run massively parallel because the evaluation of the fitness of the solutions is completely independent.
Uber AI Lab has an article about neuroevolution where they show it is a competitive alternative of training neural networks for reinforcement learning. So, I think it worth knowing about this technology, because in some cases this can be a better alternative of finding a solution.
First of all, we define the neural network in PyTorch:
torch.set_grad_enabled(False) model = nn.Sequential( nn.Linear(observation_space_size, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, action_space_size) )
As you see, it’s a very simple network with 3 linear layers and ReLU. In the first row, we disable gradient calculation, because we don’t need gradients.
I’m using PyGAD for running the genetic algorithm. PyGAD is a simple, easy-to-use python library for genetic algorithms. It has an extension for PyTorch to create the DNA from the network and build the network from the DNA. In the case of neural networks, the DNA is simply the list of the weights. This network has 386 parameters, so the DNA is a list of 386 numbers. We are generating an initial population with random DNAs which contains 10 initial solutions with the following code:
torch_ga = pygad.torchga.TorchGA(model=model, num_solutions=10)
The most important part of the program is the fitness function. The fitness function evaluates the solution. The input of the function is the DNA, and the return value can be any number.
The only rule is that the better solution has to be a higher fitness value. In our case, the fitness function will play a game with the neural network which is defined by the DNA, and will give back the sum of the rewards. The following code doing this:
def fitness_func(solution, sol_idx): global model, observation_space_size, env model_weights_dict = pygad.torchga.model_weights_as_dict(model=model, weights_vector=solution) model.load_state_dict(model_weights_dict) # play game observation = env.reset() sum_reward = 0 done = False while (not done) and (sum_reward < 1000): # env.render() ob_tensor = torch.tensor(observation.copy(), dtype=torch.float) q_values = model(ob_tensor) action = np.argmax(q_values).numpy() observation_next, reward, done, info = env.step(action) observation = observation_next sum_reward += reward return sum_reward
We are using torchga.model_weights_as_dict to get back the weights from the DNA, and after simply play a game with OpenAI Gym.
When we train the network, one of the biggest bottlenecks is playing the game. It takes the most time. In this case, the algorithm plays 10 games one after another in every round. But why do we have that many processor cores if we don’t use them?
Fortunately, Python has the multiprocessing.Pool library to run the game on multiple CPU cores. With multiprocessing.Pool we can define a pool of processes to give various tasks to them. The pool has a map method that gets a function and a list parameter. It distributes the elements of the list among the processes, that run the given function on it. This method just right for us to run the games on multiple cores using the neural network instances of the current population.
PyGAD has a cal_pop_fitness method which calculates the fitness values for the current population. The current implementation is a simple iteration. We have to override this method and replace the iteration with the map method of the pool:
class PooledGA(pygad.GA): def cal_pop_fitness(self): global pool pop_fitness = pool.map(fitness_wrapper, self.population) print(pop_fitness) pop_fitness = np.array(pop_fitness) return pop_fitness
The complete code looks like this:
Every run takes a different time because of the randomness but on my i7 notebook, the average time to find a good (200+ sum reward) solution is 10 sec. I think it’s not bad.
Gradient descent is a very efficient way to find solutions but in some cases, evolutionary algorithms and other gradient-free algorithms have comparable or better performance thanks to some good features of these techniques.