
Machine Learning for Finance

Recently, a decades-old optimization algorithm has come back into fashion for reinforcement learning. Evolutionary strategies (ES) are much simpler than Q-learning or A2C.
Instead of training one model through backpropagation, in ES we create a population of models by adding random noise to the weights of the original model. We then let each model run in the environment and evaluate its performance. The new model is the performance-weighted average of all the models.
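One common way to make the "performance-weighted average" concrete is the update rule below. It is a sketch of one ES variant, not necessarily the exact rule meant here. The symbols are assumptions introduced for illustration: $w$ is the weight vector of the original model, $n$ the population size, $\sigma$ the noise scale, $\alpha$ a learning rate, and $R_i$ the reward earned by the model with weights $w + \sigma \epsilon_i$.

```latex
\epsilon_i \sim \mathcal{N}(0, I), \qquad
w \leftarrow w + \frac{\alpha}{n \sigma} \sum_{i=1}^{n} R_i \, \epsilon_i
```

Noise directions that led to high rewards pull the weights toward them, and directions with low rewards contribute little or push the weights away.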
The following diagram visualizes how evolutionary strategies work:
Evolutionary strategy
To get a better grip on how this works, consider the following example. We want to find a vector that minimizes the squared error relative to a solution vector. The learner is never given the solution itself, only the negative total squared error as a reward signal:
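import numpy as np

solution = np.array([0.5, 0.1, -0.3])  # target vector, hidden from the learner

def f(w):
    # Reward is the negative sum of squared errors, so higher is better (maximum 0).
    reward = -np.sum(np.square(solution - w))
    return reward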
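To see how a population of noisy candidates can recover the solution from the reward alone, here is a minimal sketch of an ES loop for this toy problem. It assumes Gaussian noise and standardized rewards. The helper name `evolve` and the hyperparameters `npop`, `sigma`, `alpha`, `steps`, and `seed` are hypothetical choices for illustration, not values taken from the text.

```python
import numpy as np

# Target vector from the example; the learner only sees rewards, never this vector.
solution = np.array([0.5, 0.1, -0.3])

def f(w):
    """Reward: negative sum of squared errors (higher is better, maximum 0)."""
    return -np.sum(np.square(solution - w))

def evolve(f, w, npop=50, sigma=0.1, alpha=0.001, steps=300, seed=0):
    """Hypothetical helper: run a basic ES loop and return the final weights."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # One row of Gaussian noise per population member.
        noise = rng.standard_normal((npop, w.size))
        # Evaluate every perturbed candidate in the "environment" f.
        rewards = np.array([f(w + sigma * eps) for eps in noise])
        # Standardize rewards so above-average candidates get positive weight.
        advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Shift the weights toward the noise directions that scored well.
        w = w + alpha / (npop * sigma) * noise.T @ advantage
    return w

w_final = evolve(f, w=np.zeros(3))
print(w_final, f(w_final))
```

Standardizing the rewards is a design choice in this sketch: it keeps the step size independent of the reward scale, so the same `alpha` works whether the raw rewards are large or small.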
A key advantage of...