This section is optional and is included for readers who are interested in why the method works. If you wish, you can refer to the original paper on the cross-entropy method, a reference to which is given at the end of the section.
The basis of the cross-entropy method lies in the importance sampling theorem, which states this:

$$\mathbb{E}_{x \sim p(x)}\left[H(x)\right] = \int_x p(x)\,H(x)\,dx = \int_x q(x)\,\frac{p(x)}{q(x)}\,H(x)\,dx = \mathbb{E}_{x \sim q(x)}\left[\frac{p(x)}{q(x)}\,H(x)\right]$$
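As a rough numerical illustration of this identity (not part of the original text; the distributions and the function H are arbitrary choices for demonstration), the sketch below estimates the expectation of H(x) under p(x) in two ways: directly with samples from p, and with samples from a different distribution q reweighted by p(x)/q(x).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: target p(x) = Normal(0, 1), proposal q(x) = Normal(1, 2),
# and H(x) = x^2, whose expectation under p is exactly 1.
def h(x):
    return x ** 2

def p_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-0.5 * ((x - 1.0) / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

n = 100_000

# Direct Monte Carlo estimate of E_{x~p}[H(x)] using samples from p.
x_p = rng.normal(0.0, 1.0, size=n)
direct = h(x_p).mean()

# Importance-sampling estimate: sample from q, weight each sample by p(x)/q(x).
x_q = rng.normal(1.0, 2.0, size=n)
weights = p_pdf(x_q) / q_pdf(x_q)
importance = (weights * h(x_q)).mean()

print(f"direct estimate:     {direct:.3f}")      # close to 1.0
print(f"importance estimate: {importance:.3f}")  # also close to 1.0
```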
In our RL case, H(x) is a reward value obtained by some policy, x, and p(x) is a distribution of all possible policies. We don't want to maximize our reward by searching all possible policies; instead we want to find a way to approximate p(x)H(x) by q(x), iteratively minimizing the distance between them. The distance between two probability distributions is measured by the Kullback-Leibler (KL) divergence, which is as follows:

$$KL\big(p_1(x) \,\|\, p_2(x)\big) = \mathbb{E}_{x \sim p_1(x)}\left[\log \frac{p_1(x)}{p_2(x)}\right] = \mathbb{E}_{x \sim p_1(x)}\left[\log p_1(x)\right] - \mathbb{E}_{x \sim p_1(x)}\left[\log p_2(x)\right]$$
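For a concrete feel (this sketch is not from the book; the two categorical distributions are made-up values), the snippet below computes the KL divergence for discrete distributions in both forms given above, the expected log-ratio and the difference of the two expectation terms, and confirms they agree.

```python
import numpy as np

# Two discrete distributions over the same three outcomes (illustrative values only).
p1 = np.array([0.1, 0.6, 0.3])
p2 = np.array([0.3, 0.3, 0.4])

# KL(p1 || p2) as the expectation of the log-ratio under p1 ...
kl_ratio_form = np.sum(p1 * np.log(p1 / p2))

# ... and as the difference of the two expectation terms from the formula above.
kl_difference_form = np.sum(p1 * np.log(p1)) - np.sum(p1 * np.log(p2))

print(kl_ratio_form, kl_difference_form)  # both print the same value, ~0.22
```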
The first term in KL is the (negative) entropy of p1(x) and doesn't depend on p2(x), so it can be omitted during the minimization. The second term is called cross-entropy, which is a very common optimization objective in deep learning.
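To make that point concrete (again, an illustrative sketch rather than anything from the book; the candidate distributions are random), the snippet below checks that KL and cross-entropy differ only by the constant entropy of p1(x), so minimizing either one over p2(x) picks out the same distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

p1 = np.array([0.1, 0.6, 0.3])                       # fixed "target" distribution
candidates = rng.dirichlet(np.ones(3), size=5)       # a few candidate p2 distributions

def kl(p, q):
    return np.sum(p * np.log(p / q))

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

entropy_p1 = -np.sum(p1 * np.log(p1))                # constant with respect to p2

for p2 in candidates:
    # KL and cross-entropy differ only by the constant entropy of p1,
    # so they rank every candidate p2 identically.
    assert np.isclose(kl(p1, p2), cross_entropy(p1, p2) - entropy_p1)

best_by_kl = min(candidates, key=lambda q: kl(p1, q))
best_by_ce = min(candidates, key=lambda q: cross_entropy(p1, q))
print(np.allclose(best_by_kl, best_by_ce))           # True: same minimizer
```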