Q-learning is one of the most popular and most frequently used foundational RL methods. The method was developed by Chris Watkins in 1989 as part of his thesis, Learning from Delayed Rewards. Q-learning, or rather Deep Q-learning, which we will cover in Chapter 6, Going Deep with DQN, became especially popular after DeepMind (Google) used it to play classic Atari games better than a human. What Watkins showed was how an update could be applied across state-action pairs using a learning rate and a discount factor, gamma.
This turned the update equation into a Q, or quality of state-action, update equation, as shown in the following formula:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
In the previous equation, we have the following:
Q(s_t, a_t): the current state-action quality being updated
\alpha: the learning rate
r_{t+1}: the reward for the next state
\gamma: the discount factor
\max_{a} Q(s_{t+1}, a): the maximum Q value over the actions available in the next state, that is, the value of the best or greedy action
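To make the update concrete, the following short Python sketch applies the same equation to a tabular Q function stored as a NumPy array. The function name q_learning_update, the table layout (states as rows, actions as columns), and the default values for alpha and gamma are illustrative assumptions, not code from this chapter.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to a table Q of shape (n_states, n_actions)."""
    # Greedy estimate of the next state's value: max over all actions.
    best_next = np.max(Q[next_state])
    # Temporal-difference target and error from the update equation.
    td_target = reward + gamma * best_next
    td_error = td_target - Q[state, action]
    # Move the current estimate toward the target by the learning rate alpha.
    Q[state, action] += alpha * td_error
    return Q

# Example usage with a small hypothetical table of 5 states and 2 actions.
Q = np.zeros((5, 2))
Q = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=2)
```

Note that the max over next-state actions is what makes Q-learning off-policy: the update uses the greedy action's value regardless of which action the agent actually takes next.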