
Natural Language Processing with TensorFlow
In this chapter, we will discuss a more advanced RNN variant known as Long Short-Term Memory Networks (LSTMs). LSTMs are widely used in many sequential tasks (including stock market prediction, language modeling, and machine translation) and have proven to perform better than other sequential models (for example, standard RNNs), especially given the availability of large amounts of data. LSTMs are well-designed to avoid the problem of the vanishing gradient that we discussed in the previous chapter.
The main practical limitation posed by the vanishing gradient is that it prevents the model from learning long-term dependencies. By avoiding the vanishing gradient problem, however, LSTMs can retain memory for far longer than ordinary RNNs (for hundreds of time steps). In contrast to RNNs, which maintain only a single hidden state, LSTMs have many more parameters as well as better control over what memory to store and what to discard.
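To make this distinction concrete, the following is a minimal sketch (not an example from this chapter) comparing a standard RNN layer with an LSTM layer in tf.keras. The layer sizes and tensor shapes are arbitrary illustrative values; the point is that the LSTM exposes an additional cell state and, as a consequence of its gating machinery, carries roughly four times the parameters of a simple RNN of the same size.

```python
import tensorflow as tf

# Illustrative shapes only: a batch of 32 sequences, each 100 time steps long,
# with 64 input features per step and 128 hidden units.
batch_size, time_steps, input_dim, hidden_dim = 32, 100, 64, 128
inputs = tf.random.normal([batch_size, time_steps, input_dim])

# A standard RNN layer keeps a single hidden state per time step.
simple_rnn = tf.keras.layers.SimpleRNN(hidden_dim, return_sequences=True)

# An LSTM layer additionally maintains a cell state and uses gates to decide
# what to write to, read from, and erase from that memory.
lstm = tf.keras.layers.LSTM(hidden_dim, return_sequences=True, return_state=True)

rnn_outputs = simple_rnn(inputs)               # shape: (32, 100, 128)
lstm_outputs, final_h, final_c = lstm(inputs)  # output, final hidden state, final cell state

print(rnn_outputs.shape, lstm_outputs.shape, final_h.shape, final_c.shape)
print(simple_rnn.count_params(), lstm.count_params())  # the LSTM has ~4x the parameters
```

We will unpack what those gates compute, and why they help preserve gradients over long sequences, in the rest of this chapter.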