Mastering NLP from Foundations to LLMs
In this part, we will take a closer look at the design and architecture of some of the newest LLMs available at the time of writing.
The core of ChatGPT is a Transformer, a model architecture that uses self-attention mechanisms to weigh the relevance of different words in the input when making predictions. Self-attention allows the model to consider the full context of the input when generating a response.
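To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention using NumPy. The function name `self_attention` and the projection matrices `Wq`, `Wk`, and `Wv` are illustrative, not taken from any particular library; real Transformers add multiple heads, masking, and learned parameters on top of this:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X.

    X has shape (seq_len, d_model); Wq, Wk, Wv project tokens into
    query, key, and value spaces.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each entry scores how relevant one token is to another
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each token becomes a context-weighted mix of all value vectors
    return weights @ V
```

Because every token attends to every other token, the output for each position reflects the full input context, which is exactly the property the paragraph above describes.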
ChatGPT is based on the GPT family of Transformer models. GPT models are trained to predict the next word in a sequence, given all the previous words. They process text from left to right (unidirectional context), which makes them well suited to text generation tasks. For instance, GPT-3, one of the GPT versions on which ChatGPT is based, contains 175 billion parameters.
The training process for ChatGPT is carried out in two steps: pretraining and fine-tuning...