In the previous sections, we used traditional hand-crafted features, automated bag-of-words features, and embedding representations for text classification. We saw the power of BERT as a language model in the previous chapter. While describing BERT, we noted that the embeddings it generates can be used for downstream classification tasks. In this section, we will extract BERT embeddings for our classification task.
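As a concrete starting point, the following is a minimal sketch of how such embeddings can be extracted with the Hugging Face transformers library. The model checkpoint (bert-base-uncased), the choice of the [CLS] token as the sentence representation, and the sample input text are our illustrative assumptions here, not details taken from this chapter:

```python
# Minimal sketch: extract a fixed-size BERT embedding for a piece of text.
# Assumptions (ours, not the book's): bert-base-uncased checkpoint, and the
# [CLS] token's final hidden state used as the sentence representation.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # inference mode; we are not fine-tuning here

def bert_embedding(text: str) -> torch.Tensor:
    """Return the [CLS] embedding (a 768-dim vector) for `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (batch, seq_len, 768); index the [CLS] token.
    return outputs.last_hidden_state[0, 0]

# Illustrative input only; plug in your own dataset's text instead.
vec = bert_embedding("free wire transfer, claim your prize now")
print(vec.shape)  # torch.Size([768]) -- a feature vector for a classifier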
The embeddings generated by BERT differ from those generated by the Word2Vec model. Recall that BERT uses a masked language model objective and a transformer architecture built on attention. This means that the embedding of a word depends on the context in which it occurs; based on the surrounding words, BERT learns which other words to attend to when generating the embedding.
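To see this context dependence in action, the following sketch compares BERT's embedding for the same word in two different sentences. The word bank and the two sentences are illustrative choices of ours (bank happens to be a single WordPiece token in bert-base-uncased, which keeps the token lookup simple):

```python
# Sketch: the same word gets different BERT embeddings in different contexts.
# The word "bank" and both sentences are illustrative assumptions of ours.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    # Locate the word's token position (works here because "bank" is one token).
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden[0, idx]

v1 = word_vector("she deposited cash at the bank", "bank")
v2 = word_vector("they walked along the river bank", "bank")
# A similarity noticeably below 1.0 shows the two vectors are not identical.
print(torch.cosine_similarity(v1, v2, dim=0).item())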
In traditional word embeddings, a word has the same embedding irrespective of the context. The word...
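For contrast, here is a toy sketch of that static behavior (our illustration, assuming gensim is installed; the two-sentence corpus is made up). A Word2Vec-style embedding is a table lookup, so a word maps to the same vector no matter which sentence it came from:

```python
# Toy sketch: a Word2Vec embedding is a static lookup table.
# The tiny corpus and all hyperparameters here are illustrative assumptions.
from gensim.models import Word2Vec

sentences = [
    ["she", "deposited", "cash", "at", "the", "bank"],
    ["they", "walked", "along", "the", "river", "bank"],
]
w2v = Word2Vec(sentences, vector_size=16, min_count=1, window=2, seed=0)

# One row in the lookup table: the same vector for "bank" in both sentences.
print(w2v.wv["bank"][:4])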