10 Machine Learning Blueprints You Should Know for Cybersecurity

So far, we have seen how a text can be attributed to its writer and how to build models that detect the author. In this section, we turn to the opposite problem: authorship obfuscation. As discussed in the initial section of this chapter, authorship obfuscation is the practice of deliberately modifying a text to strip it of the stylistic features that might give away its author.
The code is inspired by an implementation that is freely available online (https://github.com/asad1996172/Obfuscation-Systems) with a few minor tweaks.
First, we will import the required libraries. The most important one here is the Natural Language Toolkit (NLTK) (https://www.nltk.org/), which was originally developed at the University of Pennsylvania. It provides standard off-the-shelf implementations for several natural language processing (NLP) tasks, such as tokenization, part-of-speech (POS) tagging, and named entity recognition (NER). It has a powerful set of functionalities...
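To give a feel for the kind of functionality NLTK offers, here is a minimal sketch of tokenization and POS tagging. It is not the code used in this chapter: the sample sentence and the tagging patterns are made up for illustration, and the toy `RegexpTagger` rules are an assumption chosen so that the example runs without downloading any NLTK models. A real pipeline would more likely rely on NLTK's pretrained tokenizer and tagger.

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.tag import RegexpTagger

# Illustrative sample sentence (not taken from the chapter's dataset).
text = "The author was writing quickly and revised the drafts."

# Regex-based tokenizer: splits on word characters and punctuation runs.
# It needs no downloaded model, unlike nltk.word_tokenize.
tokens = wordpunct_tokenize(text)

# Toy suffix rules for illustration only; the first matching pattern wins.
# These are deliberately crude (for example, "was" is tagged as a plural noun).
patterns = [
    (r"^[^\w\s]+$", "."),   # punctuation
    (r".*ing$", "VBG"),     # gerunds, e.g. "writing"
    (r".*ed$", "VBD"),      # past tense, e.g. "revised"
    (r".*ly$", "RB"),       # adverbs, e.g. "quickly"
    (r".*s$", "NNS"),       # plural nouns, e.g. "drafts"
    (r".*", "NN"),          # fallback: treat everything else as a noun
]
tagger = RegexpTagger(patterns)

# Each token is paired with the tag of the first pattern it matches.
tagged = tagger.tag(tokens)

print(tokens)
print(tagged)
```

Token and tag sequences like these are the raw material for stylistic features, which is why a toolkit of this kind is useful when the goal is to change the text so that those features no longer point to the author.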