10 Machine Learning Blueprints You Should Know for Cybersecurity

The previous section described the importance of authorship attribution and obfuscation. This section will focus on the attribution aspect—how we can design and build models to pinpoint the author of a given text.
Authorship attribution and obfuscation have both been studied in prior research. The standard dataset for benchmarking on this task is the Brennan-Greenstadt Corpus, which was collected through a survey at a university in the United States. Twelve authors were recruited, and each was required to submit pre-written text comprising at least 5,000 words.
A modified and improved version of this data, called the Extended Brennan-Greenstadt Corpus, was released later by the same researchers. To generate this dataset, they conducted a large-scale survey by recruiting participants from Amazon Mechanical Turk (MTurk). MTurk is a platform that allows researchers and scientists to conduct...