-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

10 Machine Learning Blueprints You Should Know for Cybersecurity
By :

In this section, we will focus on naïve methods for detecting bot-generated text. We will first create our own dataset, extract features, and then apply machine learning models to determine whether a particular text is machine-generated or not.
The task we will focus on is detecting bot-generated fake news. However, the concepts and techniques we will learn are fairly generic and can be applied to parallel tasks such as detecting bot-generated tweets, reviews, posts, and so on. As such a dataset is not readily available to the public, we will create our own.
How are we creating our dataset? We will use the News Aggregator dataset (https://archive.ics.uci.edu/ml/datasets/News+Aggregator) from the UCI Dataset Repository. The dataset contains a set of news articles (that is, links to the articles on the web). We will scrape these articles, and these are our human-generated articles. Then, we will use the article title as a prompt...