
10 Machine Learning Blueprints You Should Know for Cybersecurity

In this section, we will explore some review data and check whether there are any observable differences between genuine and fake reviews. We will use the Amazon fake reviews dataset that Amazon has published on Kaggle. It contains around 20,000 reviews, each labeled as real or fake by domain experts at Amazon.
We will first load the data and take a first pass over it to understand the features and their distributions.
We begin by importing the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
We will then read the reviews data. Although it is a text file, it is structured (tab-delimited) and can therefore be read with the read_csv function in pandas:
reviews_df = pd.read_csv("amazon_reviews.txt", sep="\t")
reviews_df.head()
This is what the output should look like:
Figure 4.1 – A glimpse of the reviews dataset
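Beyond head(), a first pass usually includes a few quick checks on size, types, missing values, and label balance. The following is a minimal sketch of such checks, not the book's own code. Because the real file's schema is not shown here, it runs on a tiny in-memory, tab-separated sample; the column names LABEL, RATING, and REVIEW_TEXT and all the rows are hypothetical placeholders that you would swap for the actual columns in reviews_df.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: a few tab-separated rows.
# Column names and values are illustrative assumptions, not the real schema.
sample = io.StringIO(
    "LABEL\tRATING\tREVIEW_TEXT\n"
    "real\t5\tWorks well, arrived on time.\n"
    "fake\t5\tBest product ever!!! Buy now.\n"
    "real\t2\tStopped working after a week.\n"
    "fake\t4\tAmazing, great, wonderful item.\n"
)

# Same call pattern as for the real file: tab as the field separator.
sample_df = pd.read_csv(sample, sep="\t")

# Size of the data: number of rows (reviews) and columns (features).
n_rows, n_cols = sample_df.shape

# How many reviews carry each label; a rough check of class balance.
label_counts = sample_df["LABEL"].value_counts()

# Number of missing values in each column.
missing_per_column = sample_df.isna().sum()

# Average rating per label, as one simple real-versus-fake comparison.
mean_rating_by_label = sample_df.groupby("LABEL")["RATING"].mean()

print(f"{n_rows} rows, {n_cols} columns")
print(sample_df.dtypes)
print(label_counts)
print(missing_per_column)
print(mean_rating_by_label)
```

Running the same calls on reviews_df, with the real column names substituted, gives a quick view of how balanced the labels are and whether any feature needs cleaning before a closer analysis.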
...