Big Data Forensics: Learning Hadoop Investigations

By: Joe Sremack

Overview of this book

Big Data forensics is an important type of digital investigation that involves the identification, collection, and analysis of large-scale Big Data systems. Hadoop is one of the most popular Big Data solutions, and forensically investigating a Hadoop cluster requires specialized tools and techniques. With the explosion of Big Data, forensic investigators need to be prepared to analyze the petabytes of data stored in Hadoop clusters. Understanding Hadoop's operational structure and performing forensic analysis with court-accepted tools and best practices will help you conduct a successful investigation.

Discover how to perform a complete forensic investigation of large-scale Hadoop clusters using the same tools and techniques employed by forensic experts. This book begins by taking you through the process of forensic investigation and the pitfalls to avoid. It will walk you through Hadoop's internals and architecture, and you will discover what types of information Hadoop stores and how to access that data. You will learn to identify Big Data evidence using techniques to survey a live system and interview witnesses. After setting up your own Hadoop system, you will collect evidence using techniques such as forensic imaging and application-based extractions. You will analyze Hadoop evidence using advanced tools and techniques to uncover events and statistical information. Finally, data visualization and evidence presentation techniques are covered to help you properly communicate your findings to any audience.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:

"The following command collects the /dev/sda1 volume and stores it in a file called sda1.img."

A block of code is set as follows:

hdfs dfs -put ./testFile.txt /home/hadoopFile.txt
hdfs dfs -get /home/hadoopFile.txt ./testFile_copy.txt
md5sum testFile.txt
md5sum testFile_copy.txt

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

hdfs dfs -put ./testFile.txt /home/hadoopFile.txt
hdfs dfs -get /home/hadoopFile.txt ./testFile_copy.txt
md5sum testFile.txt
md5sum testFile_copy.txt

Any command-line input or output is written as follows:

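#!/bin/bash
# List every Hive table, one name per line
hive -e "show tables;" > hiveTables.txt
# Export each table to its own text file using tableExport.hql
for line in $(cat hiveTables.txt); do
  hive -hiveconf tablename="$line" -f tableExport.hql > "${line}.txt"
done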

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Enter the Case Number and Examiner information, and click Next."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.