Book Image

Big Data Forensics: Learning Hadoop Investigations

Book Image

Big Data Forensics: Learning Hadoop Investigations

Overview of this book

Big Data forensics is an important type of digital investigation that involves the identification, collection, and analysis of large-scale Big Data systems. Hadoop is one of the most popular Big Data solutions, and forensically investigating a Hadoop cluster requires specialized tools and techniques. With the explosion of Big Data, forensic investigators need to be prepared to analyze the petabytes of data stored in Hadoop clusters. Understanding Hadoop’s operational structure and performing forensic analysis with court-accepted tools and best practices will help you conduct a successful investigation. Discover how to perform a complete forensic investigation of large-scale Hadoop clusters using the same tools and techniques employed by forensic experts. This book begins by taking you through the process of forensic investigation and the pitfalls to avoid. It will walk you through Hadoop's internals and architecture, and you will discover what types of information Hadoop stores and how to access that data. You will learn to identify Big Data evidence using techniques to survey a live system and interview witnesses. After setting up your own Hadoop system, you will collect evidence using techniques such as forensic imaging and application-based extractions. You will analyze Hadoop evidence using advanced tools and techniques to uncover events and statistical information. Finally, data visualization and evidence presentation techniques are covered to help you properly communicate your findings to any audience.
Table of Contents (10 chapters)
9
Index

Chapter 1. Starting Out with Forensic Investigations and Big Data

Big Data forensics is a new type of forensics, just as Big Data is a new way of solving the challenges presented by large, complex data. Thanks to the growth in data and the increased value of storing more data and analyzing it faster—Big Data solutions have become more common and more prominently positioned within organizations. As such, the value of Big Data systems has grown, often storing data used to drive organizational strategy, identify sales, and many different modes of electronic communication. The forensic value of such data is obvious: if the data is useful to an organization, then the data is valuable to an investigation of that organization. The information in a Big Data system is not only inherently valuable, but the data is most likely organized and analyzed in such a way to identify how the organization treated the data.

Big Data forensics is the forensic collection and analysis of Big Data systems. Traditional computer forensics typically focuses on more common sources of data, such as mobile devices and laptops. Big Data forensics is not a replacement for traditional forensics. Instead, Big Data forensics augments the existing forensics body of knowledge to handle the massive, distributed systems that require different forensic tools and techniques.

Traditional forensic tools and methods are not always well-suited for Big Data. The tools and techniques used in traditional forensics are most commonly designed for the collection and analysis of unstructured data (for example, e-mail and document files). Forensics of such data typically hinges on metadata and involves the calculation of an MD5 or SHA-1 checksum. With Big Data systems, the large volume of data and how the data is stored do not lend themselves well to traditional forensics. As such, alternative methods for collecting and analyzing such data are required.

This chapter covers the basics of forensic investigations, Big Data, and how Big Data forensics is unique. Some of the topics that are discussed include the following:

  • Goals of a forensic investigation
  • Forensic investigation methodology
  • Big Data – defined and described
  • Key differences between traditional forensics and Big Data forensics