Clojure for Data Science

By: Garner

Overview of this book

The term “data science” has been widely used to describe a new profession that is expected to interpret vast datasets and translate them into improved decision-making and performance. Clojure is a powerful language that combines the interactivity of a scripting language with the speed of a compiled language. Together with its rich ecosystem of native libraries and an extremely simple and consistent functional approach to data manipulation, which maps closely to mathematical formulae, it is an ideal, practical, and flexible language to meet a data scientist’s diverse needs. Taking you on a journey from simple summary statistics to sophisticated machine learning algorithms, this book shows how the Clojure programming language can be used to derive insights from data. Data scientists often forge a novel path, and you’ll see how to make use of Clojure’s Java interoperability capabilities to access libraries such as Mahout and MLlib for which Clojure wrappers don’t yet exist. Even seasoned Clojure developers will develop a deeper appreciation for their language’s flexibility! You’ll learn how to apply statistical thinking to your own data and use Clojure to explore, analyze, and visualize it in a technically and statistically robust way. You can also use Incanter for local data processing and ClojureScript to present interactive visualizations. You’ll understand how distributed platforms such as Hadoop and Spark, with programming models such as MapReduce and GraphX’s BSP, solve the challenges of data analysis at scale, and how to express algorithms using those programming models. Above all, by following the explanations in this book, you’ll learn not just how to be effective using the current state-of-the-art methods in data science, but why such methods work, so that you can continue to be productive as the field evolves.

Preface

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
 --H. G. Wells

"I have a great subject [statistics] to write upon, but feel keenly my literary incapacity to make it easily intelligible without sacrificing accuracy and thoroughness."
 --Sir Francis Galton

A web search for "data science Venn diagram" returns numerous interpretations of the skills required to be an effective data scientist (it appears that data science commentators love Venn diagrams). Author and data scientist Drew Conway produced the prototypical diagram back in 2010, putting data science at the intersection of hacking skills, substantive expertise (that is, subject domain understanding), and mathematics and statistics knowledge. The overlap of hacking skills and substantive expertise alone, where people practice without strong mathematics and statistics knowledge, is the "danger zone."

Five years on, as a growing number of developers seek to plug the data science skills shortage, there's more need than ever for statistical and mathematical education to help developers out of this danger zone. So, when Packt Publishing invited me to write a book on data science suitable for Clojure programmers, I gladly agreed. In addition to appreciating the need for such a book, I saw it as an opportunity to consolidate much of what I had learned as CTO of my own Clojure-based data analytics company. The result is the book I wish I had been able to read before starting out.

Clojure for Data Science aims to be much more than just a book of statistics for Clojure programmers. A major reason for the spread of data science into so many diverse areas is the enormous power of machine learning. Throughout the book, I'll show how to use pure Clojure functions and third-party libraries to construct machine learning models for the primary tasks of regression, classification, clustering, and recommendation.

Approaches that scale to very large datasets, so-called "big data," are of particular interest to data scientists, because they can reveal subtleties that are lost in smaller samples. This book shows how Clojure can be used to concisely express jobs to run on the Hadoop and Spark distributed computation frameworks, and how to incorporate machine learning through the use of both dedicated external libraries and general optimization techniques.

Above all, this book aims to foster an understanding not just of how to perform particular types of analysis, but of why such techniques work. In addition to providing practical knowledge (almost every concept in this book is expressed as a runnable example), I aim to explain the theory that will allow you to take a principle and apply it to related problems. I hope that this approach will enable you to effectively apply statistical thinking in diverse situations well into the future, whether or not you decide to pursue a career in data science.
