IBM SPSS Modeler Cookbook

By: Keith McCormick, Abbott
Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data rather than hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, timesavers, and workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these experts: practitioners whose brilliant and imaginative use of the tool has pushed back the boundaries of applied analytics. By reading this book, you are learning from those who have helped define the state of the art. Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.
Table of Contents (11 chapters)

The IBM SPSS Modeler workbench

This book is about the data mining workbench variously known as Clementine, PASW Modeler, and IBM SPSS Modeler. This and other workbench-style data mining tools have played a crucial role in making data mining what it now is: a business process rather than a purely technical one. The importance of the workbench is twofold.

Firstly, the workbench plays down the technical side of data mining. It simplifies the use of technology through a user interface that allows the user almost always to ignore the deep technical details, whether this means the method of data access, the design of a graph, or the mechanism and tuning of data mining algorithms. Technical details are simplified, and where possible, universal default settings are used, so that users often need not see any options that reveal the underlying technology, let alone understand what they mean.

This is important because it allows business analysts to perform data mining—a business analyst is someone with expert business knowledge and general-purpose analytical knowledge. A business analyst need not have deep knowledge of data mining algorithms or mathematics, and it can even be a disadvantage to have this knowledge because technical details can distract from focusing on the business problem.

Secondly, the workbench records and highlights the way in which business knowledge has been used to analyze the data. This is why most data mining workbenches use a "visual workflow" approach; the workflow constitutes a record of the route from raw data to analysis, and it also makes it extremely easy to change this processing and re-use it in part or in full. Data mining is an interactive process of applying business and analytical knowledge to data, and the data mining workbench is designed to make this easy.
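The workflow idea can be sketched in miniature: a stream is just an ordered list of named nodes, and running it leaves a record of every step from raw data to result, which can then be modified or re-run in part or in full. The node names and the tiny API below are hypothetical, for illustration only; they are not Modeler's actual interface.

```python
# Illustrative sketch of a "visual workflow" as a recorded chain of nodes.
# Node names and this API are invented, not Modeler's real interface.

class Node:
    def __init__(self, name, fn):
        self.name = name  # the label that would appear on the canvas
        self.fn = fn      # the operation this node applies to the data

def run(stream, rows):
    """Execute a stream (list of nodes) in order, recording each step."""
    record = []
    for node in stream:
        rows = node.fn(rows)
        record.append((node.name, len(rows)))  # audit trail of the route
    return rows, record

# A minimal stream: filter, then derive a new field.
data = [{"age": 34, "spend": 120}, {"age": 19, "spend": 40}]
stream = [
    Node("select_adults",
         lambda rs: [r for r in rs if r["age"] >= 21]),
    Node("derive_band",
         lambda rs: [{**r, "band": "high" if r["spend"] > 100 else "low"}
                     for r in rs]),
]
result, record = run(stream, data)
```

Because the stream is an explicit object, swapping one node or appending another re-uses everything upstream unchanged, which is the property the visual-workflow approach is designed to exploit.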

A brief history of the Clementine workbench

During the 1980s, the School of Cognitive and Computing Studies at the University of Sussex developed an Artificial Intelligence programming environment called Poplog. Used for teaching and research, Poplog was characterized by containing several different AI programming languages and many other AI-related packages, including machine-learning modules. From 1983, Poplog was marketed commercially by Systems Designers Limited (later SD-Scicon), and in 1989, a management buyout created a spin-off company called Integral Solutions Ltd (ISL) to market Poplog and related products. A stream of businesses developed within ISL, applying the machine-learning packages in Poplog to organizations' data, in order to understand and predict customer behavior.

In 1993, Colin Shearer (the then Development and Research Director at ISL) invented the Clementine data mining workbench, basing his designs around the data mining projects recently executed by the company and creating the first workbench modules using Poplog. ISL created a data mining division, led by Colin Shearer, to develop, productize, and market Clementine and its associated services; the initial members were Colin Shearer, Tom Khabaza, and David Watkins. This team used Poplog to develop the first version of Clementine, which was launched in June 1994.

Clementine Version 1 would be considered limited by today's standards; the only algorithms provided were decision trees and neural networks, and it had very limited access to databases. However, the fundamental design features of low technical burden on the user and a flexible visual record of the analysis were much the same as they are today, and Clementine immediately attracted substantial commercial interest. New versions followed, approximately one major version per year, as summarized below. ISL was acquired by SPSS Inc. in December 1998, and SPSS Inc. was acquired by IBM in 2009.

Version 1: Decision tree and neural network algorithms, limited database access, and Unix platforms only

Version 2: New Kohonen network and linear regression algorithms, new web graph, improved data manipulation, and supernodes

Version 3: ODBC database access, Unix and Windows platforms

Version 4: Association Rules and K-means clustering algorithms

Version 5: Scripting, batch execution, external module interface, client-server architecture (Poplog client and C++ server), and the CRISP-DM project tool

Version 6: Logistic regression algorithm, database pushback, and Clementine application templates

Version 7: Java client including many new features, TwoStep clustering, and PCA/factor analysis algorithms

Version 8: Cluster browser and data audit

Version 9: CHAID and Quest algorithms and interactive decision tree building

Version 10: Anomaly detection and feature selection algorithms

Version 11: Automated modeling, time series and decision list algorithms, and partial automation of data preparation

Version 12: SVM, Bayesian, and Cox regression algorithms, RFM, and variable importance charts

Version 13: Automated clustering and data preparation, nearest neighbor algorithm, and interactive rule building

Version 14: Boosting and bagging, ensemble browsing, and XML data

Version 15: Entity analytics, social network analysis, and the GLMM algorithm

Version 13 was renamed PASW Modeler, and Version 14 IBM SPSS Modeler. The selection of major new features given above is necessarily subjective; every new version of Clementine included a large number of enhancements and new features. In particular, data manipulation, data access and export, visualization, and the user interface received a great deal of attention throughout.

Perhaps the most significant new release was Version 7, in which the Clementine client was completely rewritten in Java; this was designed by Sheri Gilley and Julian Clinton, and contained a large number of new features while retaining the essential character of the software. Another very important feature of Clementine from Version 6 onwards was database pushback: the ability to translate Clementine operations into SQL so that they could be executed directly by a database engine without first extracting the data. This was primarily the work of Niall McCarroll and Rob Duncan, and it gave Clementine an unusual degree of scalability compared to other data mining software.
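As a rough illustration of what pushback means, the sketch below compiles a filter-and-aggregate step into a SQL statement and lets the database engine (SQLite here) do the work, rather than extracting the rows first. The table, columns, and the `pushback_sql` helper are invented for this example; they are not Clementine's actual mechanism.

```python
import sqlite3

# Toy illustration of "pushback": translate a filter + aggregate step
# into SQL so the database engine does the work, instead of pulling all
# rows to the client first. Table and column names are invented.

def pushback_sql(table, where, group_by, agg):
    """Compile a simple filter/aggregate operation to a SQL statement."""
    return (f"SELECT {group_by}, {agg} FROM {table} "
            f"WHERE {where} GROUP BY {group_by}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("north", 10.0), ("north", 30.0), ("south", 5.0)])

sql = pushback_sql("orders", "amount > 8", "region", "SUM(amount) AS total")
rows = conn.execute(sql).fetchall()  # computed entirely inside the engine
```

The scalability benefit is that only the (usually small) aggregated result crosses the wire, while the scan over the raw rows stays inside the database.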

In 1996, ISL collaborated with Daimler-Benz, NCR Teradata, and OHRA to form the CRISP-DM consortium, partly funded by a European Union R&D grant, with the goal of creating a new data mining methodology. The consortium consulted many organizations through its Special Interest Group and released CRISP-DM Version 1.0 in 1999. CRISP-DM has been integrated into the workbench since that time and has been so widely used as to justify calling it the industry standard.

The core Clementine analytics are designed to handle structured data: numeric, coded, and string data of the sort typically found in relational databases. However, in Clementine Version 4, a prototype text mining module was produced in collaboration with Brighton University, although it was not released as a commercial product. In 2002, SPSS acquired LexiQuest, a text mining company, and integrated the LexiQuest text mining technology into Text Mining for Clementine, an add-on module for Version 7. Text mining is accomplished in the workbench by extracting structured data from unstructured (free-text) data, and then using the standard features of the workbench to analyze it.
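The idea of deriving structured fields from free text, which ordinary analytics can then consume, can be sketched as a simple term-frequency extraction. The term list and helper below are invented for illustration; LexiQuest's actual technology is far more sophisticated than raw term counting.

```python
import re
from collections import Counter

# Sketch of the text-mining idea: turn free text into structured
# (numeric) fields that standard analytics can consume.
# The term list is invented for this example.

TERMS = ["refund", "delay", "helpful"]

def extract_features(text):
    """Count occurrences of known terms, yielding a structured record."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {term: words[term] for term in TERMS}

record = extract_features(
    "The refund was subject to delay, but staff were helpful. "
    "Refund arrived.")
```

Each document becomes one row of numeric fields, so the downstream modeling nodes need no knowledge that the data originated as text.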
