Preface.

**1. Introduction.**

1.1 Overview.

1.2 Definition.

1.3 Preparation.

1.3.1 Overview.

1.3.2 Accessing tabular data.

1.3.3 Accessing unstructured data.

1.3.4 Understanding the variables and observations.

1.3.5 Data cleaning.

1.3.6 Transformation.

1.3.7 Variable reduction.

1.3.8 Segmentation.

1.3.9 Preparing data to apply.

1.4 Analysis.

1.4.1 Data mining tasks.

1.4.2 Optimization.

1.4.3 Evaluation.

1.4.4 Model forensics.

1.5 Deployment.

1.6 Outline of book.

1.6.1 Overview.

1.6.2 Data visualization.

1.6.3 Clustering.

1.6.4 Predictive analytics.

1.6.5 Applications.

1.6.6 Software.

1.7 Summary.

1.8 Further reading.

**2. Data visualization.**

2.1 Overview.

2.2 Visualization design principles.

2.2.1 General principles.

2.2.2 Graphics design.

2.2.3 Anatomy of a graph.

2.3 Tables.

2.3.1 Simple tables.

2.3.2 Summary tables.

2.3.3 Two-way contingency tables.

2.3.4 Supertables.

2.4 Univariate data visualization.

2.4.1 Bar chart.

2.4.2 Histograms.

2.4.3 Frequency polygram.

2.4.4 Box plots.

2.4.5 Dot plot.

2.4.6 Stem-and-leaf plot.

2.4.7 Quantile plot.

2.4.8 Q-Q plot.

2.5 Bivariate data visualization.

2.5.1 Scatterplot.

2.6 Multivariate data visualization.

2.6.1 Histogram matrix.

2.6.2 Scatterplot matrix.

2.6.3 Multiple box plot.

2.6.4 Trellis plot.

2.7 Visualizing groups.

2.7.1 Dendrograms.

2.7.2 Decision trees.

2.7.3 Cluster image maps.

2.8 Dynamic techniques.

2.8.1 Data brushing.

2.8.2 Nearness selection.

2.8.3 Sorting and rearranging.

2.8.4 Searching and filtering.

2.9 Summary.

2.10 Further reading.

**3. Clustering.**

3.1 Overview.

3.2 Distance measures.

3.2.1 Overview.

3.2.2 Numeric distance measures.

3.2.3 Binary distance measures.

3.2.4 Mixed variables.

3.2.5 Other measures.

3.3 Agglomerative hierarchical clustering.

3.3.1 Overview.

3.3.2 Single linkage.

3.3.3 Complete linkage.

3.3.4 Average linkage.

3.3.5 Other methods.

3.3.6 Selecting groups.

3.4 Partitioned-based clustering.

3.4.1 Overview.

3.4.2 k-means.

3.4.3 Worked example.

3.4.4 Miscellaneous partitioned-based clustering.

3.5 Fuzzy clustering.

3.5.1 Overview.

3.5.2 Fuzzy k-means.

3.5.3 Worked examples.

3.6 Summary.

3.7 Further reading.

**4. Predictive analytics.**

4.1 Overview.

4.1.1 Predictive modeling.

4.1.2 Testing model accuracy.

4.1.3 Evaluating regression models’ predictive accuracy.

4.1.4 Evaluating classification models’ predictive accuracy.

4.1.5 Evaluating binary models’ predictive accuracy.

4.1.6 ROC charts.

4.1.7 Lift chart.

4.2 Principal component analysis.

4.2.1 Overview.

4.2.2 Principal components.

4.2.3 Generating principal components.

4.2.4 Interpretation of principal components.

4.3 Multiple linear regression.

4.3.1 Overview.

4.3.2 Generating models.

4.3.3 Prediction.

4.3.4 Analysis of residuals.

4.3.5 Standard error.

4.3.6 Coefficient of multiple determination.

4.3.7 Testing the model significance.

4.3.8 Selecting and transforming variables.

4.4 Discriminant analysis.

4.4.1 Overview.

4.4.2 Discriminant function.

4.4.3 Discriminant analysis example.

4.5 Logistic regression.

4.5.1 Overview.

4.5.2 Logistic regression formula.

4.5.3 Estimating coefficients.

4.5.4 Assessing and optimizing the results.

4.6 Naïve Bayes classifiers.

4.6.1 Overview.

4.6.2 Bayes theorem and the independence assumption.

4.6.3 Independence assumption.

4.6.4 Classification process.

4.7 Summary.

4.8 Further reading.

**5. Applications.**

5.1 Overview.

5.2 Sales and marketing.

5.3 Industry-specific data mining.

5.3.1 Finance.

5.3.2 Insurance.

5.3.3 Retail.

5.3.4 Telecommunications.

5.3.5 Manufacturing.

5.3.6 Entertainment.

5.3.7 Government.

5.3.8 Pharmaceuticals.

5.3.9 Healthcare.

5.4 MicroRNA data analysis case study.

5.4.1 Defining the problem.

5.4.2 Preparing the data.

5.4.3 Analysis.

5.5 Credit scoring case study.

5.5.1 Defining the problem.

5.5.2 Preparing the data.

5.5.3 Analysis.

5.5.4 Deployment.

5.6 Data mining non-tabular data.

5.6.1 Overview.

5.6.2 Data mining chemical data.

5.6.3 Data mining text.

5.7 Further reading.

Appendix A. Matrices.

A.1 Overview of matrices.

A.2 Matrix addition.

A.3 Matrix multiplication.

A.4 Transpose of a matrix.

A.5 Inverse of a matrix.

Appendix B. Software.

B.1 Software overview.

B.1.1 Software objectives.

B.1.2 Access and installation.

B.1.3 User interface overview.

B.2 Data preparation.

B.2.1 Overview.

B.2.2 Reading in data.

B.2.3 Searching the data.

B.2.4 Variable characterization.

B.2.5 Removing observations and variables.

B.2.6 Cleaning the data.

B.2.7 Transforming the data.

B.2.8 Segmentation.

B.2.9 Principal component analysis.

B.3 Tables and graphs.

B.3.1 Overview.

B.3.2 Contingency tables.

B.3.3 Summary tables.

B.3.4 Graphs.

B.3.5 Graph matrices.

B.4 Statistics.

B.4.1 Overview.

B.4.2 Descriptive statistics.

B.4.3 Confidence intervals.

B.4.4 Hypothesis tests.

B.4.5 Chi-square test.

B.4.6 ANOVA.

B.4.7 Comparative statistics.

B.5 Grouping.

B.5.1 Overview.

B.5.2 Clustering.

B.5.3 Associative rules.

B.5.4 Decision trees.

B.6 Prediction.

B.6.1 Overview.

B.6.2 Linear regression.

B.6.3 Discriminant analysis.

B.6.4 Logistic regression.

B.6.5 Naïve Bayes.

B.6.6 kNN.

B.6.7 CART.

B.6.8 Neural networks.

B.6.9 Apply model.

Bibliography.

Index.