Correspondence Analysis: Theory, Practice and New Strategies

Eric J. Beh and Rosaria Lombardo
Publisher: John Wiley
Publication Date: 2014
Number of Pages: 592
Format: Hardcover
Series: Wiley Series in Probability and Statistics
Price: 105.00
ISBN: 9781119953241
Category: Monograph
We do not plan to review this book.

Foreword xv

Preface xvii

Part One Introduction 1

1 Data Visualisation 3

1.1 A Very Brief Introduction to Data Visualisation 3

1.1.1 A Very Brief History 3

1.1.2 Introduction to Visualisation Tools for Numerical Data 4

1.1.3 Introduction to Visualisation Tools for Univariate Categorical Data 6

1.2 Data Visualisation for Contingency Tables 10

1.2.1 Fourfold Displays 11

1.3 Other Plots 12

1.4 Studying Exposure to Asbestos 13

1.4.1 Asbestos and Irving J. Selikoff 13

1.4.2 Selikoff’s Data 17

1.4.3 Numerical Analysis of Selikoff’s Data 17

1.4.4 A Graphical Analysis of Selikoff’s Data 18

1.4.5 Classical Correspondence Analysis of Selikoff’s Data 20

1.4.6 Other Methods of Graphical Analysis 22

1.5 Happiness Data 25

1.6 Correspondence Analysis Now 29

1.6.1 A Bibliographic Taste 29

1.6.2 The Increasing Popularity of Correspondence Analysis 29

1.6.3 The Growth of the Correspondence Analysis Family Tree 32

1.7 Overview of the Book 34

1.8 R Code 35

References 36

2 Pearson’s Chi-Squared Statistic 44

2.1 Introduction 44

2.2 Pearson’s Chi-Squared Statistic 44

2.2.1 Notation 44

2.2.2 Measuring the Departure from Independence 45

2.2.3 Pearson’s Chi-Squared Statistic 47
2.2.4 Other χ² Measures of Association 48

2.2.5 The Power Divergence Statistic 49

2.2.6 Dealing with the Sample Size 50

2.3 The Goodman--Kruskal Tau Index 51

2.3.1 Other Measures and Issues 52
2.4 The 2 × 2 Contingency Table 52

2.4.1 Yates’ Continuity Correction 53

2.5 Early Contingency Tables 54

2.5.1 The Impact of Adolph Quetelet 55

2.5.2 Gavarret’s (1840) Legitimate Children Data 58

2.5.3 Finley’s (1884) Tornado Data 58

2.5.4 Galton’s (1892) Fingerprint Data 59

2.5.5 Final Comments 61

2.6 R Code 61

2.6.1 Expectation and Variance of the Pearson Chi-Squared Statistic 61

2.6.2 Pearson’s Chi-Squared Test of Independence 62

2.6.3 The Cressie--Read Statistic 64

References 67

Part Two Correspondence Analysis of Two-Way Contingency Tables 71

3 Methods of Decomposition 73

3.1 Introduction 73

3.2 Reducing Multidimensional Space 73

3.3 Profiles and Cloud of Points 74

3.4 Property of Distributional Equivalence 79

3.5 The Triplet and Classical Reciprocal Averaging 79

3.5.1 One-Dimensional Reciprocal Averaging 80

3.5.2 Matrix Form of One-Dimensional Reciprocal Averaging 81

3.5.3 M-Dimensional Reciprocal Averaging 83

3.5.4 Some Historical Comments 83

3.6 Solving the Triplet Using Eigen-Decomposition 84

3.6.1 The Decomposition 84

3.6.2 Example 85

3.7 Solving the Triplet Using Singular Value Decomposition 86

3.7.1 The Standard Decomposition 86

3.7.2 The Generalised Decomposition 88

3.8 The Generalised Triplet and Reciprocal Averaging 89

3.9 Solving the Generalised Triplet Using Gram--Schmidt Process 91

3.9.1 Ordered Categorical Variables and a priori Scores 91

3.9.2 On Finding Orthogonalised Vectors 92

3.9.3 A Recurrence Formulae Approach 94

3.9.4 Changing the Basis Vector 96

3.9.5 Generalised Correlations 97

3.10 Bivariate Moment Decomposition 100

3.11 Hybrid Decomposition 100

3.11.1 An Alternative Singly Ordered Approach 102

3.12 R Code 103

3.12.1 Eigen-Decomposition in R 103

3.12.2 Singular Value Decomposition in R 103

3.12.3 Singular Value Decomposition for Matrix Approximation 104

3.12.4 Generating Emerson’s Polynomials 106

3.13 A Preliminary Graphical Summary 109

3.14 Analysis of Analgesic Drugs 112

References 115

4 Simple Correspondence Analysis 120

4.1 Introduction 120

4.2 Notation 121

4.3 Measuring Departures from Complete Independence 122

4.3.1 The ‘Duplication Constant’ 123

4.3.2 Pearson Ratios 123

4.4 Decomposing the Pearson Ratio 124

4.5 Coordinate Systems 126

4.5.1 Standard Coordinates 126

4.5.2 Principal Coordinates 127

4.5.3 Biplot Coordinates 132

4.6 Distances 136

4.6.1 Distance from the Origin 136

4.6.2 Intra-Variable Distances and the χ² Metric 137

4.6.3 Inter-Variable Distances 138

4.7 Transition Formulae 140

4.8 Moments of the Principal Coordinates 141

4.8.1 The Mean of f_im 142

4.8.2 The Variance of f_im 142

4.8.3 The Skewness of f_im 143

4.8.4 The Kurtosis of f_im 143

4.8.5 Moments of the Asbestos Data 144

4.9 How Many Dimensions to Use? 145

4.10 R Code 147

4.11 Other Theoretical Issues 154

4.12 Some Applications of Correspondence Analysis 156

4.13 Analysis of a Mother’s Attachment to Her Child 158

References 165

5 Non-Symmetrical Correspondence Analysis 177

5.1 Introduction 177

5.2 The Goodman--Kruskal Tau Index 180

5.2.1 The Tau Index as a Measure of the Increase in Predictability 180

5.2.2 The Tau Index in the Context of ANOVA 182

5.2.3 The Sensitivity of τ 182

5.2.4 A Demonstration: Revisiting Selikoff’s Asbestos Data 185

5.3 Non-Symmetrical Correspondence Analysis 186

5.3.1 The Centred Column Profile Matrix 186

5.3.2 Decomposition of π 187

5.4 The Coordinate Systems 188

5.4.1 Standard Coordinates 188

5.4.2 Principal Coordinates 189

5.4.3 Biplot Coordinates 193

5.5 Transition Formulae 197

5.5.1 Supplementary Points 198

5.5.2 Reconstruction Formulae 198

5.6 Moments of the Principal Coordinates 199

5.6.1 The Mean of f_im 199

5.6.2 The Variance of f_im 200

5.6.3 The Skewness of f_im 201

5.6.4 The Kurtosis of f_im 201

5.7 The Distances 201

5.7.1 Column Distances 201

5.7.2 Row Distances 203

5.8 Comparison with Simple Correspondence Analysis 204

5.9 R Code 204

5.10 Analysis of a Mother’s Attachment to Her Child 209

References 212

6 Ordered Correspondence Analysis 216

6.1 Introduction 216

6.2 Pearson’s Ratio and Bivariate Moment Decomposition 221

6.3 Coordinate Systems 222

6.3.1 Standard Coordinates 222

6.3.2 The Generalised Correlations 223

6.3.3 Principal Coordinates 225

6.3.4 Location, Dispersion and Higher Order Components 229

6.3.5 The Correspondence Plot and Generalised Correlations 230

6.3.6 Impact on the Choice of Scores 232

6.4 Artificial Data Revisited 233

6.4.1 On the Structure of the Association 233

6.4.2 A Graphical Summary of the Association 233

6.4.3 An Interpretation of the Axes and Components 234

6.4.4 The Impact of the Choice of Scores 235

6.5 Transition Formulae 236

6.6 Distance Measures 238

6.6.1 Distance from the Origin 238

6.6.2 Intra-Variable Distances 239

6.7 Singly Ordered Analysis 239

6.8 R Code 241

6.8.1 Generalised Correlations and Principal Inertias 241

6.8.2 Doubly Ordered Correspondence Analysis 245

References 248

7 Ordered Non-Symmetrical Correspondence Analysis 251

7.1 Introduction 251

7.2 General Considerations 252

7.2.1 Orthogonal Polynomials Instead of Singular Vectors 253

7.3 Doubly Ordered Non-Symmetrical Correspondence Analysis 254

7.3.1 Bivariate Moment Decomposition 254

7.3.2 Generalised Correlations in Bivariate Moment Decomposition 255

7.4 Singly Ordered Non-Symmetrical Correspondence Analysis 257

7.4.1 Hybrid Decomposition for an Ordered Predictor Variable 257

7.4.2 Hybrid Decomposition in the Case of Ordered Response Variables 258

7.4.3 Generalised Correlations in Hybrid Decomposition 258

7.5 Coordinate Systems for Ordered Non-Symmetrical Correspondence Analysis 259

7.5.1 Polynomial Plots for Doubly Ordered Non-Symmetrical Correspondence Analysis 260

7.5.2 Polynomial Biplot for Doubly Ordered Non-Symmetrical Correspondence Analysis 262

7.5.3 Polynomial Plot for Singly Ordered Non-Symmetrical Correspondence Analysis with an Ordered Predictor Variable 262

7.5.4 Polynomial Biplot for Singly Ordered Non-Symmetrical Correspondence Analysis with an Ordered Predictor Variable 263

7.5.5 Polynomial Plot for Singly Ordered Non-Symmetrical Correspondence Analysis with an Ordered Response Variable 264

7.5.6 Polynomial Biplot for Singly Ordered Non-Symmetrical Correspondence Analysis with an Ordered Response Variable 265

7.6 Tests of Asymmetric Association 265

7.7 Distances in Ordered Non-Symmetrical Correspondence Analysis 266

7.7.1 Distances in Doubly Ordered Non-Symmetrical Correspondence Analysis 267

7.7.2 Distances in Singly Ordered Non-Symmetrical Correspondence Analysis 269

7.8 Doubly Ordered Non-Symmetrical Correspondence of Asbestos Data 269

7.8.1 Trends 270

7.9 Singly Ordered Non-Symmetrical Correspondence Analysis of Drug Data 277

7.9.1 Predictability of Ordered Rows Given Columns 278

7.10 R Code for Ordered Non-Symmetrical Correspondence Analysis 283

References 300

8 External Stability and Confidence Regions 302

8.1 Introduction 302

8.2 On the Statistical Significance of a Point 303

8.3 Circular Confidence Regions for Classical Correspondence Analysis 304

8.4 Elliptical Confidence Regions for Classical Correspondence Analysis 306

8.4.1 The Information in the Optimal Correspondence Plot 306

8.4.2 The Information in the First Two Dimensions 308

8.4.3 Eccentricity of Elliptical Regions 309

8.4.4 Comparison of Confidence Regions 309

8.5 Confidence Regions for Non-Symmetrical Correspondence Analysis 311

8.5.1 Circular Regions in Non-Symmetrical Correspondence Analysis 312

8.5.2 Elliptical Regions in Non-Symmetrical Correspondence Analysis 312

8.6 Approximate P-values and Classical Correspondence Analysis 313

8.6.1 Approximate P-values Based on Confidence Circles 313

8.6.2 Approximate P-values Based on Confidence Ellipses 314

8.7 Approximate P-values and Non-Symmetrical Correspondence Analysis 315

8.8 Bootstrap Elliptical Confidence Regions 315

8.9 Ringrose’s Bootstrap Confidence Regions 316

8.9.1 Confidence Ellipses and Covariance Matrix 317

8.10 Confidence Regions and Selikoff’s Asbestos Data 318

8.11 Confidence Regions and Mother--Child Attachment Data 322

8.12 R Code 325

8.12.1 Calculating the Path of a Confidence Ellipse 326

8.12.2 Constructing Elliptical Regions in a Correspondence Plot 327

References 335

9 Variants of Correspondence Analysis 337

9.1 Introduction 337

9.2 Correspondence Analysis Using Adjusted Standardised Residuals 337

9.3 Correspondence Analysis Using the Freeman--Tukey Statistic 340

9.4 Correspondence Analysis of Ranked Data 342

9.5 R Code 343

9.5.1 Adjusted Standardised Residuals 343

9.5.2 Freeman--Tukey Statistic 349

9.6 The Correspondence Analysis Family 353

9.6.1 Detrended Correspondence Analysis 353

9.6.2 Canonical Correspondence Analysis 354

9.6.3 Inverse Correspondence Analysis 355

9.6.4 Ordered Correspondence Analysis 355

9.6.5 Grade Correspondence Analysis 355

9.6.6 Symbolic Correspondence Analysis 356

9.6.7 Correspondence Analysis of Proximity Data 356

9.6.8 Residual (Scaling) Correspondence Analysis 360

9.6.9 Log-Ratio Correspondence Analysis 362

9.6.10 Parametric Correspondence Analysis 364

9.6.11 Subset Correspondence Analysis 364

9.6.12 Foucart’s Correspondence Analysis 365

9.7 Other Techniques 365

References 366

Part Three Correspondence Analysis of Multi-Way Contingency Tables 373

10 Coding and Multiple Correspondence Analysis 375

10.1 Introduction to Coding 375

10.2 Coding Data 377

10.2.1 B-Splines 377

10.2.2 Crisp Coding 380

10.2.3 Fuzzy Coding 382

10.3 Coding Ordered Categorical Variables by Orthogonal Polynomials 382

10.4 Burt Matrix 384

10.5 An Introduction to Multiple Correspondence Analysis 386

10.6 Multiple Correspondence Analysis 388

10.6.1 Notation 388

10.6.2 Decomposition Methods 389

10.6.3 Coordinates, Transition Formulae and Adjusted Inertia 393

10.7 Variants of Multiple Correspondence Analysis 395

10.7.1 Joint Correspondence Analysis 396

10.7.2 Stacking and Concatenation 397

10.8 Ordered Multiple Correspondence Analysis 398

10.8.1 Orthogonal Polynomials in Multiple Correspondence Analysis 398

10.8.2 Hybrid Decomposition of Multiple Indicator Tables 399

10.8.3 Two Ordered Variables and Their Contingency Table 400

10.8.4 Test of Statistical Significance 401

10.8.5 Properties of Ordered Multiple Correspondence Analysis 403

10.8.6 Graphical Displays in Ordered Multiple Correspondence Analysis 404

10.9 Applications 405

10.9.1 Customer Satisfaction in Health Care Services 406

10.9.2 Two Quality Aspects 411

10.10 R Code 417

10.10.1 B-Spline Function 417

10.10.2 Crisp and Fuzzy Coding Using B-Splines in R 421

10.10.3 Crisp Coding and the Burt Table by Indicator Functions in R 425

10.10.4 Classical and Multiple Correspondence Analysis in R 428

References 444

11 Symmetrical and Non-Symmetrical Three-Way Correspondence Analysis 451

11.1 Introduction 451

11.2 Notation 453

11.3 Symmetric and Asymmetric Association in Three-Way Contingency Tables 454

11.4 Partitioning Three-Way Measures of Association 455

11.4.1 Partitioning Pearson’s Three-Way Statistic 457

11.4.2 Partitioning Marcotorchino’s and Gray--William’s Three-Way Indices 458

11.4.3 Marcotorchino’s Index 460

11.4.4 Partitioning the Three-Way Delta Index 461

11.4.5 Three-Way Delta Index 463

11.5 Formal Tests of Predictability 463

11.5.1 Testing Pearson’s Statistic 464

11.5.2 Testing the Marcotorchino’s Index 464

11.5.3 Testing the Delta Index 465

11.5.4 Discussion 465

11.6 Tucker3 Decomposition for Three-Way Tables 466

11.7 Correspondence Analysis of Three-Way Contingency Tables 467

11.7.1 Symmetrically Associated Variables 467

11.7.2 Asymmetrically Associated Variables 468

11.7.3 Additional Property 469

11.8 Modelling of Partial and Marginal Dependence 470

11.9 Graphical Representation 471

11.9.1 Interactive Plot 471

11.9.2 Interactive Biplot 472

11.9.3 Category Contribution 474

11.10 On the Application of Partitions 474

11.10.1 Olive Data: Partitioning the Asymmetric Association 474

11.10.2 Job Satisfaction Data: Partitioning the Asymmetric Association 476

11.11 On the Application of Three-Way Correspondence Analysis 477

11.11.1 Job Satisfaction and Three-Way Symmetrical Correspondence Analysis 477

11.11.2 Job Satisfaction and Three-Way Non-Symmetrical Correspondence Analysis 483

11.12 R Code 490

References 511

Part Four The Computation of Correspondence Analysis 517

12 Computing and Correspondence Analysis 519

12.1 Introduction 519

12.2 A Look Through Time 519

12.2.1 Pre-1990 519

12.2.2 From 1990 to 2000 520

12.2.3 The Early 2000s 522

12.3 The Impact of R 523

12.3.1 Overview of Correspondence Analysis in R 523

12.3.2 MASS 524

12.3.3 Nenadić and Greenacre’s (2007) ca 525

12.3.4 Murtagh (2005) 527

12.3.5 ade4 530

12.4 Some Stand-Alone Programs 533

12.4.1 JMP 533

12.4.2 SPSS 533

12.4.3 PAST 534

12.4.4 DtmVic5.6+ 535

References 540

Index 545