
Bayesian Logical Data Analysis for the Physical Sciences

Phil Gregory
Publisher: Cambridge University Press

Publication Date: 2005

Number of Pages: 468

Format: Hardcover

Price: 70.00

ISBN: 0-521-84150-X

Category: Textbook
We do not plan to review this book.

Preface page xiii

Software support xv

Acknowledgements xvii

1 Role of probability theory in science 1

1.1 Scientific inference 1

1.2 Inference requires a probability theory 2

1.2.1 The two rules for manipulating probabilities 4

1.3 Usual form of Bayes’ theorem 5

1.3.1 Discrete hypothesis space 5

1.3.2 Continuous hypothesis space 6

1.3.3 Bayes’ theorem – model of the learning process 7

1.3.4 Example of the use of Bayes’ theorem 8

1.4 Probability and frequency 10

1.4.1 Example: incorporating frequency information 11

1.5 Marginalization 12

1.6 The two basic problems in statistical inference 15

1.7 Advantages of the Bayesian approach 16

1.8 Problems 17

2 Probability theory as extended logic 21

2.1 Overview 21

2.2 Fundamentals of logic 21

2.2.1 Logical propositions 21

2.2.2 Compound propositions 22

2.2.3 Truth tables and Boolean algebra 22

2.2.4 Deductive inference 24

2.2.5 Inductive or plausible inference 25

2.3 Brief history 25

2.4 An adequate set of operations 26

2.4.1 Examination of a logic function 27

2.5.1 The desiderata of Bayesian probability theory 30

2.5.2 Development of the product rule 30

2.5.3 Development of sum rule 34

2.5.4 Qualitative properties of product and sum rules 36

2.6 Uniqueness of the product and sum rules 37

2.7 Summary 39

2.8 Problems 39

3 The how-to of Bayesian inference 41

3.1 Overview 41

3.2 Basics 41

3.3 Parameter estimation 43

3.4 Nuisance parameters 45

3.5 Model comparison and Occam’s razor 45

3.6 Sample spectral line problem 50

3.6.1 Background information 50

3.7 Odds ratio 52

3.7.1 Choice of prior p(T|M1, I) 53

3.7.2 Calculation of p(D|M1, T, I) 55

3.7.3 Calculation of p(D|M2, I) 58

3.7.4 Odds, uniform prior 58

3.7.5 Odds, Jeffreys prior 58

3.8 Parameter estimation problem 59

3.8.1 Sensitivity of odds to T_max 59

3.9 Lessons 61

3.10 Ignorance priors 63

3.11 Systematic errors 65

3.11.1 Systematic error example 66

3.12 Problems 69

4 Assigning probabilities 72

4.1 Introduction 72

4.2 Binomial distribution 72

4.2.1 Bernoulli’s law of large numbers 75

4.2.2 The gambler’s coin problem 75

4.2.3 Bayesian analysis of an opinion poll 77

4.3 Multinomial distribution 79

4.4 Can you really answer that question? 80

4.5 Logical versus causal connections 82

4.6 Exchangeable distributions 83

4.7 Poisson distribution 85

4.7.1 Bayesian and frequentist comparison 87

4.8 Constructing likelihood functions 89

4.8.1 Deterministic model 90

4.8.2 Probabilistic model 91

4.9 Summary 93

5 Frequentist statistical inference 96

5.1 Overview 96

5.2 The concept of a random variable 96

5.3 Sampling theory 97

5.4 Probability distributions 98

5.5 Descriptive properties of distributions 100

5.5.1 Relative line shape measures for distributions 101

5.5.2 Standard random variable 102

5.5.3 Other measures of central tendency and dispersion 103

5.5.4 Median baseline subtraction 104

5.6 Moment generating functions 105

5.7 Some discrete probability distributions 107

5.7.1 Binomial distribution 107

5.7.2 The Poisson distribution 109

5.7.3 Negative binomial distribution 112

5.8 Continuous probability distributions 113

5.8.1 Normal distribution 113

5.8.2 Uniform distribution 116

5.8.3 Gamma distribution 116

5.8.4 Beta distribution 117

5.8.5 Negative exponential distribution 118

5.9 Central Limit Theorem 119

5.10 Bayesian demonstration of the Central Limit Theorem 120

5.11 Distribution of the sample mean 124

5.11.1 Signal averaging example 125

5.12 Transformation of a random variable 125

5.13 Random and pseudo-random numbers 127

5.13.1 Pseudo-random number generators 131

5.13.2 Tests for randomness 132

5.14 Summary 136

5.15 Problems 137

6 What is a statistic? 139

6.1 Introduction 139

6.2 The χ² distribution 141

6.3 Sample variance S² 143

6.4 The Student’s t distribution 147

6.5 F distribution (F-test) 150

6.6 Confidence intervals 152

6.6.1 Variance σ² known 152

6.6.2 Confidence intervals for μ, unknown variance 156

6.6.3 Confidence intervals: difference of two means 158

6.6.4 Confidence intervals for σ² 159

6.6.5 Confidence intervals: ratio of two variances 159

6.7 Summary 160

6.8 Problems 161

7 Frequentist hypothesis testing 162

7.1 Overview 162

7.2 Basic idea 162

7.2.1 Hypothesis testing with the χ² statistic 163

7.2.2 Hypothesis test on the difference of two means 167

7.2.3 One-sided and two-sided hypothesis tests 170

7.3 Are two distributions the same? 172

7.3.1 Pearson χ² goodness-of-fit test 173

7.3.2 Comparison of two binned data sets 177

7.4 Problem with frequentist hypothesis testing 177

7.4.1 Bayesian resolution to optional stopping problem 179

7.5 Problems 181

8 Maximum entropy probabilities 184

8.1 Overview 184

8.2 The maximum entropy principle 185

8.3 Shannon’s theorem 186

8.4 Alternative justification of MaxEnt 187

8.5 Generalizing MaxEnt 190

8.5.1 Incorporating a prior 190

8.5.2 Continuous probability distributions 191

8.6 How to apply the MaxEnt principle 191

8.6.1 Lagrange multipliers of variational calculus 191

8.7 MaxEnt distributions 192

8.7.1 General properties 192

8.7.2 Uniform distribution 194

8.7.3 Exponential distribution 195

8.7.4 Normal and truncated Gaussian distributions 197

8.7.5 Multivariate Gaussian distribution 202

8.8 MaxEnt image reconstruction 203

8.8.1 The kangaroo justification 203

8.8.2 MaxEnt for uncertain constraints 206

8.9 Pixon multiresolution image reconstruction 208

8.10 Problems 211

9 Bayesian inference with Gaussian errors 212

9.1 Overview 212

9.2 Bayesian estimate of a mean 212

9.2.1 Mean: known noise 213

9.2.2 Mean: known noise, unequal σ 217

9.2.3 Mean: unknown noise 218

9.2.4 Bayesian estimate of σ 224

9.3 Is the signal variable? 227

9.4 Comparison of two independent samples 228

9.4.1 Do the samples differ? 230

9.4.2 How do the samples differ? 233

9.4.3 Results 233

9.4.4 The difference in means 236

9.4.5 Ratio of the standard deviations 237

9.4.6 Effect of the prior ranges 239

9.5 Summary 240

9.6 Problems 241

10 Linear model fitting (Gaussian errors) 243

10.1 Overview 243

10.2 Parameter estimation 244

10.2.1 Most probable amplitudes 249

10.2.2 More powerful matrix formulation 253

10.3 Regression analysis 256

10.4 The posterior is a Gaussian 257

10.4.1 Joint credible regions 260

10.5 Model parameter errors 264

10.5.1 Marginalization and the covariance matrix 264

10.5.2 Correlation coefficient 268

10.5.3 More on model parameter errors 272

10.6 Correlated data errors 273

10.7 Model comparison with Gaussian posteriors 275

10.8 Frequentist testing and errors 279

10.8.1 Other model comparison methods 281

10.9 Summary 283

10.10 Problems 284

11 Nonlinear model fitting 287

11.1 Introduction 287

11.2 Asymptotic normal approximation 288

11.3 Laplacian approximations 291

11.3.1 Bayes factor 291

11.3.2 Marginal parameter posteriors 293

11.4 Finding the most probable parameters 294

11.4.1 Simulated annealing 296

11.4.2 Genetic algorithm 297

11.5 Iterative linearization 298

11.5.1 Levenberg–Marquardt method 300

11.5.2 Marquardt’s recipe 301

11.6 Mathematica example 302

11.6.1 Model comparison 304

11.6.2 Marginal and projected distributions 306

11.7 Errors in both coordinates 307

11.8 Summary 309

11.9 Problems 309

12 Markov chain Monte Carlo 312

12.1 Overview 312

12.2 Metropolis–Hastings algorithm 313

12.3 Why does Metropolis–Hastings work? 319

12.4 Simulated tempering 321

12.5 Parallel tempering 321

12.6 Example 322

12.7 Model comparison 326

12.8 Towards an automated MCMC 330

12.9 Extrasolar planet example 331

12.9.1 Model probabilities 335

12.9.2 Results 337

12.10 MCMC robust summary statistic 342

12.11 Summary 346

12.12 Problems 349

13 Bayesian revolution in spectral analysis 352

13.1 Overview 352

13.2 New insights on the periodogram 352

13.2.1 How to compute p(f|D, I) 356

13.3 Strong prior signal model 358

13.4 No specific prior signal model 360

13.4.1 X-ray astronomy example 362

13.4.2 Radio astronomy example 363

13.5 Generalized Lomb–Scargle periodogram 365

13.5.1 Relationship to Lomb–Scargle periodogram 367

13.5.2 Example 367

13.6 Non-uniform sampling 370

13.7 Problems 373

14 Bayesian inference with Poisson sampling 376

14.1 Overview 376

14.2 Infer a Poisson rate 377

14.2.1 Summary of posterior 378

14.3 Signal + known background 379

14.4 Analysis of ON/OFF measurements 380

14.4.1 Estimating the source rate 381

14.4.2 Source detection question 384

14.5 Time-varying Poisson rate 386

14.6 Problems 388

Appendix A Singular value decomposition 389

Appendix B Discrete Fourier Transforms 392

B.1 Overview 392

B.2 Orthogonal and orthonormal functions 392

B.3 Fourier series and integral transform 394

B.3.1 Fourier series 395

B.3.2 Fourier transform 396

B.4 Convolution and correlation 398

B.4.1 Convolution theorem 399

B.4.2 Correlation theorem 400

B.4.3 Importance of convolution in science 401

B.5 Waveform sampling 403

B.6 Nyquist sampling theorem 404

B.6.1 Astronomy example 406

B.7 Discrete Fourier Transform 407

B.7.1 Graphical development 407

B.7.2 Mathematical development of the DFT 409

B.7.3 Inverse DFT 410

B.8 Applying the DFT 411

B.8.1 DFT as an approximate Fourier transform 411

B.8.2 Inverse discrete Fourier transform 413

B.9 The Fast Fourier Transform 415

B.10 Discrete convolution and correlation 417

B.10.1 Deconvolving a noisy signal 418

B.10.2 Deconvolution with an optimal Wiener filter 420

B.10.3 Treatment of end effects by zero padding 421

B.11 Accurate amplitudes by zero padding 422

B.12 Power-spectrum estimation 424

B.12.1 Parseval’s theorem and power spectral density 424

B.12.2 Periodogram power-spectrum estimation 425

B.12.3 Correlation spectrum estimation 426

B.13 Discrete power spectral density estimation 428

B.13.1 Discrete form of Parseval’s theorem 428

B.13.2 One-sided discrete power spectral density 429

B.13.3 Variance of periodogram estimate 429

B.13.4 Yule’s stochastic spectrum estimation model 431

B.13.5 Reduction of periodogram variance 431

B.14 Problems 432

Appendix C Difference in two samples 434

C.1 Outline 434

C.2 Probabilities of the four hypotheses 434

C.2.1 Evaluation of p(C, S|D1, D2, I) 434

C.2.2 Evaluation of p(C̄, S|D1, D2, I) 436

C.2.3 Evaluation of p(C, S̄|D1, D2, I) 438

C.2.4 Evaluation of p(C̄, S̄|D1, D2, I) 439

C.3 The difference in the means 439

C.3.1 The two-sample problem 440

C.3.2 The Behrens–Fisher problem 441

C.4 The ratio of the standard deviations 442

C.4.1 Estimating the ratio, given the means are the same 442

C.4.2 Estimating the ratio, given the means are different 443

Appendix D Poisson ON/OFF details 445

D.1 Derivation of p(s|N_on, I) 445

D.1.1 Evaluation of Num 446

D.1.2 Evaluation of Den 447

D.2 Derivation of the Bayes factor B_{s+b, b} 448

Appendix E Multivariate Gaussian from maximum entropy 450

References 455

Index 461