Preface page ix
Guide to the chapters xi
Acknowledgment of support xii
Part I Introduction to the four themes 1
1 Statistics L. Pachter and B. Sturmfels 3
1.1 Statistical models for discrete data 4
1.2 Linear models and toric models 9
1.3 Expectation Maximization 17
1.4 Markov models 24
1.5 Graphical models 33
2 Computation L. Pachter and B. Sturmfels 43
2.1 Tropical arithmetic and dynamic programming 44
2.2 Sequence alignment 49
2.3 Polytopes 59
2.4 Trees and metrics 67
2.5 Software 75
3 Algebra L. Pachter and B. Sturmfels 85
3.1 Varieties and Gr¨obner bases 86
3.2 Implicitization 94
3.3 Maximum likelihood estimation 102
3.4 Tropical geometry 109
3.5 The tree of life and other tropical varieties 117
4 Biology L. Pachter and B. Sturmfels 125
4.1 Genomes 126
4.2 The data 132
4.3 The problems 137
4.4 Statistical models for a biological sequence 141
4.5 Statistical models of mutation 147
v
© Cambridge University Press www.cambridge.org
Cambridge University Press
0521857007 - Algebraic Statistics for Computational Biology
Edited by Lior Pachter and Bernd Sturmfels
Table of Contents
More information
vi
Part II Studies on the four themes 161
5 Parametric Inference R. Mihaescu 165
5.1 Tropical sum-product decompositions 166
5.2 The polytope propagation algorithm 169
5.3 Algorithm complexity 173
5.4 Specialization of parameters 177
6 Polytope Propagation on Graphs M. Joswig 181
6.1 Polytopes from directed acyclic graphs 181
6.2 Specialization to hidden Markov models 185
6.3 An implementation in polymake 186
6.4 Returning to our example 191
7 Parametric Sequence Alignment
C. Dewey and K. Woods 193
7.1 Few alignments are optimal 193
7.2 Polytope propagation for alignments 195
7.3 Retrieving alignments from polytope vertices 199
7.4 Biologically correct alignments 202
8 Bounds for Optimal Sequence Alignment
S. Elizalde and F. Lam 206
8.1 Alignments and optimality 206
8.2 Geometric interpretation 208
8.3 Known bounds 211
8.4 The square root conjecture 212
9 Inference Functions S. Elizalde 215
9.1 What is an inference function? 215
9.2 The few inference functions theorem 217
9.3 Inference functions for sequence alignment 220
10 Geometry of Markov Chains E. Kuo 226
10.1 Viterbi sequences 226
10.2 Two- and three-state Markov chains 229
10.3 Markov chains with many states 231
10.4 Fully observed Markov models 233
11 Equations Defining Hidden Markov Models
N. Bray and J. Morton 237
11.1 The hidden Markov model 237
11.2 Gr¨obner bases 238
11.3 Linear algebra 240
11.4 Combinatorially described invariants 247
Contents
© Cambridge University Press www.cambridge.org
Cambridge University Press
0521857007 - Algebraic Statistics for Computational Biology
Edited by Lior Pachter and Bernd Sturmfels
Table of Contents
More information
vii
12 The EM Algorithm for Hidden Markov Models
I. B. Hallgr´ımsd´ottir, R. A. Milowski and J. Yu 250
12.1 The EM algorithm for hidden Markov models 250
12.2 An implementation of the Baum–Welch algorithm 254
12.3 Plots of the likelihood surface 257
12.4 The EM algorithm and the gradient of the likelihood 261
13 Homology Mapping with Markov Random Fields A. Caspi 264
13.1 Genome mapping 264
13.2 Markov random fields 267
13.3 MRFs in homology assignment 270
13.4 Tractable MAP inference in a subclass of MRFs 273
13.5 The Cystic Fibrosis Transmembrane Regulator 276
14 Mutagenetic Tree Models N. Beerenwinkel and M. Drton 278
14.1 Accumulative evolutionary processes 278
14.2 Mutagenetic trees 279
14.3 Algebraic invariants 282
14.4 Mixture models 287
15 Catalog of Small Trees
M. Casanellas, L. D. Garcia, and S. Sullivant 291
15.1 Notation and conventions 291
15.2 Fourier coordinates 295
15.3 Description of website features 297
15.4 Example 298
15.5 Using the invariants 303
16 The Strand Symmetric Model
M. Casanellas and S. Sullivant 305
16.1 Matrix-valued Fourier transform 306
16.2 Invariants for the 3-taxa tree 310
16.3 G-tensors 314
16.4 Extending invariants 318
16.5 Reduction to K1,3 319
17 Extending Tree Models to Splits Networks D. Bryant 322
17.1 Trees, splits and splits networks 322
17.2 Distance-based models for trees and splits graphs 325
17.3 A graphical model on a splits network 326
17.4 Group-based mutation models 327
17.5 Group-based models for trees and splits 330
17.6 A Fourier calculus for splits networks 332
Contents
© Cambridge University Press www.cambridge.org
Cambridge University Press
0521857007 - Algebraic Statistics for Computational Biology
Edited by Lior Pachter and Bernd Sturmfels
Table of Contents
More information
viii
18 Small Trees and Generalized Neighbor-Joining
M. Contois and D. Levy 335
18.1 From alignments to dissimilarity 335
18.2 From dissimilarity to trees 337
18.3 The need for exact solutions 342
18.4 Jukes–Cantor triples 344
19 Tree Construction using Singular Value Decomposition
N. Eriksson 347
19.1 The general Markov model 347
19.2 Flattenings and rank conditions 348
19.3 Singular Value Decomposition 351
19.4 Tree-construction algorithm 352
19.5 Performance analysis 355
20 Applications of Interval Methods to Phylogenetics
R. Sainudiin and R. Yoshida 359
20.1 Brief introduction to interval analysis 360
20.2 Enclosing the likelihood of a compact set of trees 366
20.3 Global optimization 366
20.4 Applications to phylogenetics 371
21 Analysis of Point Mutations in Vertebrate Genomes
J. Al-Aidroos and S. Snir 375
21.1 Estimating mutation rates 375
21.2 The ENCODE data 378
21.3 Synonymous substitutions 379
21.4 The rodent problem 381
22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
M. Drton, N. Eriksson and G. Leung 387
22.1 The data 387
22.2 Ultra-conserved elements 390
22.3 Biology of ultra-conserved elements 392
22.4 Statistical significance of ultra-conservation 400
References 403
Index 418