# Combinatorics of Genome Rearrangements

Publisher:
MIT Press
Number of Pages:
288
Price:
40.00
ISBN:
9780262062824

One of the first questions one may have about Combinatorics of Genome Rearrangements by Guillaume Fertin et al is how to classify the book. To what extent is it a mathematics book or a biology book? What background is necessary to read it? Combinatorics is concerned with problems motivated by biology and uses vocabulary derived from biology, but otherwise is not about biology per se. The content of Combinatorics is primarily mathematics and computer science — combinatorial analysis and the efficiency of algorithms. The first chapter contains a quick introduction to the biology necessary to read the book and the mathematical prerequisites are minimal.

Although the book is largely self-contained, it can be difficult to follow. The presentation is condensed; the book reads more like an outline or a reference than a textbook. Combinatorics is long on definitions but short on examples.

One of the primary goals of applying combinatorial reasoning to genetics is to quantify the degree of similarity between two genetic sequences in a biologically meaningful way. One way to compare two sequences is to place the sequences side by side and simply count the number of locations in which the sequences differ. For example, consider the following two sequences.

ACATTAGTCAGTCGCGCTCGCTGC
ACACGCTCGCGCTGACTGATTTGC

The sequences differ in 15 positions, as indicated by underlined characters. Said another way, fifteen changes would be necessary to transform one sequence into the other. But when viewed from a different perspective, the sequences only differ in one way: the middle segments is reversed.

ACGTAGTCAGTCGCGCTCGTGC
ACTGCTCGCGCTGACTGATTGC

Because genetic sequences are sometimes reversed during reproduction, the difference between the two sequences could plausibly be explained by a single reversal. We'd like to define a distance between sequences that measures the number of biologically plausible "moves" necessary to transform one into another. Basing distances on a set of biologically plausible moves gives genetic combinatorics a character distinct from combinatorics in general.

Biologically-inspired metrics may be difficult to compute; depending on the set of "moves" being modeled, it may be difficult to determine how many moves are required. This is why the book is concerned with the efficiency of algorithms. The book discusses numerous algorithms whose analysis is a matter of current research. Other algorithms in the book have been studied thoroughly but are known to be "hard" in a well-defined sense (i.e. NP-complete).

Combinatorics is inspired by applications to genetics but is not limited to such applications. For example, strings are not limited to the four characters representing the four possible base pairs of DNA. Theorems are stated and proved more generally. One could imagine results from combinatorial genetics being applicable to analogous situations outside of biology. Mathematical genetics has drawn extensively from mathematics and computer science; it is interesting to see the genetics returning the favor by contributing back to these other disciplines.

John D. Cook is a research statistician at M. D. Anderson Cancer Center and blogs at The Endeavour.

Monday, June 22, 2009
Reviewable:
Yes
Include In BLL Rating:
No
G. Fertin, A. Labarre, I. Rusu, É. Tannier, and S. Vialette
Publication Date:
2009
Format:
Hardcover
Category:
Monograph
John D. Cook
07/14/2009

 1 Introduction 1.1 A Minimalist Introduction to Molecular Evolution 1 1.2 Birth of the Combinatorics of Genome Rearrangements 4 1.3 Statement of the Problem 6 1.4 Scope of This Survey 7 1.5 Overview of the Models 7 1.6 Organization of the Book 8 I DUPLICATION-FREE MODELS: PERMUTATIONS 11 2 Genomes as Permutations 13 2.1 The Symmetric Group 13 2.2 The Cycles of a Permutation 14 2.3 Signed Permutations 15 2.4 Distances on Permutation Groups 15 2.5 Circular Permutations 18 2.6 First Measures of Similarity between Permutations 20 3 Distances between Unsigned Permutations 25 3.1 Transposition Distance 25 3.2 Prefix Transposition Distance 36 3.3 Reversal Distance 40 3.4 Prefix Reversal Distance (Pancake-Flipping) 47 3.5 Variants 49 3.6 Relations between Distances on Unsigned Permutations 61 4 Distances between Signed Permutations 63 4.1 Conserved Interval Distance 63 4.2 Signed Reversal Distance 64 4.3 Variants of Sorting by Reversals 69 4.4 Combined Operations 72 4.5 Double Cut-and-Joins 73 5 Rearrangements of Partial Orders 75 5.1 Genomes as Partially Ordered Sets 75 5.2 Partially Ordered Sets 75 5.3 Constructing a Poset 78 5.4 Reversal Distance 79 5.5 Breakpoint Distance 80 6 Graph-Theoretic and Linear Algebra Formulations 83 6.1 Simple Permutations and the Interleaving Graph 83 6.2 The Overlap Graph 84 6.3 The Local Complementation of a Graph 85 6.4 The Matrix Tightness Problem 85 6.5 Extension to Sorting by Transpositions 86 6.6 The Intermediate Case of Directed Local Complementation 87 II MODELS HANDLING DUPLICATIONS: STRINGS 89 7 Generalities 91 7.1 Biological Motivations 91 7.2 Strings and Rearrangements on Strings 92 7.3 Balanced Strings 94 7.4 How to Deal with Multiple Copies? 95 8 Distances between Arbitrary Strings 97 8.1 The Match-and-Prune Model 98 8.2 The Block Edit Model 123 9 Distances between Balanced Strings 133 9.1 Minimum Common String Partition Problems 133 9.2 Reversal Distance 138 9.3 Unsigned Transpositions 147 9.4 Unsigned Block Interchanges 153 9.5 Relations between Distances 157 III MULTICHROMOSOMAL MODELS 159 10 Paths and Cycles 161 10.1 Genomes 161 10.2 Breakpoints 162 10.3 Intervals 163 10.4 Translocation Distance 164 10.5 Double Cut-and-Joins (2-Break Rearrangement) 170 10.6 k-Break Rearrangement 171 10.7 Fusions, Fissions, Translocations, and Reversals 172 10.8 Rearrangements with Partially Ordered Chromosomes 174 11 Cycles of a Permutation 175 11.1 A Model for Multichromosomal Circular Genomes 175 11.2 A Generalization to Signed Genomes 178 12 Set Systems and the Syntenic Distance 181 12.1 Introduction 181 12.2 Structural Properties 182 12.3 Lower Bounds 184 12.4 Diameter 185 12.5 Algorithmic Results 185 12.6 Conjectures and Open Problems 189 IV MULTIGENOMIC MODELS 191 13 Median and Halving Problems 193 13.1 Breakpoint Median 194 13.2 Reversal and DCJ Median 197 13.3 Duplicated Genomes 199 13.4 Other Variants, Generalizations, and Discussion 205 14 Rearrangement Phylogenies 207 14.1 The Large Parsimony Problem 207 14.2 The Large Parsimony Problem with Gene Orders 209 14.3 Heuristics for the Breakpoint/Reversal Phylogeny Problem 211 14.4 Variants 220 V MISCELLANEOUS 221 15 Software 223 15.1 Pairwise Rearrangements 223 15.2 Phylogeny Reconstruction and Medians 226 16 Open Problems 229 16.1 Complexity Issues 229 16.2 Diameter 231 16.3 Tightness of Bounds 232 APPENDICES 233 A Graph Theory 235 A.1 Undirected Graphs 235 A.2 Directed Graphs 240 B Complexity Theory 243 B.1 The Class NP 243 B.2 Some NP-Complete Problems 252 Glossary 257 Bibliography 263 Index
Publish Book:
Modify Date:
Tuesday, July 14, 2009