Dynamic Documents with R and knitr

Yihui Xie

Publisher: Chapman & Hall/CRC
Publication Date: 2015
Number of Pages: 266
Format: Paperback
Edition: 2
Series: The R Series
Price: 69.95
ISBN: 9781498716963
Category: Manual

MAA Review

[Reviewed by Peter Rabinovitch, on 12/29/2015]

I first learned of reproducible research around 1997 or so, in the WaveLab documentation. The idea is that all your results should be easily reproduced by somebody (including yourself six months down the road) with no ambiguity, missed steps, or various fudges. At the time it seemed such an obvious thing to do that I wondered why it wasn’t a universal approach. Trying to apply the ideas with the mediocre tools available at the time showed me why it wasn’t, but I always thought the objective was noble.

Fast forward (what feels like a million technological years) to when I first heard about knitr; I think it was about 2013. knitr is a package for use with R, and integrates beautifully with RStudio. It allows for weaving together text, R code, and the results of R calculations into one document. I eagerly installed it and started to use it productively, with very little pain. I have been an enthusiastic, and I thought relatively knowledgeable, user ever since. I was wrong about the latter.

The book under review, written by the creator of the knitr package, is a gold mine of ideas: things I had no idea knitr could do (integrate with other languages such as Python), and tricks to get around some of the awkward things I needed to do (moving all the code to an appendix for tech-fearful readers). It also explains all the guts of the system, and is especially informative about how knitr can cache the results of time-intensive calculations, so that they do not have to be rerun each time you compile the document if what they depend on has not changed.
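For readers who have not met the caching feature, here is a minimal sketch of how it is typically switched on. It is an illustration under stated assumptions, not an excerpt from the book: the function `slow_summary` and its two-second pause are hypothetical stand-ins for an expensive calculation, and the code is assumed to live in chunks of a knitr source document.

```r
library(knitr)

# Assumed to sit in a setup chunk near the top of the document:
# ask knitr to cache every chunk that follows.
opts_chunk$set(cache = TRUE)

# Hypothetical slow step, assumed to sit in a later chunk.
# On the first compile knitr runs it and stores the results on disk;
# later compiles load the stored results instead of rerunning the code,
# until that chunk's code or options are edited.
slow_summary <- function(n) {
  Sys.sleep(2)              # stand-in for an expensive calculation
  summary(rnorm(n))
}
result <- slow_summary(1e5)
```

Caching can also be turned on for a single chunk by putting `cache=TRUE` in that chunk's header, which is often the safer choice when only one or two steps are slow.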

The book is well written, but some details are a little complex, so it is best to read the book in front of the computer so you can try things out. The main problem with this book, as with all technology books, is that it is already a little outdated; you can get more up-to-date documentation on the web. On the other hand, it is hard to find things on the web that you don’t know to look for, so this book is very useful for showing what you can do that you didn’t even know to look for.

In an industrial context the data you are analyzing are frequently badly messed up, so you repeatedly have to obtain new versions of the data and update the results. In this scenario, knitr has proved hugely useful, allowing for updating reports with one push of a button. Some of the techniques in the book will make this process even more sophisticated (e.g., change a parameter and get slides as an output rather than a document). I am looking forward to implementing them in my workflow, with this book as my guide.
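One way the "change a parameter, get slides" idea can look in practice is sketched below using the rmarkdown package. This is an assumption about the workflow, not the reviewer's actual setup: the source file is a throwaway written to a temporary directory, the output file names are hypothetical, and the calls need a working Pandoc installation.

```r
library(rmarkdown)

# Write a tiny throwaway source file so the sketch is self-contained.
src <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: Weekly report",
  "---",
  "",
  "## Summary",
  "",
  "Report generated on `r Sys.Date()`."
), src)

# Same source, two outputs: only the output_format argument changes.
out_doc <- render(src, output_format = "html_document",
                  output_file = "report.html", quiet = TRUE)
out_slides <- render(src, output_format = "ioslides_presentation",
                     output_file = "slides.html", quiet = TRUE)
```

Rerunning the two `render()` calls after the data change regenerates both the document and the slides from the single source file.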

Peter Rabinovitch is a Senior Performance Engineer at Akamai, and has been doing data science since long before “data science” was a thing. knitr has saved his butt many times.

Table of Contents

Introduction

Reproducible Research
Literature
Good and Bad Practices
Barriers

A First Look
Setup
Minimal Examples
Quick Reporting
Extracting R Code

Editors
RStudio
LyX
Emacs/ESS
Other Editors

Document Formats
Input Syntax
Document Formats
Output Renderers
R Scripts

Text Output
Inline Output
Chunk Output
Tables
Automatic Printing
Themes

Graphics
Graphical Devices
Plot Recording
Plot Rearrangement
Plot Size in Output
Extra Output Options
The tikz Device
Figure Environment
Figure Path

Cache
Implementation
Write Cache
When to Update Cache
Side Effects
Chunk Dependencies
Load Cache Manually
Other Options

Cross Reference
Chunk Reference
Code Externalization
Child Documents

Hooks
Chunk Hooks
Examples

Language Engines
Design
Languages and Tools
Persistent Sessions

Tricks and Solutions
Chunk Options
Package Options
Typesetting
Utilities
Debugging
Multilingual Support

Publishing Reports
RStudio
Pandoc
HTML5 Slides
Jekyll
WordPress

R Markdown
Overview
Pandoc’s Markdown Extensions
Output Formats
Interactive Documents with Shiny
Extending R Markdown v2
Changes in R Markdown from v1 to v2

Applications
Homework
Serve Dynamic Documents
Web Site and Blogging
Package Vignettes
Books
Literate Programming for R Packages

Other Tools
Sweave
Other R Packages
Python Packages
More Tools

Appendix: Internals

Bibliography

Index

Tags: Statistical Software