Supporting materials


GDAdata package

The GDAdata package can be downloaded from CRAN. It supplies seven datasets that are not otherwise available in R.
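To see which datasets the package provides, something like the following should work once GDAdata is installed. This is only a minimal sketch; the variable names gda_available and gda_items are illustrative and do not come from the book.

```r
# Check whether GDAdata is installed before querying it.
gda_available <- requireNamespace("GDAdata", quietly = TRUE)

if (gda_available) {
  # data(package = ...) returns a list whose "results" matrix has one row
  # per dataset, with columns that include "Item" and "Title".
  gda_items <- data(package = "GDAdata")$results[, c("Item", "Title")]
  print(gda_items)
} else {
  message('GDAdata not found; try install.packages("GDAdata")')
}
```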


Updated/corrected code

Figure 3.10: Use data(stamp, package="bootstrap") and Thickness instead of thickness for the variable name. The code is now:

data(stamp, package="bootstrap")
par(las=1, mar=c(3.1, 4.1, 1.1, 2.1))
with(stamp, {
     hist(Thickness,breaks=seq(0.055,0.135,0.001), freq=FALSE, main="", col="bisque2", ylab="")
     lines(density(Thickness), lwd=2)
})

Figure 3.11 needs to load the movies dataset:

data(movies, package="ggplot2movies")

Figure 5.7 needs to load the movies dataset:

data(movies, package="ggplot2movies")

Figures 6.1 – 6.4 and 6.13 use the food dataset, as does the code example on page 119. Now use data(foodnames, package="GDAdata") and make other necessary amendments.

Figure 6.1

data(foodnames, package="GDAdata")

Figure 6.2

Figure 6.13 (Two lines must be changed)
data(foodnames, package="GDAdata")

The code for Figure 6.11 on p114 has been changed. There is now no need to sort in decreasing order and the colour definition has been switched:

uniranks2 <- within(uniranks1,
          Rus <- ifelse(UniGroup=="Russell", "Russell", "not"))
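# Opening of the call reconstructed; the data argument is assumed to be the unsorted uniranks2
ggparcoord(uniranks2,
           columns=c(5:8, 10:14),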
           groupColumn="Rus", scale="uniminmax") +
           xlab("") + ylab("") +
           theme(legend.position = "none",
           axis.ticks.y = element_blank(),
           axis.text.y = element_blank()) +
           scale_colour_manual(values = c("grey", "red"))

The second line of the code for Figure 6.17 should now be

B2 <- acast(B1$data[ , c(2,4,5)], .ID ~ variable)

The mi package has been changed and the missing.pattern.plot function no longer exists. The new alternative in mi is shown in a new Figure 9.1 using the following code:

data(CHAIN, package="mi")
par(mar=c(1.1, 4.1, 1.1, 2.1))
detach("package:mi", unload=TRUE)
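The listing above loads the data and sets the margins but does not show a plotting call. As a hedged sketch only, and not necessarily the code behind the new Figure 9.1, one way to display a missing-value pattern with the current mi package is to build a missing_data.frame and call image() on it. The names has_mi and mdf are illustrative.

```r
# Only run the example if mi is installed.
has_mi <- requireNamespace("mi", quietly = TRUE)

if (has_mi) {
  library(mi)  # attaching makes mi's image() method available
  data(CHAIN, package = "mi")
  par(mar = c(1.1, 4.1, 1.1, 2.1))
  # Wrap the data in mi's container class for data with missing values.
  mdf <- missing_data.frame(CHAIN)
  # Draw the data matrix as an image with missing cells marked.
  image(mdf)
}
```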

geom_density2d in ggplot2 is now called geom_density_2d and the parameter bins is no longer available. The code for Figure 9.9 has been changed accordingly:

data(olives, package="extracat")
ggplot(data=olives, aes(x=oleic, y=palmitic)) + geom_point() +
       geom_density_2d(col="red") + geom_smooth()


Updates and errata

(MMST) The package MMST to accompany Izenman's book is no longer available and so the datasets used in the book have to be found elsewhere. This affects the following places in the book:

40 - 41
Hidalgo1872: The dataset stamp in the bootstrap package provides the same data.

100 - 104, 118f, 128
The foodnames dataset in the GDAdata package provides the same data and two additional descriptive variables. References to the food dataset in the text and in figure captions need to be changed to foodnames, as does the index entry on p295.

Exercise 5 Bodyfat: the dataset bodyfat in the mfp package provides the same data. (NB the version in the SIN package provides dataset statistics and a link to where the dataset can be found on the web, but not the dataset itself.)

Exercise 7 Wine: MMST is no longer available, but the other sources still are.

Exercise 5 Pima Indians: MMST is no longer available, but the other sources still are.

The package ggplot2 was updated to version 2.0 in December 2015. geom_bar() can no longer be used as an alternative to geom_histogram() for continuous data. This affects the code on pages 28, 33, 37, 42, 202, 203, 248, 267.
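As an illustration of the change, and not the code from any of the pages listed, a continuous variable is now binned with geom_histogram(), while geom_bar() remains the choice for counting categories. The sketch below uses the built-in faithful and mtcars datasets purely as stand-ins; the object names p_hist and p_bar are illustrative.

```r
library(ggplot2)

# Continuous variable: bin it with geom_histogram().
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 2)

# Categorical variable: count the categories with geom_bar().
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

print(p_hist)
print(p_bar)
```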

The dataset movies is no longer part of ggplot2, but has its own package ggplot2movies, so it has to be specifically loaded for the examples on pages 42 and 82 and for Exercise 4 of Chapter 3 on p50.

Some other packages make use of ggplot2 and have been updated accordingly, for instance GGally used in Chapters 6, 9, and 12, and coefplot used in Chapter 10.
The ggparcoord function in GGally can no longer use the size parameter the way it was set in the book and Figures 6.6, 6.7, 12.1 have been changed. The option mapping=aes(size=1) within the ggparcoord function has been dropped and the layer geom_line(size=1) added.
It is probably best to always use the latest versions of packages on CRAN.

Further corrections:
80-81 (Thanks to Luke Tierney for spotting this.)
The geyser dataset reports the previous waiting time not the next waiting time (which the faithful dataset used in Chapter 1 reports). Correct versions of Figures 5.4 and 5.5 consistent with the Wikipedia page are obtained with

data(geyser, package="MASS")
ggplot(geyser, aes(lag(duration), waiting)) + geom_point()


ggplot(geyser, aes(lag(duration), waiting)) + geom_point() + geom_density_2d()
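For this correction to work, lag() has to shift the values by one position, which is what dplyr::lag does. Base R's stats::lag only changes the time-series attributes of a plain vector and leaves the values where they are. Assuming dplyr is not attached, a base R sketch of the same pairing might look like this (the names geyser_prev and prev_duration are illustrative):

```r
data(geyser, package = "MASS")
n <- nrow(geyser)

# stats::lag() adjusts only the time-series attributes of a plain vector;
# the values themselves stay in the same positions.
unshifted <- as.numeric(stats::lag(geyser$duration))
stopifnot(all(unshifted == geyser$duration))

# Explicit shift by one: pair each waiting time with the duration of the
# preceding eruption (the first row has no predecessor, hence NA).
geyser_prev <- data.frame(
  prev_duration = c(NA, geyser$duration[-n]),
  waiting = geyser$waiting
)

plot(waiting ~ prev_duration, data = geyser_prev)
```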

Figure 7.12 The complete caption should be:
"rmb plots of the housing dataset. In the upper plot each barchart has the same vertical scale, but its own horizontal scale, and its total area reflects the size of the group in that cell. In the lower plot each barchart has the same vertical and horizontal scales and the intensity of colouring reflects the cell group size."

"the gridArrange function" should be "the grid.arrange function"

Errors pointed out by Nick Cox (for which many thanks)
Gaskins not Gaskin
Exercise 1 The dataset galaxies contains data on galaxies (not planets).
setosa is a species, not a variety.
96 and 195
Pearson's not Pearsons'
The trimming described here is really Winsorizing.
158 and 218
distinct not unique
Rothamsted not Rothamstead
It might be better to say Wilcoxon tests compare two samples rather than two means.
criteria ... are
Poisson not poisson
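On the trimming versus Winsorizing point above: trimming discards values beyond chosen limits, whereas Winsorizing keeps every observation and pulls the extreme ones in to the limits. The following is a generic base R sketch of the distinction, not the book's code; the toy data, the 10% cut-off, and all variable names are assumptions made for illustration.

```r
# Toy data with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Limits at the 10% and 90% quantiles (an arbitrary choice).
p <- 0.1
lims <- quantile(x, c(p, 1 - p))

# Trimming: drop observations outside the limits.
trimmed <- x[x >= lims[1] & x <= lims[2]]

# Winsorizing: keep all observations, replacing those outside the
# limits with the nearest limit.
winsorized <- pmin(pmax(x, lims[1]), lims[2])

length(trimmed)     # fewer values than x
length(winsorized)  # same number of values as x
```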