ggplot2
Robert Schlegel
Now that we have seen the basics of ggplot2
, let’s take a moment to delve further into the beauty of our figures. It may sound vain at first, but the colour palette of a figure is actually very important. This is for two main reasons. The first being that a consistent colour palette looks more professional. But most importantly it is necessary to have a good colour palette because it makes the information in our figures easier to understand. The communication of information to others is central to good science.
RColorBrewer
gives us more controlggplot2
ggpubr
We will need the following four packages for the examples in these slides.
RColorBrewer
Central to the purpose of ggplot2
is the creation of beautiful figures. For this reason there are many built in functions that we may use in order to have precise control over the colours we use, as well as additional packages that extend our options even further. The RColorBrewer
package should have been installed on your computer and activated automatically when we installed and activated the tidyverse
. We will use this package for its lovely colour palettes, which may be found in ‘colorPaletteCheatsheet.pdf’.
This is all well and good. But didn’t we claim that this should give us complete control over our colours? So far it looks like it has just given us a few more palettes to use. And that’s nice, but it’s not ‘infinite choices’. That is where the Internet comes to our rescue. There are many places we may go to for support in this regard. The following links, in descending order, are very useful. And fun!
I find the first link the easiest to use. But the second is better at generating discrete colour palettes. During the following exercise spend several minutes playing with the different websites and decide for yourself which one(s) you like.
ggplot(data = penguins,
aes(x = body_mass_g, y = bill_length_mm)) +
geom_point(aes(colour = as.factor(sex))) +
# How to use custom palette
scale_colour_manual(values = c("#A5A94D", "#9699C4"),
# How to change the legend text
labels = c("female", "male", "other")) +
# How to change the legend title
labs(colour = "Sex")
ggpubr
contains functions to seamless plot statistics on our figurescompare_means()
compares means of two (t-test) or more (ANOVA) groups of valuesstat_compare_means()
is designed to be integrated directly into ggplot2
codecompare_means()
# A tibble: 1 × 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 bill_length_mm female male 1.07e-10 0.00000000011 1.1e-10 **** T-test
# A tibble: 1 × 6
.y. p p.adj p.format p.signif method
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 bill_length_mm 2.69e-91 2.7e-91 <2e-16 **** Anova
stat_*()
vs geom_*()
In the previous slides we have seen how to add dots, lines, etc. to graphs using functions that look like geom_*()
. But we can also add statistical calculations and other non-graphical elements to plots using functions that look like stat_*()
.
stat_compare_means()
stat_compare_means()
As mentioned above, these functions may be used with paired tests, non-parametric tests, and multiple mean tests. These outputs have mates in the ggplot2
sphere and so may be visualised with relative ease. The following is an example of how to do this with a multiple mean (ANOVA) test.
# First create a list of comparisons to feed into our figure
penguins_levels <- levels(penguins$species)
my_comparisons <- list(c(penguins_levels[1], penguins_levels[2]),
c(penguins_levels[2], penguins_levels[3]),
c(penguins_levels[1], penguins_levels[3]))
# Then we stack it all together
ggplot(data = penguins, aes(x = species, y = bill_length_mm)) +
geom_boxplot(aes(fill = species), colour = "grey40", show.legend = F) +
stat_compare_means(method = "anova", colour = "grey50",
label.x = 1.8, label.y = 32) +
# Add pairwise comparisons p-value
stat_compare_means(comparisons = my_comparisons,
label.y = c(62, 64, 66)) +
# Perform t-tests between each group and the overall mean
stat_compare_means(label = "p.signif",
method = "t.test",
ref.group = ".all.") +
# Add horizontal line at base mean
geom_hline(yintercept = mean(penguins$bill_length_mm, na.rm = T),
linetype = 2) +
labs(y = "Bill length (mm)", x = NULL) +
theme_bw()