It which shall not be named

Robert Schlegel

But really, why R?

We could extol the virtues of R all day, but let’s rather see for ourselves why R is one of the preferred tools for science.

In this exercise we will be opening a dataset in MS excel and giving ourselves 30 minutes to perform a basic analysis and visualisation.

Gasp! Yes I know. After all of that and now we are using MS Excel? But trust us, there is method to this madness.

Dataset

The dataset we are using comes from the NOAA OISST remotely sensed seawater temperature product. It contains time series for pixels from three different regions, each 39 years long.

Analysis

Monthly climatology: the average temperature for a given month at a given place.

So in this instance, because we have three time series, we will want 39 total values comprised of January - December monthly means for each site.

E.g. If a time series is 39 years long, then a climatological December will be the mean temperature of all of the 39 Decembers for which data are available.

Visualisation

Dot and line plot: A dot for each monthly climatology, connected by a line.

Once the monthly climatologies per site have been calculated, it should be a relatively easy to visualise them as a dot and line plot.

NB: We want to create one dot and line plot for each site

Mission

Your mission, should you choose to accept it:

  • Open the file sst_NOAA.csv in MS Excel
  • Create monthly climatologies for each site
  • Create a dot and line plot for the climatologies

You will have 30 minutes starting now…

30:00

Moving forwaRd

Using excel may allow one to make small changes quickly, but rapidly becomes laborious when any sophistication is required.

Now that you’ve had some time to look at the data and work through the exercise in Excel, let’s see how to do it in R.

The few lines of code below make all of the calculations we need in order to produce the results we are after.

Monthly clims - Prep

The first step in an R workflow (more on this later) should be to load your libraries and then your data.

# Load libraries
library(tidyverse)
library(lubridate)

# Load data
sst_NOAA <- read_csv("../data/sst_NOAA.csv")

Monthly clims - Calculate

After which we perform our anlayses.

sst_monthly <- sst_NOAA %>% 
  mutate(month = month(t, label = T)) %>% 
  group_by(site, month) %>% 
  summarise(temp = round(mean(temp, na.rm = T), 3))

Monthly clims - Visualise

And finish up with visualisations.

ggplot(data = sst_monthly, aes(x = month, y = temp)) +
  geom_point(aes(colour = site)) +
  geom_line(aes(colour = site, group = site)) +
  labs(x = NULL, y = "Temperature (°C)")

Monthly clims - Visualise

Which may be iterated on (more on this later).

ggplot(data = sst_monthly, aes(x = month, y = temp)) +
  geom_point(aes(colour = site)) +
  geom_line(aes(colour = site, group = site)) +
  labs(x = "", y = "Temperature (°C)") +
  facet_wrap(~site, ncol = 1) # Create panels