Data Visualisation in R

Visualisation

Making you data come to life!

Objectives

  • How data can be visualised in R
  • Learn how to adjust graphics
  • Get familiar with ggplot2 and it’s grammar of graphics to create amazing figures

Data Visualisation - An Overview

  • The overall aim of visualising data:
    • Make all your plots as self explanatory as possible!
  • For this lecture, we will focus on ggplot2, a tidyverse package
    • Inspired on the Grammer of Graphics, a book that aims to formalise visualisations into layers
  • Has multiple add-ons, such as ggrepel (for text labels) and ggpubr (for publication-ready plots) or patchwork (for combining multiple plots)

Base R plots

  • Graphs can be easily generated with the base R syntax
data <- c(2,3,6,4,9)
plot(data)

plot(data, type = "l")

Base R plots

  • Graphs can be easily generated with the base R syntax
plot(data, type = "o", col = "blue")

barplot(data)

ggplot2

  • ggplot2 is a lot more useful and user friendly than base R, making plots look a lot nicer and with more options for building and displaying graphics
  • We’ll start with an old favourite, the mtcars dataset!
# Load packages
# ggplot2 is in the tidyverse
library(tidyverse) 
library(conflicted)
library(ggrepel)
library(ggstatsplot)
library(plotly)
conflicts_prefer(dplyr::filter)

mtcars |>
  ggplot(aes(x = mpg, y = hp)) +
  geom_point()

ggplot2 - layers

  • ggplot() is the main function, and this creates the initial ggplot object where we then add multiple layers
  • A layer is a collection of geometric elements (geoms) and statistical transformations
  • An easy example of a “geom” element layer is geom_point(), which adds a scatter plot
  • Aesthetic mappings (aes) are specified with aes(). This is how variables (columns) in the input data are mapped to visual, or “aesthetic”, properties.
  • You can give global “aesthetics” to a plot (will appear in every layer) by specifying this in the ggplot() function, or local aesthetics in individual layers (such as in geom_point()).

Themes

  • A “theme” controls the finer points of the plot, like the font size and background colour
  • This is essentially customising the non-data elements
  • For example, change the default grey background to white background1
mtcars |>
  ggplot(aes(x = mpg, y = hp)) +
  geom_point() +
  theme_bw()

Themes - Global

  • These themes can be set globally, so for all plots, using the theme_set() function
theme_set(theme_bw())

mtcars |>
  ggplot(aes(x = mpg, y = hp)) +
  geom_point()

  • You can see some more themes provided by ggplot2 here

Quick plots with qplot

  • A quicker version of ggplot! Good for very basic figures
qplot(mpg, hp, data = mtcars)

Aesthetics: Colour

  • We can visualise more information by colouring the data points by another variable
  • For example, in mtcars we can map the number of cylinders to the colour aesthetic (or color if you want to spell it wrong…)
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp, 
                     colour = factor(cyl))) +
  geom_point()

Aesthetics: Size and Shape

  • We can also map the number of cylinders to the size or shape aesthetic
# Using size
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp, 
                     size = cyl)) +
  geom_point()

# Using shape
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp, 
                     shape = factor(cyl))) +
  geom_point()

Aesthetics: Shape & Colour

  • We can combine aesthetics as we like too!
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp, 
                     shape = factor(cyl),
                     colour = factor(cyl))) +
  geom_point()

Aesthetics: Conditional Colour

  • We can even map an aesthetic to a datapoint based on a condition, i.e. only change colour when a certain condition is met.
  • For example here, the colour varies depending on whether the car has 4 cylinders or not (cyl == 4 being TRUE or FALSE)
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp,
                     colour = cyl == 4)) +
    geom_point()

Aesthetics: Fill

  • Fill is yet another aesthetic
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut,
                           fill = cut))

Facets

  • A “facet” is one section of something that has many sections.
  • A “facet” in ggplot allows you to break up the data into different subsets and plot individual panels based on it
  • Creates “subplots” or panel-like figures
  • Really useful when you’ve got categorical variables (such as gender)
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp,
                     colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ cyl)

Facets: layout

  • We can change the layout of the “facets” locally:
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp,
                     colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ cyl, ncol = 1)
  # Note the following is equivalent
  # facet_wrap(~ cyl, dir = "v")

Facet grids

  • We can combine 2 variables with facet_grid()
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp,
                     colour = factor(cyl))) +
  geom_point() +
  facet_grid(am ~ cyl)

Additional Geoms

  • So far we’ve only used geom_point(), but there are naturally many more geoms we can use
  • geom_smooth() draws a smoothed line based on the trend of the provided data
  • They can be used individually, or layered on top of one another, which is the core of the grammer of graphics
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_smooth()

Combining geoms

  • Here we layer two geoms
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_smooth() +
  geom_point()

  • Note that the order of geoms can matter! (though in this case it doesn’t :P)

Layering geoms and additional aesthetics

  • The order in which we give aesthetics can also matter
ggplot(data = mtcars, 
    mapping = aes(x = mpg,
                  y = hp,
                  colour = factor(cyl))) +
    geom_smooth() +
    geom_point()

# A more sensible order?
ggplot(data = mtcars, 
    mapping = aes(x = mpg,
                  y = hp)) +
    geom_smooth() +
    geom_point(aes(colour = factor(cyl)))

Statistical Transformations: count

  • Some plots transform your data internally and plot those new values instead of raw values
  • The stat argument of different plot types (geom functions) specifies the statistical transformation
  • For example, geom_bar() uses stat = "count" as it’s default to create counts of the mapped variable (as in what a bar chart does):
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# Worth noting: geom and stats are often interchangeable
ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

Statistical Transformations: identity

  • Another commonly used stat is “identity” when plotting bars with heights based on raw values
## example tibble
demo <- tribble(
  ~cut,            ~value, 
  "Fair",          1610, 
  "Good",          4906, 
  "Very Good",     12082, 
  "Premium",       13791,
  "Ideal",         21551 )

ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, 
                         y = value), 
           stat = "identity")

Break

Positional Adjustments

  • The position argument can control how geoms occupy space
# stacked 
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, 
                           fill = clarity))
  
# side by side 
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, 
                           fill = clarity),
             position = "dodge") 
# Note the default is position = "stack"

Labels: Titles and Captions

  • Another editable part of a plot are the text labels, which we can add or modify using labs()
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(title = "Title",
       subtitle = "Subtitle",
       caption = "Small Caption"
        )

Labels: Axis and legends

ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(x = "Miles per gallon (mpg)", 
  y = "Horsepower (hp)", 
  colour = "Cylinders"
         )

Labels: Axis alternative

ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(colour = factor(cyl))) +
  xlab("Miles per gallon") +
  ylab("Horsepower")

Annotations

  • To add annotation to data points, we can use geom_text() and geom_label()
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  geom_text(aes(label = rownames(mtcars)))

ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  geom_label(aes(label = rownames(mtcars)), 
             alpha = 0)

Annotations: ggrepl

  • The ggrepl package can help make more legible labels
ggplot(data = mtcars, 
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  ggrepel::geom_text_repel(aes(
    label = rownames(mtcars)), 
    max.overlaps = 100)
# Label only a subset
ggplot(data = mtcars,
       mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  ggrepel::geom_text_repel(aes(
    label = rownames(mtcars[1:3, ])),
    data = mtcars[1:3, ])

Zooming

  • To control the plot limits, you have 3 methods:
    • Adjusting the data that’s plotted
    • Setting xlim and ylim in coord_cartesian() (do this!!)
    • Setting the limits in each scale
# Not ideal - need strong justification!
mtcars |>
  filter(mpg >= 20, hp <= 150) |>
  ggplot(mapping = aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  geom_smooth()

Zooming - coord_cartesian

ggplot(data = mtcars, mapping = 
         aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  geom_smooth() +
  coord_cartesian(xlim = c(20, 35), 
                  ylim = c(0, 150))

This is the RIGHT WAY, using coord_cartesian()

Zooming - lims

ggplot(data = mtcars, mapping = 
         aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  geom_smooth() +
  lims(x = c(20, 40),
       y = c(0, 150))

This is also not ideal, as it removes data outside the limits!

Scales

  • “scale” allows you control mapping things like colour, size and shape to data values
  • “scale” draws a legend or axes
  • ggplot2 automatically adds default scales behind the scenes
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))

Scales - defaults

  • Is the same as:
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

Scales - axis breaks

  • The naming scheme tells you the aesthetic (x_, y_, colour_, etc) and the name of the scale (continuous, discrete)
# Change the breaks on the axis
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_y_continuous(
    breaks = seq(0, 350, by = 50)
    )

Scales - axis labels

# Adding text to labels
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_y_continuous(
    breaks = seq(0, 350, by = 50),
    labels = paste0(
      "HP ", seq(0, 350, by = 50)
      )
    )

# No labels
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_y_continuous(
    breaks = seq(0, 350, by = 50), 
    labels = NULL)

Legend Layout - position

  • The legend can of course also be modified in lots of ways
p <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))

# Legend at the top
p + theme(legend.position = "top")
# No legend
p + theme(legend.position = "none")
# Note - the default is "right"

Legend Layout - guides

  • We can use the guides() function to control the legend display
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(colour = factor(cyl)), 
             alpha = 0.5) +
  guides(colour = guide_legend(
    ncol = 2, 
    override.aes = list(size = 3, 
                        alpha = 1))
    )

Controlling the Colour Scale - alt palettes

  • The default ggplot2 colours we get are a bit rubbish
  • Many pre-defined colour palettes are available to change that
  • Such as from the RColorBrewer package
  • Palette explainer here and interactive browser here
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_colour_brewer(palette = "Set1")

Controlling the Colour Scale - manually

ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_colour_manual(values =
                        c("black", 
                          "pink", 
                          "turquoise"))

# Explicitly setting the values to colours
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl))) +
  scale_colour_manual(values =
                        c(
                          `4` = "black",
                          `6` = "pink",
                          `8` = "red"
                        ))

Saving plot - the ggplot2 way

  • We of course want to be able to save the beautiful plots we make! We can do this using ggsave()
  • If not specified, it will save the most recent plot we create to our disk
  • The format of the plot is defined in the filename extension (.pdf or .png for example)
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))
# Save the last printed plot
ggsave(filename = "my_plot_1.pdf")
# Save the plot to a variable first
plot <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))
ggsave(filename = "my_plot_1.png", plot)

Saving plot - the base R way

  • With base R we need to set the device we want to save with first, then print the plot, and then close the device
# Save to png
png(filename = "my_plot.png", width = 500, height = 400)
# Print the plot
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))
# Close the device
dev.off()

Compound plots

  • What if you want to put two or more plots together to save?
plot1 <- ggplot(mtcars, 
                aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))
plot2 <- ggplot(mtcars, 
                aes(x = qsec, y = hp)) +
  geom_point(aes(color = factor(cyl)))
plot1 + plot2

Boxplots and Violin Plots

  • Boxplots and Violin Plots are very common within biolosciences(protein levels, patient data, SNP frequency etc.)
  • Be warned that boxplots can sometimes be misleading and so it’s always good to check the raw data too! See here for more info
  • Have a go at creating your own boxplots and violin plots using the mtcars/diamonds/other datasets!
mtcars$cyl <- as.factor(mtcars$cyl)
# Make base plot
p <- ggplot(mtcars, aes(cyl, mpg))

p + geom_boxplot(aes(colour = cyl))

Boxplots and Violin Plots - examples

  • geom_jitter is like geom_point, but adds noise so point aren’t on top of each other, handy for mapping the raw data onto other geoms!
p + geom_boxplot(aes(fill = cyl), 
                 alpha = 0.3) +
  geom_jitter(size = 0.8)
# Violin
p + geom_violin(aes(fill = cyl))

Boxplots and Violin Plots - combining geoms

  • We can of course layer them on top too!
# All three!
p + geom_violin(aes(fill = cyl), 
                width = 1.4) +
  geom_boxplot(width = 0.1, 
               colour = "grey", 
               alpha = 0.3) +
  geom_jitter(size = 0.8, 
              width = 0.1, 
              colour = "grey") +
  scale_fill_viridis_d() +
  theme_minimal()

Boxplots and Violin Plots - numeric to categorical

# discretise numeric data into categorical
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_boxplot(aes(
    group = cut_width(carat, 0.2)
    ))
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_boxplot(aes(
    group = cut_number(carat, 20)
    ))

3D Plots

  • plotly can be used to make 3D plots, but this is very situational and should probably only be done in cases where the plot is intended to be interactive
mtcars$gear <- as.factor(mtcars$gear)

plot_ly(
  mtcars,
  x = ~ wt,
  y = ~ hp,
  z = ~ qsec,
  color = ~ gear
) |>
  add_markers() |>
  plotly::layout(scene = list(
    xaxis = list(title = 'Weight'),
    yaxis = list(title = 'Gross horsepower'),
    zaxis = list(title = '1/4 mile time')
  ))

3D Plots

ggstatsplot

  • A handy way to quickly look at correlations!
ggstatsplot::ggscatterstats(mtcars, x = hp, y = qsec)

That’s all folks!

  • These slides can be found on the website here: