In a recent project, I wanted to plot data from different variables along the same time axis. The difficulty was that I wanted some of these variables as point plots and others as box plots.

Because I work with the tidyverse, I wanted to produce these plots with ggplot2. Faceting was the obvious first step, but it took me quite a while to figure out how best to combine facets containing point plots (where I have one value per time point) with facets containing box plots (where I have multiple values per time point).

The reason why this isn’t trivial is that box plots require groups or factors on the x-axis, while points can be plotted over a continuous range of x-values. If your alarm bells are ringing right now, you are absolutely right: before you try to combine plots with different x-axis properties, you should think long and hard about whether this is an accurate representation of the data and whether it’s a good idea to do so! Here, I had multiple values per time point for one variable and I wanted to make the median and variation explicitly clear, while also showing the continuous changes of other variables over the same range of time.

So, I am writing this short tutorial in the hope that it saves the next person trying to do something similar from spending an entire morning on Stack Overflow. ;-)

For this demonstration, I am creating some fake data:

library(tidyverse)
dates <- seq(as.POSIXct("2017-10-01 07:00"), as.POSIXct("2017-10-01 10:30"), by = 180) # 180 seconds == 3 minutes
fake_data <- data.frame(time = dates,
                        var1_1 = runif(length(dates)),
                        var1_2 = runif(length(dates)),
                        var1_3 = runif(length(dates)),
                        var2 = runif(length(dates))) %>%
  sample_frac(size = 0.33)
head(fake_data)
##                   time    var1_1    var1_2     var1_3      var2
## 51 2017-10-01 09:30:00 0.4534363 0.9947001 0.07223936 0.8891859
## 35 2017-10-01 08:42:00 0.4260230 0.5613454 0.77475368 0.5780837
## 3  2017-10-01 07:06:00 0.0871770 0.2824280 0.97726978 0.4705974
## 59 2017-10-01 09:54:00 0.6824320 0.9735636 0.67654248 0.4235517
## 5  2017-10-01 07:12:00 0.7979666 0.5857256 0.03911439 0.6918448
## 52 2017-10-01 09:33:00 0.7537796 0.3054030 0.61354248 0.5045606

Here, variable 1 (var1) has three measurements per time point, while variable 2 (var2) has one.
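
Because runif() and sample_frac() draw random numbers, the values you get will differ from the output shown above. Here is a minimal sketch of a reproducible variant. The seed (42), the explicit UTC time zone, and the name fake_data_seeded are my own choices for illustration and are not what produced the original output:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, chosen only for illustration

# Same 3-minute grid as above, but with an explicit time zone (an assumption)
dates_utc <- seq(as.POSIXct("2017-10-01 07:00", tz = "UTC"),
                 as.POSIXct("2017-10-01 10:30", tz = "UTC"),
                 by = 180)  # 180 seconds == 3 minutes

fake_data_seeded <- data.frame(time = dates_utc,
                               var1_1 = runif(length(dates_utc)),
                               var1_2 = runif(length(dates_utc)),
                               var1_3 = runif(length(dates_utc)),
                               var2 = runif(length(dates_utc))) %>%
  sample_frac(size = 0.33)  # keep roughly a third of the 71 time points

head(fake_data_seeded)
```

Rerunning this block from the set.seed() line onwards should give the same rows every time, which makes it easier to compare plots between sessions.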

First, for plotting with ggplot2 we want our data in a tidy long format. I also add another column for faceting that groups the variables from var1 together.

fake_data_long <- fake_data %>%
  gather(x, y, var1_1:var2) %>%
  mutate(facet = ifelse(x %in% c("var1_1", "var1_2", "var1_3"), "var1", x))
head(fake_data_long)
##                  time      x         y facet
## 1 2017-10-01 09:30:00 var1_1 0.4534363  var1
## 2 2017-10-01 08:42:00 var1_1 0.4260230  var1
## 3 2017-10-01 07:06:00 var1_1 0.0871770  var1
## 4 2017-10-01 09:54:00 var1_1 0.6824320  var1
## 5 2017-10-01 07:12:00 var1_1 0.7979666  var1
## 6 2017-10-01 09:33:00 var1_1 0.7537796  var1
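
In newer versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). The reshaping step above could be sketched as follows; the small toy data frame is made up and only mimics the column layout of fake_data:

```r
library(dplyr)
library(tidyr)

# Made-up stand-in with the same columns as fake_data
toy <- data.frame(
  time   = as.POSIXct("2017-10-01 07:00", tz = "UTC") + c(0, 180),
  var1_1 = c(0.1, 0.2),
  var1_2 = c(0.3, 0.4),
  var1_3 = c(0.5, 0.6),
  var2   = c(0.7, 0.8)
)

toy_long <- toy %>%
  # names_to / values_to play the role of gather()'s key and value columns
  pivot_longer(cols = var1_1:var2, names_to = "x", values_to = "y") %>%
  # group the three var1 measurements into one facet, keep var2 as its own
  mutate(facet = ifelse(startsWith(x, "var1_"), "var1", x))

toy_long
```

The rows come out in a different order than with gather() (all variables for the first time point, then the next), and the result is a tibble, but neither matters for plotting.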

Now, we can plot this the following way:

  • facet by variable
  • subset data to facets for point plots and give aesthetics in geom_point()
  • subset data to facets for box plots and give aesthetics in geom_boxplot(). Here we also need to set the group aesthetic; if we don’t set it explicitly, we get a plot with one big box instead of one box for every time point.

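fake_data_long %>%
  ggplot() +
    # one panel per variable; scales = "free" gives each panel its own y-axis range
    facet_grid(facet ~ ., scales = "free") +
    # var2: one value per time point, drawn as points joined by a line
    geom_point(data = subset(fake_data_long, facet == "var2"),
               aes(x = time, y = y),
               size = 1) +
    geom_line(data = subset(fake_data_long, facet == "var2"),
              aes(x = time, y = y)) +
    # var1: several values per time point; group = time draws one box per time point
    geom_boxplot(data = subset(fake_data_long, facet == "var1"),
                 aes(x = time, y = y, group = time))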
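
The same plot can also be written without repeating the name of the data frame in every layer, because the data argument of a ggplot2 layer accepts a function that is applied to the plot's data. The following is a self-contained sketch under a few assumptions of mine: the toy data are made up, only() is a hypothetical helper I define here, and pivot_longer() stands in for gather():

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # arbitrary seed for the made-up data

# Ten time points, 3 minutes apart, with three var1 values and one var2 value each
times <- seq(as.POSIXct("2017-10-01 07:00", tz = "UTC"), by = 180, length.out = 10)
toy_long <- data.frame(time = times,
                       var1_1 = runif(10), var1_2 = runif(10), var1_3 = runif(10),
                       var2 = runif(10)) %>%
  pivot_longer(cols = var1_1:var2, names_to = "x", values_to = "y") %>%
  mutate(facet = ifelse(x == "var2", "var2", "var1"))

# Hypothetical helper: returns a function that keeps only the rows of one facet
only <- function(name) {
  function(d) d[d$facet == name, , drop = FALSE]
}

p <- ggplot(toy_long, aes(x = time, y = y)) +
  facet_grid(facet ~ ., scales = "free_y") +
  geom_point(data = only("var2"), size = 1) +
  geom_line(data = only("var2")) +
  geom_boxplot(data = only("var1"), aes(group = time))

p

# One row per drawn box in the boxplot layer (layer 3)
nrow(layer_data(p, 3))
```

Sharing aes(x = time, y = y) at the top level keeps the layers short; only the box plot layer adds group = time. If you drop that group mapping, the row count from layer_data() should collapse to a single box, which is the "one big box" problem described above.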

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] bindrcpp_0.2       forcats_0.3.0      stringr_1.3.0     
##  [4] dplyr_0.7.4        purrr_0.2.4        readr_1.1.1       
##  [7] tidyr_0.8.0        tibble_1.4.2       ggplot2_2.2.1.9000
## [10] tidyverse_1.2.1   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.4  xfun_0.1          reshape2_1.4.3   
##  [4] haven_1.1.1       lattice_0.20-35   colorspace_1.3-2 
##  [7] htmltools_0.3.6   yaml_2.1.17       rlang_0.2.0.9000 
## [10] pillar_1.2.1      withr_2.1.1.9000  foreign_0.8-69   
## [13] glue_1.2.0        modelr_0.1.1      readxl_1.0.0     
## [16] bindr_0.1         plyr_1.8.4        munsell_0.4.3    
## [19] blogdown_0.5      gtable_0.2.0      cellranger_1.1.0 
## [22] rvest_0.3.2       psych_1.7.8       evaluate_0.10.1  
## [25] labeling_0.3      knitr_1.20        parallel_3.4.3   
## [28] broom_0.4.3       Rcpp_0.12.15      backports_1.1.2  
## [31] scales_0.5.0.9000 jsonlite_1.5      mnormt_1.5-5     
## [34] hms_0.4.1         digest_0.6.15     stringi_1.1.6    
## [37] bookdown_0.7      grid_3.4.3        rprojroot_1.3-2  
## [40] cli_1.0.0         tools_3.4.3       magrittr_1.5     
## [43] lazyeval_0.2.1    crayon_1.3.4      pkgconfig_2.0.1  
## [46] xml2_1.2.0        lubridate_1.7.3   assertthat_0.2.0 
## [49] rmarkdown_1.8     httr_1.3.1        rstudioapi_0.7   
## [52] R6_2.2.2          nlme_3.1-131.1    compiler_3.4.3