Chasing R dragons

I’m finally starting to understand what people mean about getting into R. For those of you who don’t know, R is a free, open-source language and environment for statistical computing that lets people write and share their own statistical packages (and more). This is huge for multiple reasons.

Firstly, it’s free! It is very frustrating as a student to learn whichever statistical software your institution decided to pay for, only to go out into the rest of the world (or even across campus) and discover that every organization and department uses something different and your skills are not transferable. R bypasses that by being available to anyone, anywhere, anytime.

Secondly, R is incredibly powerful. If you have ever tried to use macros in Excel to calculate iterations of formulas across a big data set, you understand the limitations of many software packages. R is clean, simple, and effective for data handling, analysis and display. You can do almost anything in R, from running generalized additive models to creating dynamic light shows at an ecology conference in Bamfield….

Photo: Amazing light show by Allan Roberts, R-master and University of Victoria teacher. Photo taken from Sam Starko’s Twitter feed (@SamStarko).

So all this to say that I knew R was cool, important to learn, the way of the future, etc… But I still hadn’t experienced the “R high” that can only come after hours upon hours upon days of hunching over your computer, feeding lines of code into the black screen and waiting for something other than an error message to appear…

Finally, I felt that high. I was working on a figure for the Canadian Society for Ecology and Evolution 2017 meeting, and was fighting several battles at once: trying to speak a language that R would understand, learning about the packages I was attempting to use, and trying to make something pretty to go with the theme of my PowerPoint presentation. After countless hours of what seemed like wasted time, I achieved this:

Figure: Chinook fork lengths over time, with individual fish coloured by their genetic stock identification.

This may not look like much to you, but to me it was perfection. There is a particular feeling of euphoria that happens when your brain has begun to melt after hours of unsuccessful coding, and then you suddenly press command-enter and a nice-looking plot pops up… It’s like you have been chasing this elusive, colourful dragon through dark corridors of code and thick cobwebs of Stack Overflow help posts and suddenly, without realizing that you had turned a corner, you are standing face to face with this glowing and magnificent creature: the R dragon. Then, with newfound courage, you attempt to shake hands with the dragon, or put a hat on it, or change the colour of its scales, and instantly it vanishes, leaving you alone and silent in your dark hallway of code and sending you trudging back to the cobwebs of internet help pages once again.

This taste though, this glimpse of the R dragon, is what motivates you to continue down the spiraling road to learning code. Now that I have tasted it, I want more! Now that I finally have felt the triumphant glow of success – I made that pretty looking plot! – I finally can understand the drive to do everything in R. Speaking with several grad students at the CSEE conference this week, I can take comfort in knowing that I am not alone in my quest. So to all you brave knights and explorers out there – I raise my carpal tunnel coding fist to you, and wish you many glimpses of the R dragon to come!


For fellow eco-nerds out there, below is the code that I used to generate my fancy figure:

Code for Chinook forklengths over time figure

#load dataset: printout of first few rows of my Chin.fl subset of data

head(Chin.fl)

         Date Species Primary.stock forklength
1  2016-03-29 Chinook      Harrison         43
2  2016-03-29 Chinook      Harrison         42
22 2016-03-29 Chinook      Harrison         42
23 2016-03-29 Chinook      Harrison         43
24 2016-03-29 Chinook      Harrison         41
25 2016-03-29 Chinook      Harrison         45
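
For anyone who wants to run the code without my data, here is a minimal sketch of a stand-in data set. Everything in it is made up for illustration: only the column names and the Harrison stock come from the printout above, while "Stock B", "Stock C", the sampling dates and the fork lengths are invented. It uses 40 sampling dates so that the date indexing further down still finds a value at position 38.

```r
# Hypothetical stand-in for Chin.fl: invented values, same four columns
set.seed(42)  # make the random draws repeatable

# 40 made-up sampling dates, three days apart
sampling.dates <- seq(as.Date("2016-03-29"), by = "3 days", length.out = 40)

Chin.fl <- data.frame(
  Date = rep(sampling.dates, each = 2),  # two fish per sampling date
  Species = "Chinook",
  # "Stock B" and "Stock C" are placeholder names, not real stocks
  Primary.stock = sample(c("Harrison", "Stock B", "Stock C"), 80, replace = TRUE),
  forklength = round(rnorm(80, mean = 45, sd = 3)),  # fork length in mm
  stringsAsFactors = FALSE
)

head(Chin.fl)
```

The plot will obviously not look like mine, but every line of the plotting code should run against it.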

#initiate plot with Chin.fl dataset
library(ggplot2)
p <- ggplot(Chin.fl)

#create basic scatterplot of forklengths over time, assigning colour of points to genetic stock
#(refer to columns by bare name inside aes(); ggplot looks them up in Chin.fl)
p <- p + geom_point(alpha = 0.6, aes(x = Date, y = forklength, colour = Primary.stock), size = 5, show.legend = TRUE, inherit.aes = TRUE)

## formatting
#open external graphics window for easy plot viewing (quartz() is macOS-only; dev.new() is the cross-platform equivalent)
quartz(width = 11, height = 8.5)
#rename axis titles
p <- p + ylab("Fork length (mm)") + xlab("Sampling Date")
#simple black and white theme to start
p <- p + theme_bw()
#remove gridlines
p <- p + theme(panel.grid = element_blank())
#change all the colours to have white text and elements on black background
p <- p + theme(axis.title = element_text(colour = "white"), plot.background = element_rect(fill = "black"), panel.background = element_rect(fill = "black"), axis.text = element_text(colour = "white"), axis.line = element_line(colour = "white"), axis.ticks = element_line(colour = "white"), legend.title = element_text(colour = "white", size = 18), legend.text = element_text(colour = "white"), legend.background = element_rect(fill = "black"), legend.key = element_rect(fill = "black"))
#make all labels larger so more visible on export, push axis titles out with vjust (note opposite directionality with y vs x axis)
p <- p + theme(axis.title.y = element_text(size = 24, vjust = 1), axis.title.x = element_text(size = 24, vjust = 0), axis.text = element_text(size = 14), legend.text = element_text(size = 14))

#load scales package to edit x-axis display of dates
library(scales)
#take a quick look at sampling date range in data set
un.date <- unique(Chin.fl$Date)
#create vector of selected sampling dates for custom x-axis labels
date <- un.date[c(1, 4, 5, 10, 18, 20, 26, 32, 35, 38)]
#tell ggplot to make the breaks on the x-axis = to the values in vector 'date', and change display to month-day
p <- p + scale_x_date(breaks = date, labels = date_format("%b-%d"))
#angle date labels to fit more, use hjust to increase horizontal space between axis labels and ticks, and vjust to do same vertically. Increase margins around plot to 5mm
p <- p + theme(axis.text.x = element_text(angle = 20, hjust = 0.8, vjust = 0.9), plot.margin = unit(c(5,5,5,5), "mm"))

#change colour scale to have greater contrast and highlight Harrison stock – see http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3 for more info
p <- p + scale_color_brewer(type = "qual", palette = "Paired", na.value = "grey80")
#change the label of the colour legend
p <- p + labs(colour = "Stock Identification")
##side note on colour palettes – the ColorBrewer "Paired" palette tops out at 12 distinct colours, but I have 13 stocks, so the stock left without a colour is drawn in the na.value grey – I am OK with this as the key stocks I was interested in stand out and contrast is nice.

#view your plot!
print(p)
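
The side note about running out of colours can be checked before plotting. This is a minimal sketch, assuming the RColorBrewer package is installed (the Brewer scales in ggplot2 take their colours from it). The value of n.stocks is a hypothetical stand-in for the 13 stocks in my data; with the real data you would count them with length(unique(Chin.fl$Primary.stock)).

```r
library(RColorBrewer)

# brewer.pal.info lists every ColorBrewer palette with its maximum size
max.paired <- brewer.pal.info["Paired", "maxcolors"]

# hypothetical stand-in for length(unique(Chin.fl$Primary.stock))
n.stocks <- 13

# how many stocks would be left without their own colour
n.uncoloured <- max(0, n.stocks - max.paired)

print(max.paired)
print(n.uncoloured)
```

If n.uncoloured is more than you can live with, the options are to lump minor stocks together before plotting or to supply your own colours with scale_colour_manual().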

Resources used to make this plot:

R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

Hadley Wickham (2016). scales: Scale Functions for Visualization. R package version 0.4.1. http://CRAN.R-project.org/package=scales

 
