Program outline

February 28, 2020:
  • 12.30-13.00: Registration
  • 13.00-17.00: Workshops
  • 17.00-18.30: After-work drinks
February 29, 2020:
Time Activity
9:00 Registration
9:30 Welcome
9:35 Peter Dalgaard: A Brief History of R and some Thoughts about Its Future
10:20 Birthday celebration
10:25 Roger Bivand: How R Helped Provide Tools for Spatial Data Analysis
11:10 Break
11:35 Mark Edmondson: Evolving R with Docker and the Cloud
12:20 Thomas Lin Pedersen: [Generative Art in R]
12:30 Therese Graversen: R Provides Statistical Evidence in Criminal Cases
13:15 Lunch
14:00 Julia Silge: Text Mining with Tidy Data Principles
14:45 Birthday celebration
14:55 Hannah Frick: Assessing model parameter stability in R
15:40 Break
16:10 Winston Chang: When, Why, and How to Do Object-Oriented Programming with R6
16:55 Birthday celebration
17:05 Heather Turner: Moving Forwards: Greater Equity and Inclusion in the R Community
17:50 Closing
18:00 Reception
19:30 Goodbye

Speakers

Peter Dalgaard

Copenhagen Business School, Denmark

Peter is a member of the R core group and has been release manager since before the release of version 1.0.0. He is currently working towards the release of version 4.0.0.

Julia Silge

RStudio, USA

Julia is the author of the tidytext R package and has been a strong advocate for tidy structured text mining approaches in R.

Roger Bivand

Norwegian School of Economics, Norway

Roger has been active in the R community since 1997 and is the author of numerous R packages, particularly in the field the spatial data analysis. Roger is a member of the R foundation.

Heather Turner

Freelance, UK

Heather is the chair of R Forwards - an R Foundation task force for underrepresented groups in the R community. Heather is a member of the R foundation.

Therese Graversen

IT University of Copenhagen, Denmark

Therese has developed the DNAmixtures R package for analyzing DNA samples. She has used her research and the R package to help forensic investigations by serving as an expert witness for the British police several times.

Mark Edmondson

IIH Nordic, Denmark

Mark works as a data engineer at IIH Nordic and is a certified Google Developer Expert for Google Analytics and Google Cloud. He has developed several CRAN packages calling Google APIs such as Google Analytics, Compute Engine and Cloud Storage, some of which he contributes to rOpenSci and the cloudyR organisations.

Hannah Frick

Mango Solutions, UK

Hannah works as a senior data scientist at Mango Solutions where she helps organisations to become more data-driven through data science projects (with a focus on analytical projects) and training in R and Python. She is a member of the global leadership team of R-Ladies and a co-organiser of the London chapter.

Winston Chang

RStudio, USA

Winston is a software engineer at RStudio, and the creator of the R6 package. Moreover, he is a developer for the Shiny, ggplot2, and devtools packages. He holds a Ph.D. in psychology from Northwestern University and is the author of R Graphics Cookbook.

Workshop hosts

Dirk Eddelbuettel

University of Illinois, USA

Dirk will host a workshop on the Rcpp library, which is a highly used API for integrating C++ code into R.

Thomas Lin Pedersen

RStudio, Denmark

Thomas will host a workshop on the ggplot family of packages can be used for producing diverse and customizable graphics in R.

Abstracts

A Brief History of R and some Thoughts about Its Future

Peter Dalgaard - Slides

I will outline the historical context that enabled the creation of R: The PC, Internet, and OpenSource revolution starting around 1990, and the statistical computing environments that were available for young scientists at the time. I will then describe the concrete events leading up to the release of version 1.0.0. I will put some emphasis on the efforts to create a reliable and stable platform for statistical software development. I will also discuss later developments inside and outside the R Core Team and present my views on future challenges.


Text Mining with Tidy Data Principles

Julia Silge - Slides

Today, R users find that text data is increasingly important in many domains in which they work. Tidy data principles and tidy tools can make text mining easier and more effective; in this talk, hear how to manipulate, summarize, and visualize characteristics of text using these methods and R packages from the tidy tool ecosystem. These tools are highly effective for many analytical questions and allow today’s R users to integrate natural language processing into effective workflows already in wide use. Explore how to implement approaches from exploratory data analysis to unsupervised and supervised machine learning.


How R Helped Provide Tools for Spatial Data Analysis

Roger Bivand - Slides

Twenty years ago, most labs and universities had little access to tools for spatial data analysis. The access they had was mostly closed source and costly. We do not know specific adoption rates of open source software for spatial data analysis, nor the shares of R packages. However, the proliferation of reverse depencences on core packages in the Spatial Task View and pagerank analyses do suggest that needs have been met. This gives strong incentives both to maintain backwards compatibility, and to adapt to emerging data sources and methods of analysis.


Moving Forwards: Greater Equity and Inclusion in the R Community

Heather Turner - Slides

In December 2015, the R Foundation created a taskforce to address the underrepresentation of women among contributors to the R project. Just over a year later, the scope was broadened to include all underrepresented groups and the taskforce took on the name Forwards. The work of Forwards has contributed to the widely-recognised inclusiveness of the R community that has gone from strength to strength in recent years.

However, there is still much that can be done to widen the participation of underrepresented groups, particularly in the technical side of the R project (developer meetings, package development, R core development), but also in the general user community for specific groups (people living in underserved regions or people with disabilities that affect access). This talk will consider these challenges and discuss some ways we might move forwards to achieve greater equity and inclusion in future.


R Provides Statistical Evidence in Criminal Cases

Therese Graversen

R has proven to be a flexible and powerful platform for developing specialised statistical software, allowing me to single-handedly implement our academic research into a software solution now used in several court cases. I will discuss some criminal cases where I have acted as an expert witness. In particular, I will highlight the role that R plays in criminal investigations pertaining to complex DNA evidence, and the various ways in which R has been pivotal in changing the way that DNA evidence is evaluated.


Evolving R with Docker and the Cloud

Mark Edmondson - Slides

Drawing on the pivotal work of the rocker-project.org, Docker opens up capabilities for R to scale vertically and horizontally within the Cloud. Using Google Cloud Platform as an example, Mark charts his own journey through cloud based R applications, underpinned by embracing Docker. Starting with “lift-and-shift” of replicating local R environments in massive virtual machines, to adapting R workflows to leverage massive parallel processing on computing clusters, and now culminating with his newest ‘serverless R’ package, googleCloudRunner that offers cloud capabilities to anyone with a few lines of R. Learn how to create R APIs that scale from 0 to millions, throw asynchronous R builds up to the cloud and perform easy script scheduling.


Assessing model parameter stability in R

Hannah Frick

When we fit a model, we often make the assumption that the same set of model parameters holds for the entire sample. However, different parameters may hold in subgroups (or clusters) in the data and these clusters may or may not be explained by additional covariates. Finite mixture models are a common technique for detecting such clusters and additional covariates (if available) can be included as concomitant variables. Another approach that relies on covariates for detecting the clusters are model-based trees. This talk explores these approaches in the context of Rasch models for binary questionnaire data with the psychomix and psychotree packages.


When, Why, and How to Do Object-Oriented Programming with R6

Winston Chang

R’s traditional object systems, S3 and S4, are well-suited for most data analysis tasks. But as the R ecosystem grows and evolves, we frequently encounter other kinds of problems that are better served by a “classical” object-oriented system, similar to those used in C++ and Java. The R6 package provides this sort of system. I will discuss when and why it makes sense to use R6, and illustrate with real-world examples. I will also explain how R6 came to be, as well as its design and implementation. The R language is being used more and more for managing external compute and data resources, and this style of object-oriented programming is especially well-suited for these tasks.