Upcoming seminars

Friday, July 19, 15:15

Yang Zhao
Department of Biostatistics, School of Public Health, Nanjing Medical University, P.R. China
Mediation Analysis and Random Forests

In this presentation, we will introduce the possibility and practice of using random forests (RF), an ensemble machine learning method, in causal mediation analysis. We will also discuss the advantages and potential risks of using RF-based methods in causal inference.

We will first describe the limitations of traditional regression-based mediation analysis. We then briefly describe the basic procedure of random forests. We propose a residual-based method to remove confounding effects in RF analysis and introduce its applications in high-dimensional genetic analysis[1]. The proposed RF-based mediation analysis framework consists of three steps. First, we build a causal forest model under the counterfactual framework to model the relationship between outcome, treatment, mediators and covariates[2]. Next, we predict the mediators with traditional random forests, using treatment and covariates as predictors. Finally, the average effects are estimated using weighted methods. Possible candidates for the weights include the inverses of probabilities and variances. We performed extensive computer simulations to evaluate the performance of random forests in mediation analysis. We observed that the proposed methods can obtain accurate estimates of the direct and indirect effects. The results also demonstrated that RF-based methods are more flexible than traditional regression-based methods. Because RF-based methods can handle nonlinear relationships and high-order interactions, we do not need to specify whether exposure-mediator interactions exist, or what form they take, as traditional regression-based methods require.

Data from phase II and III clinical trials of a novel small-molecule multi-targeted cancer drug, which is already marketed in China, are used to illustrate the application of RF-based mediation analysis. We evaluated the mediation effects of some measurements from routine blood tests, such as platelets, on the progression and death outcomes of non-small cell lung cancer patients.

We conclude that RF-based methods offer advantages in mediation analysis.

Monday, October 21, 15:15

Halina Frydman
NYU Stern School of Business
An Ensemble Method for Interval-Censored Time-to-Event Data

Interval-censored data analysis is important in biomedical statistics for any type of time-to-event response where the time of response is not known exactly, but rather only known to occur between two assessment times. Many clinical trials and longitudinal studies generate interval-censored data; one common example occurs in medical studies that entail periodic follow-up. In this paper, we propose a survival forest method for interval-censored data based on the conditional inference framework. We describe how this framework can be adapted to the situation of interval-censored data. We show that the tuning parameters have a non-negligible effect on the survival forest performance, and we provide guidance on how to tune the parameters in a data-dependent way to improve the overall performance of the method. Using Monte Carlo simulations, we show that the proposed survival forest is at least as effective as a survival tree method when the underlying model has a tree structure, performs similarly to an interval-censored Cox proportional hazards model when the true relationship is linear, and outperforms the survival tree method and Cox model when the true relationship is nonlinear. We illustrate the application of the method to a breast cancer data set.

Thursday, November 28, 11:00

Alberto Cairo
Director of the Visualization Program at the Center for Computational Science, University of Miami
Visualization and Graphic Design for Scientists

When designing a data visualization, showing the data comes first. After all, the main goal of a visualization is letting the reader spot patterns and trends behind numbers. But what if the visualization we design is to be presented to a general audience? In that case we may want to think deeply about visual design elements such as typography, color, composition, and hierarchy. This talk teaches non-designers such as scientists and statisticians how to make our charts, graphs, publications, and conference posters look better.

Thursday, November 28, 15:00

Alberto Cairo
Director of the Visualization Program at the Center for Computational Science, University of Miami
How Charts Lie

We’ve all heard that a picture is worth a thousand words, but what if we don’t understand what we’re looking at?

Charts, infographics, and diagrams are ubiquitous. They are useful because they can reveal patterns and trends hidden behind the numbers we encounter in our lives. Good charts make us smarter—if we know how to read them.

However, they can also deceive us. Charts lie in a variety of ways (displaying incomplete or inaccurate data, suggesting misleading patterns, and concealing uncertainty) or are frequently misunderstood. Many of us are ill-equipped to interpret the visuals that politicians, journalists, advertisers, and even our employers present each day. This talk teaches us not only to spot the lies in deceptive visuals, but also to take advantage of good ones.

Map of CSS

You can find CSS next to the Botanical Garden, 5 minutes from Nørreport station.

Meeting room 5.2.46 is the library of the Biostatistics section, located in building 5, 2nd floor, room 46. See the map below for directions inside CSS.