The war against p values in medical research

Claus Thorn Ekstrøm and Theis Lange
UCPH Biostatistics
@ClausEkstrom / @GetTheisOnTwitter

IFSV Sept. 19th 2019
Slides: biostatistics.dk/talks/

1

Quiz

2

You want to see if the means of two groups are different. You compare the means statistically and get a p value of 0.05 when testing at a significance level of 0.07. What is the conclusion?

  1. You reject the null hypothesis.
    Thus you cannot reject that the two population means are the same.
  2. You fail to reject the null hypothesis.
    Thus you cannot reject that the two population means are the same.
  3. You reject the null hypothesis.
    Thus you reject that the two population means are the same.
  4. You fail to reject the null hypothesis.
    Thus you reject that the two population means are the same.
  5. Help!!
3

Exercise

4

What is a p value anyway?

The p value is the probability of having obtained a result at least as extreme as the one found with our sample if the null hypothesis were true.

- Kirkwood & Sterne

IF the null hypothesis is true
AND all the other assumptions about the model are also true
THEN the p value expresses the probability of observing something at least as extreme as what you have in your sample.

5

If A is true, then B cannot occur. However, B has occurred. Therefore, A is false.

If A is true, then B probably cannot occur. However, B has occurred. Therefore, A is probably false.

Roughly: the p value is a number that measures how surprised you are.
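
The definition can be made concrete with a small permutation test. This is only a sketch: the measurements are made up and `permutation_p_value` is a hypothetical helper, not something from the talk. It estimates how often a difference in means at least as extreme as the observed one turns up when the group labels carry no information (the null hypothesis).

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation p value for a difference in means.

    Hypothetical helper for illustration; not taken from the slides.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels carry no information, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups (assumption, for illustration only).
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.2, 4.8, 4.5, 5.0, 4.1, 4.7]
p = permutation_p_value(treated, control)
print(f"permutation p value: {p:.4f}")
```

A small value says the observed difference would be surprising if the labels were exchangeable. It does not say how probable the null hypothesis is.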

6

The epistemology of science

7
8

A history of the (war against) p values

9

From ancient times ...

10

Popper did not like the probability argument

... to recent times

11

What are the problems with p values?

They try to answer the "wrong" question

A researcher typically wants to know if the hypothesis holds:

P(H|D)

but the p value computes

P(D "or more extreme" |H)
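
A rough illustration of why the two quantities differ, using Bayes' theorem. The prior probability of the null, the significance level and the power below are all assumed values picked for the example, and "significant" stands in for the data D.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result) via Bayes' theorem.

    All three inputs are assumptions chosen by the user.
    """
    p_sig_and_null = alpha * prior_null          # false positives
    p_sig_and_alt = power * (1.0 - prior_null)   # true positives
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


posterior_null = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
print(f"P(H0 | significant) = {posterior_null:.2f}")  # 0.36
```

With these assumed inputs the null remains fairly plausible after a result that is significant at the 5% level, so the test's error rate cannot be read as P(H|D).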

12

They give a very precise answer to the wrong question instead of an approximate answer to the right question.

The p value is used in the wrong way

Typically used as a decision rule:

p value < 0.05: reject ("significant")
p value ≥ 0.05: do not reject ("not significant" or "no association")

  • Arbitrary threshold for continuous scale
  • Significant does not mean clinically relevant
  • Non-significance does not mean that H0 is true - only that there was insufficient evidence to reject it ("absence of evidence is not evidence of absence").
13

"No association" is wrong to say

Binary thinking makes everything worse in that people inappropriately combine probabilistic statements with Boolean rules.

The p value contains two types of information

The p value combines information about the effect size and sample size.

When N → ∞ everything becomes significant.
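
A minimal sketch of the sample-size effect, assuming a two-sample z test with known standard deviation and an observed difference that stays fixed at a tiny value (both idealisations). The function name and the numbers are illustrative only.

```python
import math


def two_sample_z_p_value(effect, sd, n_per_group):
    """Two-sided p value of a z test for a difference in means.

    Assumes known, equal standard deviations and that the observed
    difference equals `effect` exactly (an idealisation).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = effect / standard_error
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_z_p_value(effect=0.05, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

The effect size never changes; only the sample size does, yet the p value moves from clearly non-significant to far below any conventional threshold.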

14

Unrealistic null hypothesis

Compare two treatments with effects μ1 and μ2

H0: μ1 = μ2

When do we really believe that the effects of two treatments are exactly the same?

Hard to believe in most public health or social science research.

15

At least outside randomization

16
17

Alternative proposals

18

Use confidence intervals

The 95% CI is the set of parameter values that, taken as H0, are not rejected at the 5% level.

Fully defined from (infinitely many) p values
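
The link between tests and intervals can be sketched by brute force: test a fine grid of candidate null values and keep the ones that are not rejected. The example assumes a one-sample z test with known standard deviation and made-up summary statistics. The helper names are hypothetical.

```python
import math


def z_test_p_value(sample_mean, mu0, sd, n):
    """Two-sided p value for H0: mu = mu0 (z test, known sd assumed)."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2.0))


def ci_by_test_inversion(sample_mean, sd, n, alpha=0.05, step=0.001):
    """Scan candidate null values and keep those that are not rejected."""
    half_width = 10 * sd / math.sqrt(n)  # search range, wide enough here
    n_steps = int(2 * half_width / step)
    candidates = [sample_mean - half_width + i * step for i in range(n_steps + 1)]
    kept = [mu0 for mu0 in candidates
            if z_test_p_value(sample_mean, mu0, sd, n) >= alpha]
    return min(kept), max(kept)


# Made-up summary statistics (assumption, for illustration only).
lower, upper = ci_by_test_inversion(sample_mean=2.0, sd=1.5, n=36)
print(f"95% CI by test inversion: ({lower:.2f}, {upper:.2f})")
```

Up to the grid resolution, the result agrees with the familiar formula: sample mean ± 1.96 standard errors.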

19

... interpretation of the CI

Epidemiologists: interpret confidence intervals as credible intervals.

Biostatisticians: know that CIs are not credible intervals, but interpret them as though they were anyway.

20

Bayes factors

The Bayes factor is the ratio of the likelihoods of the data under two hypotheses:

BF = P(D|H1) / P(D|H0)

Moves the problem to another scale!

Several other problems.
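
A small sketch of a Bayes factor for two point hypotheses about a binomial proportion. The data and the hypothesised values of theta are made up, and the function names are illustrative only.

```python
from math import comb


def binomial_likelihood(k, n, theta):
    """P(k successes in n trials | success probability theta)."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def bayes_factor_10(k, n, theta_h1, theta_h0):
    """BF10 = P(D | H1) / P(D | H0) for two point hypotheses."""
    return binomial_likelihood(k, n, theta_h1) / binomial_likelihood(k, n, theta_h0)


# Made-up data: 14 successes in 20 trials (assumption, for illustration only).
bf_moderate_h1 = bayes_factor_10(k=14, n=20, theta_h1=0.7, theta_h0=0.5)
bf_extreme_h1 = bayes_factor_10(k=14, n=20, theta_h1=0.9, theta_h0=0.5)
print(f"BF10 with H1: theta = 0.7 -> {bf_moderate_h1:.2f}")
print(f"BF10 with H1: theta = 0.9 -> {bf_extreme_h1:.2f}")
```

The same data favour H1 in the first comparison and H0 in the second, so the conclusion depends on how the alternative is specified.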

21

If you think p values are problematic, wait until you understand Bayes factors.

Bayes factors depend crucially on aspects of the prior distribution that are typically assigned in a completely arbitrary manner by users.

IF B10 IS…       THEN YOU HAVE…
> 100            Extreme evidence for H1
30 – 100         Very strong evidence for H1
10 – 30          Strong evidence for H1
3 – 10           Moderate evidence for H1
1 – 3            Anecdotal evidence for H1
1                No evidence
1/3 – 1          Anecdotal evidence for H0
1/3 – 1/10       Moderate evidence for H0
1/10 – 1/30      Strong evidence for H0
1/30 – 1/100     Very strong evidence for H0
< 1/100          Extreme evidence for H0

Lower the significance level

Pros:

  • Fewer false positives
  • Improved replicability

Cons:

  • More false negatives
  • Still dichotomizes results
  • Does not fix any of the conceptual problems with the p value

Use α=0.005 instead of 0.05.
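
The trade-off between false positives and false negatives can be sketched with a power calculation. It assumes a two-sided, two-sample z test with known standard deviation. The effect size and sample size are made-up values and the function name is illustrative.

```python
from statistics import NormalDist


def power_two_sided_z(effect, sd, n_per_group, alpha):
    """Power of a two-sided, two-sample z test (known sd assumed)."""
    std_normal = NormalDist()
    standard_error = sd * (2.0 / n_per_group) ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / standard_error
    # Probability that the test statistic falls in either rejection region.
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for alpha in (0.05, 0.005):
    power = power_two_sided_z(effect=0.5, sd=1.0, n_per_group=64, alpha=alpha)
    print(f"alpha = {alpha}: power = {power:.2f}, false-negative rate = {1 - power:.2f}")
```

With the design unchanged, the stricter threshold lowers the power, which means more false negatives unless the sample size is increased.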

22

Bayesian analysis

Answers the "right" question:

What is the probability that my hypothesis holds?

P(H|D)

  • Subjective vs objective
  • Moves discussion to priors

Posterior distribution of θ
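
A minimal sketch of a posterior for θ using a grid approximation. It assumes binomial data and a flat prior on θ. Both the data and the prior are choices made for the example, and `posterior_on_grid` is a hypothetical helper.

```python
def posterior_on_grid(successes, trials, grid_size=1001):
    """Grid approximation of the posterior of theta for binomial data.

    Assumes a flat (uniform) prior on theta; that choice is itself a
    modelling assumption.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size
    likelihood = [t**successes * (1.0 - t) ** (trials - successes) for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


# Made-up data: 14 successes in 20 trials (assumption, for illustration only).
grid, posterior = posterior_on_grid(successes=14, trials=20)
prob_theta_above_half = sum(w for t, w in zip(grid, posterior) if t > 0.5)
print(f"P(theta > 0.5 | data) = {prob_theta_above_half:.3f}")
```

Replacing the flat prior with an informative one changes the answer, which is where the discussion about priors starts.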

23
24

Let's put things into perspective

  • Which variables?
  • How to measure?
  • Missing data
  • Entry errors
  • Which model?
  • Which specification?
  • Which assumptions?
  • Confounding
  • Collinearity
  • Overfitting
  • p hacking
  • Interpretation
  • Published?
  • Replicated?
25

Publication bias

Scientist                                            Salesman
Be sceptical of your results                         "Sell" your results
Interpret conclusions carefully                      Highlight / exaggerate importance
"Publish" negative results                           Publish strategically
Replicate replicate replicate                        Replicate ... if you must
Novel exciting results are less likely to be true    Publish novel results before they get scooped

26

What about the future?

27

Are p values bad?

  • No
  • Medicine / public health has moved forward in leaps and bounds in the last 100 years.
  • "Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban" (2019)
  • The function of significance tests is to prevent you from making a fool of yourself, and not to make unpublishable results publishable
28

If a drunken driver crashes into a tree it is not the car's fault (at least not yet).

31 BASP papers. 17 with some statistics

The sceptical p value

Held (2019): The sceptical p value. Focus on statistical evaluation of replication studies.

29

Recommendations

Embrace uncertainty!

Know your tools!

  • Report effect sizes and CIs (and perhaps p values)
  • Put as much energy into discussing clinical relevance as statistical results.
  • Abandon dichotomizing and "statistically significant"
  • Never conclude "no difference" or "no effect"
30

Present statistical conclusions with uncertainty rather than as dichotomies

March 2019: 800 scientists have signed. 1: Never conclude "no difference" or "no association". 2: Abandon dichotomizing and "statistically significant".

Who said this?

Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man.

- Francis Galton (1894)

31

We can provide the best methods possible. It is up to the researcher to apply them ("delicately handled") and to decipher the results appropriately ("warily interpreted").

33
