You want to see if the means of two groups are different. You compare the means statistically and get a p value of 0.05 when testing at a significance level of 0.07. What is the conclusion?
The p value is the probability of having obtained a result at least as extreme as the one found with our sample if the null hypothesis were true. --- Kirkwood & Sterne
IF the null hypothesis is true
AND all the other assumptions about the model are also true
THEN the p value expresses the probability of observing something as extreme as what you have in your sample.
If A is TRUE then B cannot occur; However, B has occurred; Therefore A is false
If A is TRUE then B probably cannot occur; However, B has occurred; Therefore A is probably false
Roughly: the p value is a number that measures how surprised you are.
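The definition above can be made concrete with an exact permutation test. This is a minimal sketch, not a method from the slides: the two groups are invented numbers, and the function name `permutation_p_value` is hypothetical. Under the null hypothesis the group labels are exchangeable, so the p value is the share of all relabellings that give a difference in means at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(n), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so that ties with the observed value count as extreme.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements, invented for illustration only.
group_a = [12.1, 14.3, 13.8, 15.2, 14.9]
group_b = [10.4, 11.9, 12.5, 11.1, 12.0]

p = permutation_p_value(group_a, group_b)
print(f"permutation p value: {p:.4f}")
```

The "IF ... AND ... THEN" structure shows up directly: the enumeration is only a valid reference distribution if the null hypothesis and the exchangeability assumption both hold.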
Popper did not like the probability argument
P values try to answer the "wrong" question
A researcher typically wants to know if the hypothesis holds:
P(H|D)
but the p value computes
P(D "or more extreme" |H)
They give a very precise answer to the wrong question instead of an approximate answer to the right question.
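A small sketch of why P(D|H) and P(H|D) can differ widely, using Bayes' theorem in a simplified world with only two hypotheses. All three input numbers are assumptions chosen for illustration, and `posterior_h0` is a hypothetical helper name.

```python
def posterior_h0(p_data_given_h0, p_data_given_h1, prior_h0):
    """P(H0 | D) from Bayes' theorem when H0 and H1 are the only options."""
    prior_h1 = 1.0 - prior_h0
    evidence = p_data_given_h0 * prior_h0 + p_data_given_h1 * prior_h1
    return p_data_given_h0 * prior_h0 / evidence


# Assumed inputs (not from the slides):
#   P(D | H0) = 0.05  the data are fairly surprising under the null
#   P(D | H1) = 0.80  the data are expected under the alternative
#   P(H0)     = 0.90  most tested hypotheses are null
p_h0_given_d = posterior_h0(0.05, 0.80, 0.90)
print(f"P(D | H0) = 0.05, but P(H0 | D) = {p_h0_given_d:.2f}")
```

With these assumed inputs the null hypothesis keeps a posterior probability of 0.36 even though the data had only a 0.05 probability under it. The answer to the researcher's question depends on the prior and on the alternative, neither of which enters the p value.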
Typically used as a decision rule:
p value < 0.05: reject ("significant")
p value ≥ 0.05: do not reject ("not significant" or "no association")
"No association" is wrong to say
Binary thinking makes everything worse in that people inappropriately combine probabilistic statements with Boolean rules.
The p value combines information about the effect size and sample size.
When N→∞ everything becomes significant.
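A rough illustration of the sample-size effect, assuming a two-sample z test with known standard deviation 1, equal group sizes, and an observed difference that happens to equal a tiny fixed true difference of 0.02 standard deviations. These settings and the function name `two_sample_z_p` are assumptions made for the sketch.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided p value of a two-sample z test with known, common sd."""
    se = sd * sqrt(2.0 / n_per_group)
    z = diff / se
    return 2.0 * NormalDist().cdf(-abs(z))


tiny_effect = 0.02  # assumed difference in means, in sd units
sizes = [100, 10_000, 100_000, 1_000_000]
p_values = [two_sample_z_p(tiny_effect, 1.0, n) for n in sizes]

for n, p in zip(sizes, p_values):
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The effect size never changes, yet the p value falls from clearly "not significant" to far below any conventional threshold as n grows.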
Compare two treatments with effects μ1 and μ2
H0:μ1=μ2
When do we really believe that the effects of two treatments are exactly the same?
Hard to believe in most public health or social science research.
At least outside randomization
The CI is defined as the values of H0 that are not rejected.
Fully defined from (infinitely many) p values
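The link between confidence intervals and p values can be sketched by inverting a test over a grid of null values. This assumes a one-sample z test with known standard deviation and invented summary numbers (mean 5.0, sd 2.0, n = 25); a grid only approximates the "infinitely many" p values.

```python
from statistics import NormalDist


def z_test_p(xbar, mu0, se):
    """Two-sided p value for H0: mu = mu0 in a one-sample z test."""
    z = (xbar - mu0) / se
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical summary statistics, for illustration only.
xbar, sd, n = 5.0, 2.0, 25
se = sd / n ** 0.5
alpha = 0.05

# Test a fine grid of candidate null values and keep those not rejected.
grid = [3.0 + i * 0.001 for i in range(4001)]
not_rejected = [mu0 for mu0 in grid if z_test_p(xbar, mu0, se) >= alpha]
ci_by_inversion = (min(not_rejected), max(not_rejected))

# The usual closed-form interval, for comparison.
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
ci_formula = (xbar - z_crit * se, xbar + z_crit * se)

print("by test inversion:", ci_by_inversion)
print("by formula:       ", ci_formula)
```

The two intervals agree up to the grid resolution, which is all the definition claims: the interval is the set of null values the data do not reject at level α.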
Epidemiologists: interpret confidence intervals as credible intervals.
Biostatisticians: Know that CIs are not credible intervals, but interpret them as though they were anyway.
The Bayes factor is the ratio of the likelihoods of the data under two hypotheses:
BF = P(D|H1) / P(D|H0)
Move problem to another scale!
Several other problems.
If you think p values are problematic, wait until you understand Bayes factors.
They depend crucially on aspects of the prior distribution that are typically assigned in a completely arbitrary manner by users.
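A sketch of the prior dependence, assuming binomial data (60 successes in 100 trials, invented for illustration), a point null θ = 0.5, and an alternative with a symmetric Beta(a, a) prior on θ. The function names are hypothetical; the marginal likelihood under H1 is the standard beta-binomial result, and the binomial coefficient cancels in the ratio.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf10_binomial(k, n, a):
    """Bayes factor for H1: theta ~ Beta(a, a) against H0: theta = 0.5."""
    log_m1 = log_beta(k + a, n - k + a) - log_beta(a, a)  # marginal under H1
    log_m0 = n * log(0.5)                                 # likelihood under H0
    return exp(log_m1 - log_m0)


k, n = 60, 100  # hypothetical data
bayes_factors = {a: bf10_binomial(k, n, a) for a in (0.5, 1.0, 10.0, 50.0)}

for a, bf in bayes_factors.items():
    print(f"Beta({a}, {a}) prior on theta: BF10 = {bf:.2f}")
```

The data are identical in every row; only the prior under H1 changes, and the Bayes factor moves with it. For these counts the uniform prior (a = 1) gives a value slightly below 1, while a prior concentrated near 0.5 (a = 10) gives a value above 1.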
IF B10 IS… | THEN YOU HAVE… |
---|---|
More than 100 | Extreme evidence for H1 |
30 – 100 | Very strong evidence for H1 |
10 – 30 | Strong evidence for H1 |
3 – 10 | Moderate evidence for H1 |
1 – 3 | Anecdotal evidence for H1 |
1 | No evidence |
1/3 – 1 | Anecdotal evidence for H0 |
1/10 – 1/3 | Moderate evidence for H0 |
1/30 – 1/10 | Strong evidence for H0 |
1/100 – 1/30 | Very strong evidence for H0 |
Less than 1/100 | Extreme evidence for H0 |
Pros:
Cons:
Use α=0.005 instead of 0.05.
Answers the "right" question:
What is the probability that my hypothesis holds?
P(H|D)
Posterior distribution of θ
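A minimal sketch of what a posterior distribution of θ delivers, assuming binomial data (60 successes in 100 trials, invented for illustration), a flat prior, and a simple grid approximation. The hypothesis "θ > 0.5" and the function name `posterior_grid` are assumptions made for the example.

```python
def posterior_grid(k, n, grid_size=10_000):
    """Grid approximation to the posterior of theta under a flat prior."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Flat prior: the posterior is proportional to the binomial likelihood.
    unnormalised = [t ** k * (1.0 - t) ** (n - k) for t in thetas]
    total = sum(unnormalised)
    weights = [u / total for u in unnormalised]
    return thetas, weights


thetas, weights = posterior_grid(k=60, n=100)

# Direct probability statements about the hypothesis, given the data.
p_theta_above_half = sum(w for t, w in zip(thetas, weights) if t > 0.5)
posterior_mean = sum(t * w for t, w in zip(thetas, weights))

print(f"P(theta > 0.5 | D) = {p_theta_above_half:.3f}")
print(f"posterior mean of theta = {posterior_mean:.3f}")
```

Unlike the p value, `p_theta_above_half` is a statement of the form P(H|D). It is only as good as the prior and the model that produced it.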
Scientist | Salesman |
---|---|
Be sceptical of your results | "Sell" your results |
Interpret conclusions carefully | Highlight / exaggerate importance |
"Publish" negative results | Publish strategically |
Replicate replicate replicate | Replicate ... if you must |
Novel exciting results are less likely to be true | Publish novel results before they get scooped |
If a drunken driver crashes into a tree, it is not the car's fault (at least not yet).
31 BASP papers. 17 with some statistics
Held (2019): The sceptical p value. Focus on statistical evaluation of replication studies.
Embrace uncertainty!
Know your tools!
Present statistical conclusions with uncertainty rather than as dichotomies
March 2019: 800 scientists signed on.
1: Never conclude "no difference" or "no association".
2: Abandon dichotomizing and the term "statistically significant".
Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man.
- Francis Galton (1894)
We can provide the best methods possible. It is up to the researcher to apply them with care ("delicately handled") and to decipher the results appropriately ("warily interpreted").