class: center, middle, inverse, title-slide

# Seven deadly sins of data science

### Claus Thorn Ekstrøm
### May 30th, 2018

---

# A long time ago ...

> *I'd like to convince you to give a talk [...]*
>
> *It's at the end of May. A rather big conference. Check this out: https://intelligentcloud.dk*

--

>
> *They could use a **"grumpy professor's"** advice on good basic statistics :-)*

---

class: center, middle

.yellow[.large[**A NEW HOPE**]]

.Large[data].HUGE[Science]

???

Digitalization

- Paradigm shift in AI (statistical learning)
- Near human or superhuman performance in image and sound recognition, and text processing
- Automation of decision processes (rebranding old ideas as AI fuelled by the increase in computing power)
- xxx
- ... and the naive hope that more data will make difficult problems easy

---

# What makes a good scientist?

Be curious ... keep learning new things ... remember it is a collaborative effort

--

| Scientist | Seller |
|:-----------|:----------|
| Be sceptical of your results | "Sell" your results |
| Interpret conclusions carefully | Highlight/exaggerate importance |
| "Publish" negative results | Publish strategically |
| Replicate replicate replicate | Replicate ... if you must |
| Novel exciting results are less likely to be true. Double-check them | Publish novel results before they get scooped |

---

class: inverse, center, middle

.Huge[What is the question?]

---

class: center

![](ic_files/figure-html/unnamed-chunk-1-1.svg)<!-- -->

---

class: center

![](ic_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->

---

.pull-left[

`\(p\)`-value hacking

Cluster analysis

Cherry picking

Network analysis

Marketing

.yellow[Use recommendations from pharma industry]

]

.pull-right[

<img src="pics/cluster.png" width="2277" />

]

---

class: inverse, center, middle

.Huge[Representativity]

---

## Population and sample

![](ic_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

Generalization and external validity

---

background-image: url(pics/mm.png)
background-size: 120%

???
Guardian, May 24th

---

# Global Drug Survey

`\(N \approx 140000\)` globally, `\(N \approx 13500\)` in DK

Sampling: volunteers from Facebook, Reddit, Twitter, partners.

*Their statements:*

Can **not** be used to say anything about drug use prevalence.

*Can* be used to say something about the *patterns*.

.yellow[In DK: "Easier to get cocaine than a pizza"]

---

background-image: url("pics/mushr.jpeg")
background-size: 100%

---

# Global Drug Survey

> *"Magic mushrooms are one of the safest drugs in the world," said Adam Winstock, [...] pointing out that the bigger risk was people picking and eating the wrong mushrooms.*

--

> *"**Death from toxicity** is almost unheard of with poisoning with more dangerous fungi being a much greater risk in terms of serious harms."*

---

class: inverse, center, middle

.Huge[Confounding]

---

# What is confounding?

> Confounding is when an association is **distorted** due to a mix-up with other factors that are associated with both the outcome and the exposure.
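
---

# What is confounding?

A minimal simulation sketch of the definition above, not taken from the talk. It assumes a made-up setup: a single confounder drives both the exposure and the outcome, and the exposure has no effect at all on the outcome. The names `simulate`, `pearson` and `partial_corr`, the sample size and the unit effect sizes are all illustrative assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y adjusted for z (first-order partial correlation)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=20_000, seed=1):
    """Hypothetical data: the confounder affects exposure and outcome;
    the exposure itself has NO effect on the outcome."""
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    exposure = [z + rng.gauss(0, 1) for z in confounder]
    outcome = [z + rng.gauss(0, 1) for z in confounder]
    crude = pearson(exposure, outcome)
    adjusted = partial_corr(exposure, outcome, confounder)
    return crude, adjusted


crude, adjusted = simulate()
print(f"crude correlation:    {crude:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

Under these assumptions the crude correlation should land near 0.5 in theory even though there is no causal effect, while the correlation adjusted for the confounder should be close to zero. Adjustment only works here because the confounder was measured; an unmeasured one would leave the distortion in place.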