According to protocol

DSTS Two-day meeting

Claus Thorn Ekstrøm

November 12, 2024

The new prayer

 

It’s not me - it’s them

The simple approach

Basic descriptive statistics are used. Statistical analyses are performed in SPSS.

No power analysis was conducted, as this is a retrospective study and thus does not involve the inclusion of new patients.

… but what will they do?

Advanced observational statistical analyses will be employed to explore the impact of various factors on short- and longer-term treatment outcomes […] as described above.

This approach allows for a versatile and comprehensive assessment of data, ensuring the study’s adaptability to diverse analyses.

Statistical analyses will be chosen based on the nature of the variables under investigation, ensuring robustness in the interpretation of results.

[The] study prioritizes ensuring a robust statistical foundation to allow adaptability in the analytical strategy, enabling the exploration of a diverse array of factors influencing treatment outcomes and trends from high-quality real world data.

Clueless

STATISTICAL CONSIDERATIONS

Both quantitative and qualitative data analysis methods will be used, and STATA will be applied within the framework of REDCap and XXX.

Key objectives include analyzing how clinical and genetic data align and where they diverge, assessing the validity of the data, and evaluating potential benefits from rethinking negative results as well as the need for additional clinical examinations derived from [genetic] and clinical assessments.

WTF?

Analysis and statistics

Registration of data will be carefully conducted by trained healthcare professionals. In order to avoid bias, the healthcare professionals will be unaware of possible results from further investigations

The relevant analysis will be performed using approved software, and there will be used statistical modalities as for example sensitivity, specificity and mean values, given with confidence intervals (CI95%).

Why even bother?

Statistical considerations

A detailed statistical analysis plan will be made before any analysis is conducted.

Patients diagnosed with primary or recurrent [disease] in the head and neck area and who were treated with surgery at [some Danish hospital] […] will be included. [The hospital] performs approx. 50-80 […] dissections annually. Thus, between 300-500 patients are expected to be included

Give them what they want

R version 4.1.0 and STATA/S.E. 17.1 will be used. The comparison of variables between groups will be conducted using Student’s t-test, Pearson’s chi-square test, Wilcoxon’s nonparametric rank sum test, Fisher’s exact test, or Cuzick’s nonparametric test for trend when appropriate.
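To make the group comparisons above concrete: Pearson’s chi-square test for a 2×2 table is small enough to sketch in plain Python (the table values below are made up for illustration; a real analysis would use R or Stata as the protocol states).

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the
    chi-square tail probability is P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Observed counts paired with their expected counts under independence.
    cells = [
        (a, row1 * col1 / n),
        (b, row1 * col2 / n),
        (c, row2 * col1 / n),
        (d, row2 * col2 / n),
    ]
    stat = sum((obs - exp) ** 2 / exp for obs, exp in cells)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = pearson_chi2_2x2(10, 20, 30, 40)  # hypothetical 2x2 counts
print(f"chi2 = {stat:.3f}, p = {p:.3f}")    # chi2 = 0.794, p = 0.373
```

The point of naming the tests in the protocol is exactly this: each one is a specific, checkable computation, not a vague promise.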

For cross-sectional analyzes, linear and logistic regression models will be applied. Aalen-Johansen survival curves, logarithmic rank test, and Cox proportional hazards regressions will be used in prospective studies to estimate incidence curves and hazard ratios with 95% confidence intervals using left truncation (or delayed entry) and age as timescale; sensitivity analyses will use time-on-study as timescale. Multivariable adjustment will be performed with relevant covariates. When appropriate we will use models specifically designed to handle competing risks, such as the Fine-Gray subdistribution hazard model. Where time-dependent confounding and/or exposure is suspected to be an issue we may include time-varying covariates in cox-analyses or, in the case of treatment-confounder feedback, use more advanced methods, such as marginal structural models with weights estimated by inverse probability weighting. For the Mendelian randomization analyzes, we will additionally use instrumental variable analysis
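The survival part of the plan, left truncation (delayed entry) with age as the timescale, amounts to counting, at each event age, only the subjects already under observation. A minimal Kaplan-Meier sketch of that risk-set convention (toy data, not from the protocol; a real analysis would use a survival package):

```python
def km_left_truncated(subjects):
    """Kaplan-Meier survival on the age timescale with delayed entry.

    subjects: list of (entry_age, exit_age, event) tuples, with event
    1 for failure and 0 for censoring.  A subject is at risk at age t
    only if entry_age < t <= exit_age (left truncation).
    Returns a list of (event_age, survival) pairs.
    """
    event_ages = sorted({stop for entry, stop, ev in subjects if ev})
    surv, curve = 1.0, []
    for t in event_ages:
        at_risk = sum(1 for entry, stop, ev in subjects if entry < t <= stop)
        deaths = sum(1 for entry, stop, ev in subjects if ev and stop == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

cohort = [(50, 60, 1), (55, 65, 1), (40, 70, 0), (58, 62, 1)]
for age, s in km_left_truncated(cohort):
    print(f"S({age}) = {s:.3f}")  # S(60) = 0.750, S(62) = 0.500, S(65) = 0.250
```

Note how the subject entering at age 58 contributes to the risk set at ages 60 and 62 but not 65: that is the entire content of "using left truncation and age as timescale", spelled out.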

The “new mantra”

Standard methods will be used for the static [sic] analyses.

Continuous data variables will be presented as a mean ± standard deviation (SD) or a median with an interquartile range (IQR). Categorical data will be presented as a count and a percentage.

Comparisons between groups will be conducted using Student’s t-test or a Mann-Whitney U test for continuous data, and for categorical data, we will use a chi-square test or Fisher’s test, depending on the most appropriate method for the given situation.

To assess the relationship between outcome and calcium score, logistic regression analyses will be used.
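The mean ± SD versus median (IQR) convention above can be made concrete with the standard library alone; the variable name `calcium_scores` and the data are illustrative, not from the study.

```python
import statistics

def describe(values):
    """Summarise a continuous variable as mean +/- SD and median (IQR)."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    median = statistics.median(values)
    # quantiles with n=4 returns the three quartile cut points
    # (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean_sd": f"{mean:.1f} ± {sd:.1f}",
        "median_iqr": f"{median:.1f} ({q1:.1f}-{q3:.1f})",
    }

calcium_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up example data
print(describe(calcium_scores))
```

Choosing mean ± SD versus median (IQR) per variable, as the protocol promises, is a decision about skewness; either way the summary is a two-line computation, which is what makes this "mantra" credible.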

The collected data is stored by the responsible chief physician on an encrypted drive and a USB stick.

The USB stick is kept in a locked drawer in a room located in the doctor’s corridor connected to the Cardiology Department at [Some] Hospital, which is otherwise inaccessible to unauthorized individuals and is locked outside of regular working hours

There’s a new kid in town: ML / AI

The AI/ML cop out

We will use AI / ML / DL / LLM to analyze the data and make a prediction model.


ML models are much more complex and flexible than traditional models so sample size evaluations are not possible.

 

The snake oil salesman

… we will use AI to rethink […] screening by integrating automated segmentation of all [areas] in a given image with important systemic risk factors to construct a novel risk-based index score, which can accurately identify patients with present or risk of upcoming progression to disease.

State-of-the-art?

We shall adapt state-of-the-art [machine learning] approaches using different databases to identify relevant pathways and […] networks affected by the […] risk factors as part of the studies.

We will also test the importance of … the findings … and explore causal relationships between risk variants, imaging and disease outcome […], e.g., mendelian randomization and […] Structural Equation Modelling. To develop and test […] scores and scores using clinical variables […] we will use both linear and complex mathematical models including discrete probability distributions.

State-of-the-art?

There will be data that are missing in the datasets […]. In most instances the values will not be missing at random. However, because there are no universal ways to handle missing data, we will handle the missing data differently depending on whether they are missing at random or there are systematic patterns in the data that can be used to predict the missingness.

We will, were [sic] applicable, apply deep learning for time to event modeling, e.g., disease progression models, time to first diagnosis or time to hospitalization and surgery. Alternatively, we train an unsupervised deep learning model to encode the patients in an embedding space, in which stratification is possible.

Words matter

And it’s not just words. They are significant.

Go have that difficult talk with your research collaborators.

DON’T FOOL PEOPLE

Stop fooling yourself (and others).


Put yourself on the line


If you have no clue what you will be doing then say so.

REPRODUCIBILITY

Transparency in the analysis and reporting of results.


Facilitate scientific discussions


What if the researcher dies?