class: center, middle, inverse, title-slide

# Evaluating sports tournament predictions

### Claus Thorn Ekstrøm
UCPH Biostatistics
(joint with C. Ley, H. V. Eetvelde, and U. Brefeld)

### December 15th, 2019
@ClausEkstrom
.small[
ekstrom@sund.ku.dk
]

---
class: animated, fadeIn
layout: true

---

.pull-left[
<img src="pics/pred1.png" width="100%" />
]

--

.pull-right[
<img src="pics/pred2.png" width="1421" />
]

---

# How to compare tournament predictions?

.pull-left[

* Take **all** predictions into account, not just the winner.
* Non-local and sensitive to distance.
* Penalize **confident** classifications that are **incorrect**.

]

.pull-right[

Prediction: 32 x 32 matrix

```
     [,1] [,2] [,3] [,4]
[1,] 0.00 0.00 0.00 0.02
[2,] 0.00 0.00 0.00 0.04
[3,] 0.01 0.01 0.00 0.07
[4,] 0.00 0.00 0.01 0.08
```

.small[Rows = ranks, columns = teams]

]

???

"It’s better to be somewhat wrong than emphatically wrong. Of course it’s always better to be right"

---

# Single match methods

`\(x_j\)` is the predicted probability of rank `\(j\)` and `\(o_j = I(\text{team ranked } j)\)` is the indicator of the observed rank.

$$
\text{Log-loss} = -\sum_{j=1}^R o_j\log(x_j)
$$

`\begin{equation} \text{RPS}=\frac{1}{R-1}\sum_{r=1}^{R-1}\left(\sum_{j=1}^r (o_j - x_j)\right)^2 \end{equation}`

Better predictions lead to smaller numbers.

---

# The tournament rank probability score

`\(X\)` is the prediction matrix and `\(O\)` is the result indicator matrix (`\(R\)` ranks by `\(T\)` teams).

`\begin{equation} \text{TRPS}(O, X) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{R-1}\sum_{r=1}^{R-1} (\mathcal{O}_{rt} - \mathcal{X}_{rt})^2. \end{equation}`

`\(\mathcal{O}\)` and `\(\mathcal{X}\)` are the cumulative versions of `\(O\)` and `\(X\)` (over ranks).
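
The definition above can be sketched in a few lines of Python. This is only an illustration: the helper names `outcome_matrix` and `trps`, and the small 4-team matrices, are assumptions made for this example and are not code from the talk.

```python
from itertools import accumulate


def outcome_matrix(ranks, n_ranks):
    """Build the R x T indicator matrix O from each team's final rank (1-based)."""
    return [[1.0 if rank == r + 1 else 0.0 for rank in ranks]
            for r in range(n_ranks)]


def trps(outcome, pred):
    """Tournament rank probability score; rows = ranks, columns = teams."""
    n_ranks, n_teams = len(pred), len(pred[0])
    total = 0.0
    for t in range(n_teams):
        # Cumulate observed and predicted probabilities over ranks for team t.
        cum_o = list(accumulate(row[t] for row in outcome))
        cum_x = list(accumulate(row[t] for row in pred))
        # Sum squared cumulative differences over the first R - 1 ranks.
        total += sum((cum_o[r] - cum_x[r]) ** 2 for r in range(n_ranks - 1))
    return total / (n_teams * (n_ranks - 1))


# Hypothetical 4-team example: a fairly confident prediction and a flat one.
confident = [[0.75, 0.25, 0.00, 0.00],
             [0.25, 0.75, 0.00, 0.00],
             [0.00, 0.00, 0.75, 0.25],
             [0.00, 0.00, 0.25, 0.75]]
flat = [[0.25] * 4 for _ in range(4)]

# Teams 1..4 finish in ranks 1..4.
observed = outcome_matrix([1, 2, 3, 4], n_ranks=4)
print(round(trps(observed, confident), 4))  # 0.0208
print(round(trps(observed, flat), 4))       # 0.2083
```

In this toy case the confident prediction scores lower (better) than the flat one because its cumulative probabilities lie closer to the observed cumulative indicators at every rank.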

* Uses cumulative differences (probability of obtaining at least rank `\(r\)`)
* The `\(R\)` ranks can be collapsed into partial rankings

---

# Example (team ranked 5)

<!-- -->

---

# Examples

`$$X^1 = \left[\begin{array}{cccc}0.75 & 0.25 & 0 & 0 \\ 0.25 & 0.75 & 0 & 0 \\ 0 & 0 & 0.75 & 0.25\\ 0 & 0 & 0.25 & 0.75 \end{array}\right] \; \; X^2 = \left[\begin{array}{cccc}0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25\end{array}\right]$$`

Tournament outcome: .yellow[1, 2, 3, 4]: TRPS becomes 0.0208 for `\(X^1\)` and 0.2083 for `\(X^2\)`.<br>
(Log-loss = 1.15 and 5.55)

--

Tournament outcome: .yellow[2, 1, 4, 3]: TRPS becomes 0.1875 for `\(X^1\)` and 0.2083 for `\(X^2\)`.<br>
(Log-loss = 5.55 and 5.55)

---

# Simulation results - TRPS - knockout

| Teams | `\(\sigma\)` | True | Flat | Confident | `\(P(\text{T>F})\)` | `\(P(\text{T>C})\)` |
|----|----------|------|------|-----------|-------|-------|
| 8  | 1 | 0.129 `\(\pm\)` 0.048 | 0.18 | 0.255 `\(\pm\)` 0.065 | 0.85 | 1.00 |
| 16 | 1 | 0.111 `\(\pm\)` 0.026 | 0.15 | 0.188 `\(\pm\)` 0.059 | 0.92 | 0.98 |
| 32 | 1 | 0.089 `\(\pm\)` 0.019 | 0.13 | 0.165 `\(\pm\)` 0.03 | 0.96 | 1.00 |
| 8  | 2 | 0.083 `\(\pm\)` 0.066 | 0.18 | 0.204 `\(\pm\)` 0.062 | 0.91 | 1.00 |
| 16 | 2 | 0.083 `\(\pm\)` 0.027 | 0.15 | 0.141 `\(\pm\)` 0.046 | 0.98 | 1.00 |
| 32 | 2 | 0.062 `\(\pm\)` 0.019 | 0.13 | 0.123 `\(\pm\)` 0.019 | 1.00 | 1.00 |
| 8  | 3 | 0.04 `\(\pm\)` 0.04 | 0.18 | 0.132 `\(\pm\)` 0.056 | 0.99 | 0.97 |
| 16 | 3 | 0.055 `\(\pm\)` 0.027 | 0.15 | 0.115 `\(\pm\)` 0.039 | 0.99 | 1.00 |
| 32 | 3 | 0.038 `\(\pm\)` 0.015 | 0.13 | 0.098 `\(\pm\)` 0.016 | 1.00 | 1.00 |

---

# Extending the TRPS - weighting ranks

Let `\(w = (w_1, \ldots, w_{R-1})\)` be a vector of non-negative weights that sum to `\(R-1\)`.
The **weighted TRPS** is

`\begin{equation} \text{wTRPS}(O, X) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{R-1}\sum_{r=1}^{R-1} w_r (\mathcal{O}_{rt} - \mathcal{X}_{rt})^2 \end{equation}`

* `\(w = (R-1, 0, 0, \cdots, 0)\)`: only the winner matters
* `\(w = (0, 0, 0, \cdots, R-1)\)`: only getting past the group stage matters

---

# FIFA predictions

* Flat prediction. All teams have the same probability of each rank.
* Skellam. `\(Y_1 \sim \text{Pois}(\lambda_1), Y_2 \sim \text{Pois}(\lambda_2)\)`, so the difference `\(Y_1 - Y_2\)` of the two (independent) counts follows a Skellam distribution. The `\(\lambda\)`'s come from gambling companies.
* Bradley-Terry model. The probability that team `\(A\)` wins over team `\(B\)` is based on the "official" Elo rating of the teams.
* Random forest model using several team-specific covariates (mean age, players from strong clubs, ...).
* Updated random forest model. As above, but also using the Elo rating.

---

# Results

.center[

| | TRPS | wTRPS | Log-loss |
|--------------------|-------|-------|-----------|
|Flat | 0.120 | 0.214 | 0.455 |
|Skellam | 0.086 | 0.153 | 0.367 |
|Elo | 0.101 | 0.179 | 0.421 |
|Random Forest | 0.089 | 0.157 | 0.365 |
|Updated RF | 0.090 | 0.159 | 0.371 |

]

Weights are proportional to `\(1, 1, \frac12, \frac12, \frac14, \frac18, \frac{1}{16}\)`

---

# Extending the TRPS - Ensemble predictions

Let `\(\tilde{X}^k\)` be the prediction from model `\(M_k\)` at a previous tournament and let `\(\widetilde{O}\)` be the corresponding outcome. Then

`$$\hat\omega = \text{arg min}_{\omega_1, \ldots, \omega_K; \sum_{k=1}^K \omega_k=1} \; \text{TRPS}(\widetilde{O}, \sum_{k=1}^K \omega_k \tilde{X}^k)$$`

gives the optimal weights. An **ensemble** prediction for the current tournament would then be

$$
X^\star = \sum_{k=1}^K \hat\omega_k X^k
$$

---

# Summary

* General measure to evaluate and compare "one-shot" tournament predictions
* Non-local and sensitive to distance
* Can handle partial rankings
* Can be combined with Bayesian model averaging to create ensemble predictions
* Can change focus through the rank weights

Participate in the UEFA Euro 2020 prediction competition!
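
---

# Sketch: wTRPS and ensemble weights

The weighted TRPS and the ensemble weights could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cumulate`, `wtrps`, `mix`, `best_two_model_weight`) and the 4-team matrices are hypothetical, only two models are combined, and the weights are additionally restricted to be non-negative and searched on a simple grid.

```python
from itertools import accumulate


def cumulate(matrix):
    """Cumulate an R x T matrix over ranks (rows), column by column."""
    cols = [list(accumulate(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


def wtrps(outcome, pred, weights=None):
    """Weighted TRPS; weights=None gives the plain TRPS (all weights 1)."""
    n_ranks, n_teams = len(pred), len(pred[0])
    if weights is None:
        weights = [1.0] * (n_ranks - 1)
    cum_o, cum_x = cumulate(outcome), cumulate(pred)
    total = 0.0
    for t in range(n_teams):
        for r in range(n_ranks - 1):
            total += weights[r] * (cum_o[r][t] - cum_x[r][t]) ** 2
    return total / (n_teams * (n_ranks - 1))


def mix(preds, omegas):
    """Element-wise weighted sum of several prediction matrices."""
    n_ranks, n_teams = len(preds[0]), len(preds[0][0])
    return [[sum(w * p[r][t] for w, p in zip(omegas, preds))
             for t in range(n_teams)] for r in range(n_ranks)]


def best_two_model_weight(outcome, pred_a, pred_b, steps=100):
    """Grid search (assumption: 0 <= omega <= 1) for the weight on pred_a."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid,
               key=lambda w: wtrps(outcome, mix([pred_a, pred_b], [w, 1 - w])))


# Hypothetical previous tournament: teams 1..4 finished in ranks 1..4.
observed = [[1.0 if r == t else 0.0 for t in range(4)] for r in range(4)]
confident = [[0.75, 0.25, 0.00, 0.00],
             [0.25, 0.75, 0.00, 0.00],
             [0.00, 0.00, 0.75, 0.25],
             [0.00, 0.00, 0.25, 0.75]]
flat = [[0.25] * 4 for _ in range(4)]

# Put all weight on the first cumulative rank (only the winner matters).
print(wtrps(observed, confident, weights=[3, 0, 0]))
# Weight on the confident model when combined with the flat model.
print(best_two_model_weight(observed, confident, flat))
```

In this toy setup all weight goes to the confident model, since it is at least as close to the outcome as the flat model at every cumulative rank. With more than two models, a constrained numerical optimiser would replace the grid search.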