Evaluating sports tournament predictions

class: center, middle, inverse, title-slide

# Evaluating sports<br>tournament predictions
### Claus Thorn Ekstrøm<br>UCPH Biostatistics<br>(joint with C. Ley, H. V. Eetvelde, and U. Brefeld)
### December 15th, 2019<br><svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> <span class="citation">@ClausEkstrom</span> <br>.small[<a href="mailto:ekstrom@sund.ku.dk" class="email">ekstrom@sund.ku.dk</a> ]

---

class: animated, fadeIn
layout: true

---

.pull-left[
<img src="pics/pred1.png" width="100%" />
]
--
.pull-right[
<img src="pics/pred2.png" width="1421" />
]

---

# How to compare tournament predictions?

.pull-left[
*   Take **all** predictions into account. Not just the winner.
*   Non-local and sensitive to distance
*   Penalize **confident** classifications that are **incorrect**.
]
.pull-right[
Prediction: 32 x 32 matrix
```
      [,1] [,2] [,3] [,4] 
 [1,] 0.00 0.00 0.00 0.02 
 [2,] 0.00 0.00 0.00 0.04 
 [3,] 0.01 0.01 0.00 0.07 
 [4,] 0.00 0.00 0.01 0.08
```
.small[Rows = ranks, 
columns = teams]
]

???

"It’s better to be somewhat wrong than emphatically wrong. Of course it’s always better to be right"

---

# Single match methods

`$x_j$` is the predicted probability of rank `$j$`, and `$o_j = I(\text{team ranked } j)$` indicates the observed rank.

$$
\text{Log-loss} = -\sum_{j=1}^R o_j\log(x_j)
$$

`\begin{equation}
\text{RPS}=\frac{1}{R-1}\sum_{r=1}^{R-1}\left(\sum_{j=1}^r (o_j - x_j)\right)^2
\end{equation}`

Better predictions lead to smaller numbers.
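
A minimal Python sketch of the two scores for a single prediction vector, assuming 1-based ranks and an `x` that sums to one. The function names `log_loss` and `rps` are illustrative and not taken from any package.

```python
import math


def log_loss(x, observed_rank):
    """-sum_j o_j log(x_j): only the probability of the observed rank counts."""
    p = x[observed_rank - 1]
    # A probability of zero on the observed rank gives an infinite loss.
    return math.inf if p == 0 else -math.log(p)


def rps(x, observed_rank):
    """Rank probability score of prediction vector x over R = len(x) ranks."""
    R = len(x)
    cum_x = 0.0  # cumulative predicted probability up to rank r
    cum_o = 0.0  # cumulative outcome indicator up to rank r
    total = 0.0
    for r in range(R - 1):  # r = 1, ..., R-1 in the formula
        cum_x += x[r]
        cum_o += 1.0 if r == observed_rank - 1 else 0.0
        total += (cum_o - cum_x) ** 2
    return total / (R - 1)


confident = [0.75, 0.25, 0.0, 0.0]
flat = [0.25, 0.25, 0.25, 0.25]

print(round(rps(confident, 1), 4))  # 0.0208
print(round(rps(confident, 2), 4))  # 0.1875
print(round(rps(confident, 4), 4))  # 0.8542
print(round(log_loss(flat, 1), 4), round(log_loss(flat, 4), 4))
```

The RPS of the confident prediction grows with the distance between predicted and observed rank, whereas the log-loss only looks at the probability placed on the observed rank.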

---

# The tournament rank probability score

`$X$` prediction matrix, `$O$` result indicator matrix

`\begin{equation}
\text{TRPS}(O, X) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{R-1}\sum_{r=1}^{R-1} (\mathcal{O}_{rt} - \mathcal{X}_{rt})^2. 
\end{equation}`

`$\mathcal{O}$` and `$\mathcal{X}$` are the cumulative versions of `$O$` and `$X$` (over ranks).

*   Cumulative differences (obtain at least rank `$r$`)
*   The `$R$` ranks can be collapsed into partial rankings
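
The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: matrices are plain nested lists with rows = ranks and columns = teams, and the helper names `trps` and `outcome_matrix` are hypothetical.

```python
def trps(O, X):
    """Tournament rank probability score for R x T matrices O and X."""
    R = len(X)      # number of ranks (rows)
    T = len(X[0])   # number of teams (columns)
    total = 0.0
    for t in range(T):
        cum_o = 0.0  # cumulative outcome indicator for team t
        cum_x = 0.0  # cumulative predicted probability for team t
        for r in range(R - 1):
            cum_o += O[r][t]
            cum_x += X[r][t]
            total += (cum_o - cum_x) ** 2
    return total / (T * (R - 1))


def outcome_matrix(ranks):
    """Indicator matrix from ranks[t] = final rank (1-based) of team t.

    Assumes as many ranks as teams and no ties.
    """
    n = len(ranks)
    return [[1.0 if ranks[t] == r + 1 else 0.0 for t in range(n)]
            for r in range(n)]


# Toy 4-team prediction: fairly sure about the top two and bottom two.
X = [[0.75, 0.25, 0.00, 0.00],
     [0.25, 0.75, 0.00, 0.00],
     [0.00, 0.00, 0.75, 0.25],
     [0.00, 0.00, 0.25, 0.75]]

print(round(trps(outcome_matrix([1, 2, 3, 4]), X), 4))  # 0.0208
print(round(trps(outcome_matrix([2, 1, 4, 3]), X), 4))  # 0.1875
```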

---

# Example (team ranked 5)

![](cms_files/figure-html/unnamed-chunk-3-1.png)

---

# Examples

`$$X^1 = \left[\begin{array}{cccc}0.75 & 0.25 & 0 & 0 \\ 0.25 & 0.75 & 0 & 0 \\ 0 & 0 & 0.75 & 0.25\\ 0 & 0 & 0.25 & 0.75 \end{array}\right] \; \; X^2 = \left[\begin{array}{cccc}0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25\end{array}\right]$$`

Tournament outcome .yellow[1, 2, 3, 4]: TRPS is 0.0208 for `$X^1$` and 0.2083 for `$X^2$`.<br>
(Log-loss = 1.15 and 5.55)

Tournament outcome .yellow[2, 1, 4, 3]: TRPS is 0.1875 for `$X^1$` and 0.2083 for `$X^2$`.<br>
(Log-loss = 5.55 and 5.55)
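
As a check of the first pair of numbers, using the TRPS definition with `$R = T = 4$`: under outcome 1, 2, 3, 4 every column of `$X^1$` has exactly one non-zero cumulative difference (0.25), while the flat `$X^2$` gives summed squared cumulative differences of 0.875, 0.375, 0.375 and 0.875 for the four teams.

$$
\text{TRPS}(O, X^1) = \frac{1}{4}\cdot\frac{1}{3}\cdot 4 \cdot 0.25^2 \approx 0.0208, \qquad
\text{TRPS}(O, X^2) = \frac{1}{4}\cdot\frac{1}{3}\,(0.875 + 0.375 + 0.375 + 0.875) \approx 0.2083
$$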

---

# Simulation results - TRPS - knockout

| Teams | `$\sigma$` | True | Flat | Confident | `$P(\text{T>F})$` | `$P(\text{T>C})$` | 
|----|----------|------|------|-----------|-------|-------|
| 8 |    | 0.129 `$\pm$` 0.048 | 0.18 | 0.255 `$\pm$` 0.065 | 0.85 | 1.00 | 
| 16 |  1 | 0.111 `$\pm$` 0.026 | 0.15 | 0.188 `$\pm$` 0.059 | 0.92 | 0.98 | 
|  32 |     | 0.089 `$\pm$` 0.019 | 0.13 | 0.165 `$\pm$` 0.03 | 0.96 | 1.00 | 
|   8 |     | 0.083 `$\pm$` 0.066 | 0.18 | 0.204 `$\pm$` 0.062 | 0.91 | 1.00 | 
|  16 |   2 | 0.083 `$\pm$` 0.027 | 0.15 | 0.141 `$\pm$` 0.046 | 0.98 | 1.00 | 
|  32 |     | 0.062 `$\pm$` 0.019 | 0.13 | 0.123 `$\pm$` 0.019 | 1.00 | 1.00 | 
|   8 |     | 0.04 `$\pm$` 0.04 | 0.18 | 0.132 `$\pm$` 0.056 | 0.99 | 0.97 | 
|  16 |   3 | 0.055 `$\pm$` 0.027 | 0.15 | 0.115 `$\pm$` 0.039 | 0.99 | 1.00 | 
|  32 |     | 0.038 `$\pm$` 0.015 | 0.13 | 0.098 `$\pm$` 0.016 | 1.00 | 1.00 |

---

# Extending the TRPS - weighting ranks

Let `$w = (w_1, \ldots, w_{R-1})$` be a vector of non-negative weights that sum to `$R-1$`. The **weighted TRPS** is

`\begin{equation}
\text{wTRPS}(O, X) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{R-1}\sum_{r=1}^{R-1} w_r (\mathcal{O}_{rt} - \mathcal{X}_{rt})^2
\end{equation}`

*   `$w = (R-1, 0, 0, \ldots, 0)$`: only the winner matters
*   `$w = (0, 0, 0, \ldots, R-1)$`: only getting past the group stage matters
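
A sketch of the weighted score in Python, assuming nested-list matrices with rows = ranks and columns = teams. The function name `wtrps` and the toy matrices are illustrative only.

```python
def wtrps(O, X, w):
    """Weighted TRPS; w holds R-1 non-negative weights summing to R-1."""
    R = len(X)
    T = len(X[0])
    if len(w) != R - 1:
        raise ValueError("need one weight per cumulative rank r = 1..R-1")
    total = 0.0
    for t in range(T):
        cum_o = 0.0
        cum_x = 0.0
        for r in range(R - 1):
            cum_o += O[r][t]
            cum_x += X[r][t]
            total += w[r] * (cum_o - cum_x) ** 2
    return total / (T * (R - 1))


# Toy prediction (rows = ranks, columns = teams) and outcome 2, 1, 4, 3.
X = [[0.75, 0.25, 0.00, 0.00],
     [0.25, 0.75, 0.00, 0.00],
     [0.00, 0.00, 0.75, 0.25],
     [0.00, 0.00, 0.25, 0.75]]
O = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

print(wtrps(O, X, [1, 1, 1]))  # equal weights reproduce the TRPS: 0.1875
print(wtrps(O, X, [3, 0, 0]))  # only the winner matters: 0.28125
```

With all weight on the first rank, the misjudged winner dominates the score and the errors further down the table are ignored.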

---

# FIFA predictions

*   Flat prediction. All teams have same probability of each rank.
*   Skellam. `$Y_1 \sim \text{Pois}(\lambda_1), Y_2 \sim \text{Pois}(\lambda_2)$` are the goals scored by the two teams, so the goal difference `$Y_1 - Y_2$` follows a Skellam distribution. The `$\lambda$`'s are obtained from gambling companies.
*   Bradley-Terry model. The probability that team `$A$` beats team `$B$` is based on the teams' "official" Elo ratings.
*   Random forest model using several team-specific covariates (mean age, players from strong clubs, ...).
*   Updated random forest model. As above, but also using the Elo rating.
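
To illustrate how the Skellam approach turns two scoring rates into match probabilities, here is a small sketch. It assumes independent Poisson goal counts and truncates the sums at a hypothetical `max_goals`. The function names and the example rates are made up for illustration and are not the values used in the talk.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Pois(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(lam1, lam2, max_goals=20):
    """Return (P(team 1 wins), P(draw), P(team 2 wins)).

    Sums the joint distribution of two independent Poisson goal counts,
    truncated at max_goals goals per team.
    """
    win = draw = loss = 0.0
    for g1 in range(max_goals + 1):
        p1 = poisson_pmf(g1, lam1)
        for g2 in range(max_goals + 1):
            p = p1 * poisson_pmf(g2, lam2)
            if g1 > g2:
                win += p
            elif g1 == g2:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical scoring rates for two teams.
win, draw, loss = match_probabilities(1.5, 1.0)
print(round(win, 3), round(draw, 3), round(loss, 3))
```

Simulating a whole tournament from such match probabilities many times would then give the rank-by-team prediction matrix.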

---

# Results

.center[

|               | TRPS  | wTRPS | Log-loss |
|---------------|-------|-------|----------|
| Flat          | 0.120 | 0.214 | 0.455    |
| Skellam       | 0.086 | 0.153 | 0.367    |
| Elo           | 0.101 | 0.179 | 0.421    |
| Random Forest | 0.089 | 0.157 | 0.365    |
| Updated RF    | 0.090 | 0.159 | 0.371    |

]

Weights are proportional to `$1, 1, \frac12, \frac12, \frac14, \frac18, \frac{1}{16}$`
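
A short sketch of how such proportional weights could be rescaled so that they sum to `$R-1$`, as the weighted TRPS requires. It assumes the seven listed values correspond to `$R - 1 = 7$`, i.e. eight collapsed rank groups, which is inferred from the number of weights rather than stated on the slide.

```python
from fractions import Fraction

# Weights as listed, up to proportionality.
raw = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 2),
       Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)]

# Rescale so the weights sum to R - 1 = len(raw).
scale = Fraction(len(raw)) / sum(raw)
w = [scale * v for v in raw]

print([round(float(v), 3) for v in w])
print(sum(w))  # 7
```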

---

# Extending the TRPS - Ensemble predictions

Let `$\tilde{X}^k$` be the prediction from model `$M_k$` for a previous tournament and let `$\widetilde{O}$` be the corresponding outcome. Then

`$$\hat\omega = \text{arg min}_{\omega_1, \ldots, \omega_K; \sum_{k=1}^K \omega_k=1} \; \text{TRPS}(\widetilde{O}, \sum_{k=1}^K  \omega_k \tilde{X}^k)$$`

gives the optimal weights. An **ensemble** prediction for a new tournament would then be

$$
X^\star = \sum_{k=1}^K \hat\omega_k X^k
$$
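
A rough sketch of the weight fitting for `$K = 2$` models. It swaps the general minimisation for a plain grid search and additionally restricts the weights to be non-negative, which the slide does not state. All function names and the toy data are hypothetical.

```python
def trps(O, X):
    """TRPS for R x T nested lists (rows = ranks, columns = teams)."""
    R, T = len(X), len(X[0])
    total = 0.0
    for t in range(T):
        cum_o = cum_x = 0.0
        for r in range(R - 1):
            cum_o += O[r][t]
            cum_x += X[r][t]
            total += (cum_o - cum_x) ** 2
    return total / (T * (R - 1))


def blend(weights, predictions):
    """Weighted sum of prediction matrices, entry by entry."""
    R, T = len(predictions[0]), len(predictions[0][0])
    return [[sum(w * P[r][t] for w, P in zip(weights, predictions))
             for t in range(T)] for r in range(R)]


def fit_two_model_weights(O_prev, X1_prev, X2_prev, steps=100):
    """Grid search over (w, 1 - w), w in [0, 1], minimising the TRPS."""
    best_w, best_score = None, float("inf")
    for i in range(steps + 1):
        w1 = i / steps
        score = trps(O_prev, blend([w1, 1 - w1], [X1_prev, X2_prev]))
        if score < best_score:
            best_w, best_score = (w1, 1 - w1), score
    return best_w, best_score


# Toy previous tournament with outcome 2, 1, 4, 3.
X1_prev = [[0.75, 0.25, 0.00, 0.00],
           [0.25, 0.75, 0.00, 0.00],
           [0.00, 0.00, 0.75, 0.25],
           [0.00, 0.00, 0.25, 0.75]]
X2_prev = [[0.25] * 4 for _ in range(4)]
O_prev = [[0, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 0]]

weights, score = fit_two_model_weights(O_prev, X1_prev, X2_prev)
print(weights, round(score, 4))
```

The fitted weights would then be applied to the same models' predictions for the new tournament, e.g. `blend(weights, [X1_new, X2_new])`. For more than two models a constrained optimiser would be a more natural choice than a grid.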

---

# Summary

* General measure to evaluate and compare "one-shot" tournament predictions
* Non-local and sensitive to distance
* Can handle partial rankings
* Can be combined with Bayesian model averaging to create ensemble predictions
* Can change focus through the rank weights

Participate in the UEFA Euro 2020 prediction competition!