Calculating xG in Football Matches

Master's thesis defence: Bertram Rølmergaard Hansen

Title: Calculating xG in Football Matches
Comparing Logistic Regression, LightGBM and XGBoost

Abstract: This thesis presents a study on the application of Expected Goals (xG) models in football analytics, focusing on quantifying and predicting goal-scoring opportunities. Using data from various leagues, we implemented and compared the performance of Logistic Regression and two gradient-boosting models, LightGBM and XGBoost. These models were trained on two distinct datasets, both incorporating shot distance, angle, type of pass, and defensive pressure, to estimate the probability of a shot resulting in a goal.
Our findings reveal that while Logistic Regression offers simplicity and interpretability, it is outperformed in predictive accuracy by the more complex LightGBM and XGBoost models, which effectively capture non-linear relationships and interactions within the data.
Feature engineering was crucial in enhancing model accuracy, with contextual data offering significant insights into goal-scoring dynamics. Additionally, we utilized xG values from the best-performing model to simulate match outcomes and estimate the probabilities of final league standings. Through extensive simulations, we demonstrated the practical applications of xG models in predicting league results and identifying discrepancies between expected and actual performances.
The thesis also highlights the limitations of xG as a concept and suggests improvements. One such limitation is that the models use data from only a few selected seasons, as these are the only seasons for which data is available.
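
To illustrate how per-shot xG values can be turned into match-outcome probabilities, here is a minimal sketch. It is not the thesis's actual implementation: it assumes each shot is an independent Bernoulli trial with success probability equal to its xG, and the function name simulate_match and the shot values are hypothetical.

```python
import random


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=0):
    """Estimate (home win, draw, away win) probabilities from per-shot xG.

    Assumption: every shot scores independently with probability equal to
    its xG value. The thesis may use a different simulation scheme.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        # Draw one Bernoulli outcome per shot and count the goals.
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical xG values for the shots taken by each team in one match.
home_shots = [0.45, 0.12, 0.08, 0.30]
away_shots = [0.05, 0.10, 0.07]

p_home, p_draw, p_away = simulate_match(home_shots, away_shots)
print(f"home {p_home:.3f}, draw {p_draw:.3f}, away {p_away:.3f}")
```

Repeating such a simulation over every fixture in a season and awarding 3, 1, or 0 points per simulated result is one way to obtain a distribution over final league standings.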

Supervisor: Rolf Poulsen
External examiner: Kim Christensen, Aarhus Universitet