Context
In recent years, statisticians and data scientists alike have been looking for new ways to evaluate team performance in football. Sometimes a result is not a fair reflection of a team's performance, and this is where expected goals comes in.
Expected goals is a relatively new football metric that uses the quality of passing and goalscoring opportunities to rate a team's performance. Understat.com estimates these statistics using neural networks, and I have scraped its statistics for matches played between the 2014-15 and 2019-20 seasons to produce this dataset.
The leagues included in this dataset are:
English Premier League
La Liga
Bundesliga
Serie A
Ligue 1
Russian Football Premier League
Content
The dataset contains 22 columns, many of which are self-explanatory, such as date and home team. Some of the less common features are outlined below:
Chance - the predicted probability of an outcome, expressed as a percentage and based on expected goals.
Expected Goals - the number of goals a team is expected to score based on its performance.
Deep - the number of passes completed within an estimated 20 yards of goal.
PPDA - passes per defensive action: the number of passes allowed per defensive action in the opposition half.
Expected Points - the number of points a team is expected to earn from the game.
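To make the link between Expected Goals, Chance and Expected Points more concrete, here is a minimal sketch of one common way to turn two expected-goals values into outcome probabilities and expected points. It assumes each side's goals follow independent Poisson distributions and that a win is worth 3 points and a draw 1. This is an illustrative assumption, not necessarily how Understat computes its figures, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the scoring rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probabilities(home_xg: float, away_xg: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Assumption: goals for each side are independent Poisson variables
    whose means are the expected-goals values. Scorelines above
    max_goals are ignored and the result is renormalised.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


def expected_points(p_win: float, p_draw: float) -> float:
    """Expected league points with 3 for a win and 1 for a draw."""
    return 3 * p_win + p_draw
```

Calling outcome_probabilities with a team's and its opponent's expected goals gives three probabilities that sum to 1; passing the win and draw probabilities to expected_points gives a value between 0 and 3.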
Inspiration
Is the expected goals feature an accurate representation of a team's performance?
How can this feature be improved?
Can we predict the outcome of future games based on previous games?
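As a possible starting point for the first question, here is a minimal sketch of two simple checks: the mean absolute error between expected goals and goals actually scored, and the share of decisive matches in which the side with the higher expected goals also won. The tuple layout and function names are hypothetical, so the dataset's own columns would need to be mapped onto them.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (home_xg, away_xg, home_goals, away_goals)
Match = Tuple[float, float, int, int]


def xg_mean_absolute_error(matches: Iterable[Match]) -> float:
    """Average absolute gap between expected goals and actual goals,
    counting the home and away side of every match separately."""
    errors = []
    for home_xg, away_xg, home_goals, away_goals in matches:
        errors.append(abs(home_xg - home_goals))
        errors.append(abs(away_xg - away_goals))
    if not errors:
        raise ValueError("no matches supplied")
    return sum(errors) / len(errors)


def xg_winner_agreement(matches: Iterable[Match]) -> float:
    """Share of decisive matches (draws excluded) in which the team
    with the higher expected goals also scored more goals."""
    decisive = agree = 0
    for home_xg, away_xg, home_goals, away_goals in matches:
        if home_goals == away_goals:
            continue  # draws are skipped in this simple check
        decisive += 1
        if (home_xg > away_xg) == (home_goals > away_goals) and home_xg != away_xg:
            agree += 1
    if decisive == 0:
        raise ValueError("no decisive matches supplied")
    return agree / decisive
```

A low error and a high agreement rate would suggest expected goals tracks results closely; neither number on its own settles whether it is a fair measure of performance.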