About Premier League Prediction

This is a machine learning model that forecasts the final standings of the English Premier League based on historical data. The data is sourced from football-data.org. This project was inspired by the now-defunct FiveThirtyEight club soccer predictions.

Model

Methodology

The model uses an Elo-based system to predict the final league standings. The Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess. Here, it measures the relative strength of each Premier League club, and the model uses those ratings to assess how likely each result is in a given match.

Elo Calculation

Before the season, each club is assigned an Elo rating based on its performance in the previous season. The Elo ratings of all clubs are then adjusted based on club value.

Newly promoted teams receive the maximum Elo of the relegated teams. These values are then adjusted based on club value.

Clubs retain 50% of their Elo rating from the previous season. This attempts to implicitly factor out the adjustment for club value in the previous season. 1500 is the average Elo rating in this model.
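The preseason steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes that "retain 50%" means each rating keeps half of its distance from the 1500 average, and that promoted clubs are seeded with the highest relegated rating before that step is applied. The function name `preseason_elo` and its arguments are hypothetical.

```python
MEAN_ELO = 1500.0  # average rating stated in the text
RETAIN = 0.5       # share of last season's rating that carries over


def preseason_elo(prev_elo, relegated, promoted):
    """Return starting ratings for the new season (before club value adjustment).

    prev_elo:  dict of club -> Elo at the end of the previous season
    relegated: clubs leaving the league
    promoted:  clubs joining the league
    """
    # Assumption: promoted clubs inherit the best rating among relegated clubs.
    promoted_start = max(prev_elo[club] for club in relegated)

    ratings = {c: r for c, r in prev_elo.items() if c not in relegated}
    for club in promoted:
        ratings[club] = promoted_start

    # Assumption: "retain 50%" pulls every rating halfway back to the mean.
    return {c: MEAN_ELO + RETAIN * (r - MEAN_ELO) for c, r in ratings.items()}


start = preseason_elo(
    {"A": 1700.0, "B": 1400.0, "C": 1300.0},
    relegated=["B", "C"],
    promoted=["D"],
)
```

Under these assumptions, a club that finished on 1700 starts the new season on 1600, and a promoted club seeded at 1400 starts on 1450.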

Club values are then used to adjust the Elo rating. The club value adjustment is normalized for each club based on the maximum and minimum club values. As the best and richest teams win more often, the normalized club values are exponentially adjusted. Club values are factored into the Elo rating as follows:

The adjustment factor is currently set to 300, meaning the best clubs get a 300-point bump to their Elo rating.
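One way the club value adjustment could look in code is shown below. The exact formula is not given in this text, so the exponential curve and the `STEEPNESS` constant are assumptions chosen only to illustrate the idea: min-max normalize the club values, bend them so the richest clubs gain disproportionately, then scale by the adjustment factor of 300.

```python
import math

ADJUSTMENT_FACTOR = 300.0  # from the text
STEEPNESS = 2.0            # hypothetical: controls how strongly rich clubs are favoured


def club_value_adjustment(values, factor=ADJUSTMENT_FACTOR, steepness=STEEPNESS):
    """Map club market values to Elo bonuses between 0 and `factor`."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All clubs are worth the same, so nobody gets a bonus.
        return {club: 0.0 for club in values}

    adjustments = {}
    for club, value in values.items():
        norm = (value - lo) / (hi - lo)  # min-max normalization to [0, 1]
        # Assumed exponential curve that maps 0 -> 0 and 1 -> 1.
        curved = math.expm1(steepness * norm) / math.expm1(steepness)
        adjustments[club] = factor * curved
    return adjustments


bonus = club_value_adjustment({"Rich": 1000.0, "Mid": 550.0, "Poor": 100.0})
```

With this curve the most valuable club receives the full 300 points, the least valuable receives none, and a club halfway between them receives well under half of the bonus.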

Elo Updates

As the season progresses, the Elo rating of each club is updated based on the outcome of each match. The model uses the following formulas to calculate Elo rating for each match:

Win/Lose
Draw
Decay

The Elo rating of each club has a half-life of 1/4 of the season. This ensures that the most recent matches have the most impact on a team's Elo rating.
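For reference, the textbook Elo update and one possible reading of the half-life are sketched below. The model's own formulas are not reproduced in this text, so this is only an illustration: the K-factor of 20 is hypothetical, home advantage is ignored, and the decay is assumed to shrink a rating's distance from 1500 by half every quarter of a 38-match season.

```python
MEAN_ELO = 1500.0
K_FACTOR = 20.0                     # hypothetical; not stated in the text
MATCHES_PER_SEASON = 38
HALF_LIFE = MATCHES_PER_SEASON / 4  # a quarter of the season, in matches


def expected_score(rating, opponent):
    """Standard Elo expectation: a value in (0, 1), 0.5 for equal ratings."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_elo(home, away, result, k=K_FACTOR):
    """Return new (home, away) ratings.

    result is the home side's score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    delta = k * (result - expected_score(home, away))
    return home + delta, away - delta


def decay(rating, matches_elapsed, half_life=HALF_LIFE):
    """Assumed decay: the gap to the mean halves every `half_life` matches."""
    weight = 0.5 ** (matches_elapsed / half_life)
    return MEAN_ELO + weight * (rating - MEAN_ELO)


win = update_elo(1500.0, 1500.0, 1.0)
draw = update_elo(1500.0, 1500.0, 0.5)
faded = decay(1700.0, HALF_LIFE)
```

In this sketch a win between two evenly rated clubs moves 10 points from the loser to the winner, a draw between them changes nothing, and a rating of 1700 fades to 1600 after one half-life if no further results come in.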

Model Architecture

The model is trained on the following data:

  • Elo
  • Table Position
  • Manager Games in Charge
  • Recent Form

These features are compared to the actual outcome of each match. A Random Forest classifier is trained on the past two seasons and is used to predict the outcome of each match.

Forecasting

The model generates a forecast before the start of each new match week, making predictions based on the results from previous match weeks. Before running the forecast, the model processes current form, manager tenure, and position in the league table for each Premier League club.

The forecast simulates the current season 10,000 times to determine a distribution of where each team is likely to finish in the final league table.
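The simulation step can be illustrated with a small Monte Carlo sketch. It is a simplification under stated assumptions: each remaining fixture comes with fixed home/draw/away probabilities (in the real model these would come from the classifier), ratings are not updated within a simulated season, and ties on points are broken at random because the model has no goal data. The function name `simulate_season` and its inputs are hypothetical.

```python
import random
from collections import Counter, defaultdict


def simulate_season(points, fixtures, n_sims=10_000, seed=0):
    """Estimate each club's finishing-position distribution.

    points:   dict of club -> points already won
    fixtures: list of (home, away, (p_home, p_draw, p_away)) for remaining matches
    """
    rng = random.Random(seed)
    finishes = defaultdict(Counter)

    for _ in range(n_sims):
        table = dict(points)
        for home, away, probs in fixtures:
            outcome = rng.choices(("H", "D", "A"), weights=probs)[0]
            if outcome == "H":
                table[home] += 3
            elif outcome == "A":
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1

        # Shuffle first so that clubs level on points are ordered at random
        # (the sort is stable and keeps the shuffled order among ties).
        clubs = list(table)
        rng.shuffle(clubs)
        clubs.sort(key=table.get, reverse=True)
        for position, club in enumerate(clubs, start=1):
            finishes[club][position] += 1

    return {
        club: {pos: count / n_sims for pos, count in sorted(counts.items())}
        for club, counts in finishes.items()
    }


forecast = simulate_season(
    {"A": 10, "B": 0},
    [("A", "B", (0.3, 0.4, 0.3))],
    n_sims=100,
)
```

Each club's result is a mapping from finishing position to the share of simulations in which it ended there. In the toy example above, club A cannot be caught, so it finishes first in every run.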

Other Ideas

It would be great to incorporate goals into the model. Right now the model has no concept of margin of victory. More seasons of data could also be used to train the model. At the beginning of the season, the model looks too much like a table of the clubs' market values.

Model Revisions

Version   Date         Changes
1.0       2024-08-16   Initial Elo-based model