About Premier League Prediction
This is a machine learning model that forecasts the final standings of the English Premier League based on historical data. The data is sourced from football-data.org. This project was inspired by the now-defunct FiveThirtyEight club soccer predictions.
Model
Methodology
The model uses an Elo-based system to predict the final league standings. The Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess. Here, it measures the relative strength of each Premier League club, and the model uses those ratings to assess how likely each result is in a given match.
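As a rough illustration of how a rating gap translates into a likelihood, here is the conventional Elo expected-score formula as used in chess. This page does not state which form the model uses, so the 400-point scale and the function name `expected_score` are assumptions. The formula also has no notion of a draw; it returns the expected share of the result.

```python
def expected_score(elo_home: float, elo_away: float) -> float:
    """Conventional Elo expected score for the home side, between 0 and 1.

    The 400-point scale is the chess convention and is assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((elo_away - elo_home) / 400.0))


# Two evenly rated clubs each expect half of the result.
print(expected_score(1500, 1500))

# A 200-point favourite expects roughly three quarters of the result.
print(round(expected_score(1700, 1500), 3))
```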
Elo Calculation
Before the season, each club is assigned an Elo rating based on their performance in the previous season. The Elo ratings of all clubs are adjusted based on club value.
Newly promoted teams receive the maximum Elo of the relegated teams. These values are then adjusted based on club value.
Clubs retain 50% of their Elo rating from the previous season. This attempts to implicitly factor out the adjustment for club value in the previous season. The average Elo rating in this model is 1500.
Club values are then used to adjust the Elo rating. The club value adjustment is normalized for each club based on the maximum and minimum club values. As the best and richest teams win more often, the normalized club values are exponentially adjusted. Club values are factored into the Elo rating as follows:
The adjustment factor is currently set to 300, meaning the best clubs get a 300 point bump to their Elo rating.
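The preseason steps described above can be sketched as follows. Only the 1500 average, the 50% retention, the promoted-club rule, and the 300-point factor come from this page. The sketch assumes that "retaining 50%" means moving each rating halfway back toward 1500, and it stands in a squared curve for the "exponentially adjusted" club value; both choices, along with every function name and club value below, are hypothetical.

```python
MEAN_ELO = 1500.0          # average rating stated on this page
RETENTION = 0.5            # share of last season's rating carried over
ADJUSTMENT_FACTOR = 300.0  # maximum club-value bump stated on this page
EXPONENT = 2.0             # assumed curve; the actual adjustment is not given


def carry_over(previous_elo: float) -> float:
    """Move last season's rating halfway back to the mean (assumed reading)."""
    return MEAN_ELO + RETENTION * (previous_elo - MEAN_ELO)


def promoted_elo(relegated_elos: list[float]) -> float:
    """Promoted clubs start from the highest rating among relegated clubs."""
    return max(relegated_elos)


def value_bump(value: float, min_value: float, max_value: float) -> float:
    """Min-max normalise club value, then apply a convex adjustment."""
    normalised = (value - min_value) / (max_value - min_value)
    return ADJUSTMENT_FACTOR * normalised ** EXPONENT


def preseason_elo(previous_elo: float, value: float,
                  min_value: float, max_value: float) -> float:
    """Carried-over rating plus the club-value bump."""
    return carry_over(previous_elo) + value_bump(value, min_value, max_value)


# Hypothetical club values in millions; the richest club gets the full bump.
print(preseason_elo(1700, 1000, 100, 1000))
# The least valuable club gets no bump at all.
print(preseason_elo(1400, 100, 100, 1000))
```

A convex curve keeps mid-table valuations close to zero adjustment while separating the few richest clubs, which matches the stated motivation that the best and richest teams win more often.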
Elo Updates
As the season progresses, the Elo rating of each club is updated based on the outcome of each match. The model uses the following formulas to calculate Elo rating for each match:
Win/Lose
Draw
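The model's own formulas are not reproduced here. As a point of reference, the win/lose and draw cases can both be written with the conventional Elo update, where a win scores 1, a draw 0.5, and a loss 0. The K-factor of 20 and the 400-point scale are assumptions, not values taken from this model.

```python
K_FACTOR = 20.0  # assumed; controls how far one match moves a rating


def expected_score(rating: float, opponent: float) -> float:
    """Conventional Elo expected score (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_elo(home: float, away: float, home_score: float) -> tuple[float, float]:
    """Return the new (home, away) ratings after one match.

    home_score is 1.0 for a home win, 0.5 for a draw, 0.0 for a home loss.
    """
    delta = K_FACTOR * (home_score - expected_score(home, away))
    # The update is zero-sum: whatever one club gains, the other loses.
    return home + delta, away - delta


# Evenly rated clubs: a home win moves 10 points, a draw moves nothing.
print(update_elo(1500.0, 1500.0, 1.0))
print(update_elo(1500.0, 1500.0, 0.5))

# A draw against a weaker side costs the favourite some rating.
print(update_elo(1700.0, 1500.0, 0.5))
```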
Decay
The Elo rating of each club has a half-life of 1/4 of the season. This ensures that the most recent matches have the most impact on a team's Elo rating.
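A half-life of a quarter of a 38-match season works out to 9.5 matches. The sketch below shows one way such a decay could be applied: each match's rating change is weighted by how long ago it was played. How the model actually applies the decay is not described here, so `decay_weight` and `decayed_rating` are hypothetical.

```python
SEASON_MATCHES = 38             # matches per club in a Premier League season
HALF_LIFE = SEASON_MATCHES / 4  # 9.5 matches


def decay_weight(matches_ago: float) -> float:
    """Weight of a result played `matches_ago` matches before the latest one."""
    return 0.5 ** (matches_ago / HALF_LIFE)


def decayed_rating(base_elo: float, changes: list[float]) -> float:
    """Apply per-match Elo changes (oldest first) with exponential decay."""
    n = len(changes)
    return base_elo + sum(
        change * decay_weight(n - 1 - i) for i, change in enumerate(changes)
    )


# The latest match counts in full; one from 9.5 matches ago counts half.
print(decay_weight(0), decay_weight(9.5), decay_weight(19))

# Two 10-point gains: the older one contributes slightly less than 10.
print(decayed_rating(1500.0, [10.0, 10.0]))
```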
Model Architecture
The model is trained on the following data:
- Elo
- Table Position
- Manager Games in Charge
- Recent Form
Each set of features is paired with the actual outcome of the match. A Random Forest Classifier is trained on the past two seasons and is used to predict the outcome of each match.
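A minimal sketch of this training step using scikit-learn's `RandomForestClassifier` is shown below. The feature layout, the tiny made-up training set, and the hyperparameters are all assumptions for illustration; the real model trains on two full seasons of matches.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature rows, one per match:
# [elo_diff, position_diff, home_manager_games, away_manager_games,
#  home_recent_form, away_recent_form]
X_train = [
    [120.0, -5, 80, 12, 10, 4],
    [-90.0, 7, 15, 200, 3, 12],
    [10.0, 1, 40, 45, 7, 7],
    [200.0, -12, 150, 5, 13, 2],
    [-150.0, 10, 8, 90, 2, 11],
    [-5.0, 0, 60, 60, 6, 8],
]
# Match outcomes: "H" home win, "D" draw, "A" away win (made-up labels).
y_train = ["H", "A", "D", "H", "A", "D"]

# Fixed random_state so repeated runs build the same forest.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Outcome probabilities for one upcoming (hypothetical) fixture.
upcoming = [[60.0, -3, 30, 20, 9, 5]]
probabilities = dict(zip(clf.classes_, clf.predict_proba(upcoming)[0]))
print(probabilities)
```

Using `predict_proba` rather than `predict` keeps the full home/draw/away distribution, which is what a season simulation needs as input.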
Forecasting
The model generates a forecast before the start of each new match week, using the results from all previous match weeks. Before running the forecast, the model processes current form, manager tenure, and position in the league table for each Premier League club.
The forecast simulates the current season 10,000 times to determine a distribution of where each team is likely to finish in the final league table.
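The simulation step can be sketched as a simple Monte Carlo loop: sample an outcome for every remaining fixture, total the points, rank the clubs, and repeat. The function name, the three placeholder clubs, and the probabilities below are hypothetical. Ties on points are broken at random, which is an assumption made because the model has no concept of goals.

```python
import random
from collections import Counter, defaultdict

# Points awarded to (home, away) for each outcome.
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def simulate_season(current_points, fixtures, match_probs, n_sims=10_000, seed=0):
    """Return {club: Counter({finishing_position: number_of_simulations})}."""
    rng = random.Random(seed)
    finishes = defaultdict(Counter)
    for _ in range(n_sims):
        points = dict(current_points)  # copy so each run starts from today
        for home, away in fixtures:
            outcome = rng.choices(["H", "D", "A"], weights=match_probs[(home, away)])[0]
            home_pts, away_pts = POINTS[outcome]
            points[home] += home_pts
            points[away] += away_pts
        # Rank by points; clubs level on points are ordered randomly (assumed).
        table = sorted(points, key=lambda c: (points[c], rng.random()), reverse=True)
        for position, club in enumerate(table, start=1):
            finishes[club][position] += 1
    return finishes


# Hypothetical three-club league with three fixtures left to play.
current_points = {"Club A": 3, "Club B": 1, "Club C": 1}
fixtures = [("Club A", "Club B"), ("Club B", "Club C"), ("Club C", "Club A")]
match_probs = {  # (home win, draw, away win), e.g. from the classifier
    ("Club A", "Club B"): (0.5, 0.3, 0.2),
    ("Club B", "Club C"): (0.4, 0.3, 0.3),
    ("Club C", "Club A"): (0.25, 0.25, 0.5),
}

result = simulate_season(current_points, fixtures, match_probs)
for club, counts in result.items():
    share_first = counts[1] / sum(counts.values())
    print(f"{club}: finishes first in {share_first:.1%} of simulations")
```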
Computing Infrastructure
This model is deployed on AWS using ECS, Fargate, and S3. Docker containers are used to manage the model and the data pipeline. The model runs on a schedule based on the Premier League fixture list, which is refreshed every time the model runs. EventBridge is used to trigger the model runs.
Other Ideas
The next big step is incorporating goals into the model. Right now the model has no concept of the margin of victory. More seasons of data could also be used to train the model. At the beginning of the season, the forecast looks too much like a table of the clubs' market values.
Model Revisions
Version | Date | Changes |
---|---|---|
1.0 | 2024-08-16 | Initial Elo-based model |