Elo ranking NSFL teams

IthicaHawk · (This post was last modified: 04-21-2020, 10:43 AM by IthicaHawk.)

Introduction

First off, what is Elo? Elo is a rating and ranking system designed for head-to-head matchups. It is designed to take opinion out of the rating process and only measure based on actual results.

It is designed to form a ranking system less influenced by human biases. It is, of course, not free from bias as we have to decide what values form the rating itself.

Elo cannot account for wild changes (for example, if the top players on one team all decide to retire between weeks) unless manual intervention is used.

A raw Elo is a zero-sum system where the winners receive a boost to their ranking equal to the drop received by the losers. However, the system can be modified so underdogs get more credit for a win against a stronger opponent and, likewise, a strong opponent gets a smaller gain for beating a low-ranked team.

Every team starts with an Elo rating of 1500. There's no mathematical reason it can't be 1000 or 0 or 540055. But 1500 is a nice number that scales well and it seems to be the norm.

The Basics

First, see the image in Fig 1 below. It shows the stats involved for one particular week of one particular season. In this case, S21 week 7. I will walk you through the various elements of the calculations referencing this image.
[Image: nXI1t8B.png]

Fig 1. S21 Week 7

Update Factor - This is the base value for wins and losses. Before any other factors are included, a team that wins gains 20 points and the other team loses 20 points. Simple. We can scale this factor for more important games if we want to, for example, the Ultimus.

Next, we have the matches themselves. Each team is given its own line, purely for simplicity's sake. I made the sheet so that I simply fill in each team's opponent, home/away, and how many points that team scored; everything else auto-fills from that.

E(h) - Estimated chance of winning for A team. Originally I had planned on doing 4 rows with Home and Away rather than 8 rows, but it worked out simpler this way. The names E(h) and E(a) are a throwback to E(home) and E(away) and I just haven't bothered changing them. This is a % representation of W(e), the win expectancy, which I'll explain later.

E(a) - Estimated chance of winning for B team. Opposite of E(h) above.

Elo - Current Elo rating for that team.
Opp Elo - Current Elo rating for the opponents.

Checkvalue - A simple sum and compare of each matchup to catch data entry errors. Helped me a couple of times.

W/L - If the game is a win, this is 1, if a loss, it's -1. Nothing complicated.

D(e) - Difference in Elo between the two teams. In this case, we can see that (in the first row of Fig 1) Hawks have an Elo 16.51 higher than Yeti going into the game.

D(h) - Difference in Elo after Home Field Advantage is taken into account. As it happens, based on some analysis I did, Home Field Advantage is strong and present in the NSFL: the home team wins about 60% of the time. Based on that, the Elo difference is modified by 65 depending on whether that team is home or away. In Fig 1 you can see that, due to Yeti being at home, the modified Elo difference now favours Yeti by 48.49.

65 was chosen because, all other things equal, two 1500 Elo teams separated only by the home/away modifier of 65 Elo will result in the home team winning roughly 60% of the time, in line with what we'd expect based on our Home Field Advantage findings.
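As a rough sketch of the home/away adjustment, here it is in Python. The function name and the sign convention (the team's own Elo minus the opponent's) are my assumptions, not necessarily how the sheet stores D(h), and the two ratings are made up purely to reproduce the 16.51 gap from Fig 1.

```python
HOME_FIELD_ADVANTAGE = 65.0  # Elo points, the value quoted in the post


def adjusted_lead(team_elo, opp_elo, is_home, hfa=HOME_FIELD_ADVANTAGE):
    """Team's Elo lead over its opponent after the home/away modifier.

    Assumed convention: positive means this team is favoured.
    """
    raw_lead = team_elo - opp_elo
    return raw_lead + hfa if is_home else raw_lead - hfa


# Hypothetical ratings chosen so that Hawks lead Yeti by 16.51
hawks, yeti = 1516.51, 1500.00
print(round(adjusted_lead(hawks, yeti, is_home=False), 2))  # -48.49
print(round(adjusted_lead(yeti, hawks, is_home=True), 2))   # 48.49
```

The two rows of a matchup are mirror images of each other, which is why each team can sit on its own line in the sheet.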

W(e) - The statistical win probability of that team. Calculated using the following approximation formula: W(e) = 1/(10^(D(h)/400)+1)
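A minimal sketch of the win expectancy, for anyone who wants to check the 60% figure. One assumption to flag loudly: I measure the difference as the team's own adjusted Elo minus the opponent's, so the exponent carries a minus sign. That is the same curve as the formula above if D(h) is read as opponent minus own. The function name is mine.

```python
def win_expectancy(lead):
    """W(e) for a team whose adjusted Elo lead over the opponent is `lead`.

    Assumed convention: lead = own adjusted Elo - opponent's, hence the minus sign.
    """
    return 1.0 / (10 ** (-lead / 400.0) + 1.0)


# Two otherwise equal teams, separated only by the 65-point home/away modifier
home = win_expectancy(65.0)
away = win_expectancy(-65.0)
print(f"{home:.3f} {away:.3f}")  # 0.592 0.408
```

The two values always sum to 1, so E(h) and E(a) for a matchup are complements of each other.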

The next two columns, M(p) and M(e), are factors for the third column, M. We are now into calculating the updated Elo based on the result of the game. M is a multiplier that is dependent on the points difference between the two teams. Specifically, M(p) is the natural log of the absolute difference between Points(w) and Points(l), plus 1: ln(|Points(w) - Points(l)| + 1). A tied game would be ln(1) = 0, so M is 0 and no Elo changes hands. Winning by a touchdown would be ln(7+1) = 2.08. This system gives diminishing returns for larger winning margins: ln(15) = 2.71, ln(22) = 3.09, ln(29) = 3.37 and so on.
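The M(p) values quoted above can be reproduced with a few lines of Python. This is just a sketch of the formula as described; the function name is my own.

```python
import math


def margin_multiplier(points_winner, points_loser):
    """M(p): natural log of the absolute points difference plus 1."""
    return math.log(abs(points_winner - points_loser) + 1)


# Margins of 0, 1, 2, 3 and 4 touchdowns (7 points each)
for margin in (0, 7, 14, 21, 28):
    print(margin, round(margin_multiplier(margin, 0), 2))
# 0 0.0
# 7 2.08
# 14 2.71
# 21 3.09
# 28 3.37
```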

M(e) - We start with a multiplier of 2.2 and adjust it based on the difference between the two teams' Elo values (before the home/away modifier is applied). This is calculated as 2.2/(2.2+(D(e)/1000)). The multiplier is 1 when the two ratings are equal and decreases as D(e) grows.
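Here is a quick sketch of M(e) with a few sample values. I am assuming D(e) is taken as the winner's Elo minus the loser's, which is what makes a favourite's win count for less and an underdog's win count for more; the function name is hypothetical.

```python
def rating_multiplier(d_e):
    """M(e) = 2.2 / (2.2 + D(e)/1000).

    Assumption: d_e is the winner's Elo minus the loser's, before home/away.
    """
    return 2.2 / (2.2 + d_e / 1000.0)


for d_e in (0, 100, 300, -100):
    print(d_e, round(rating_multiplier(d_e), 3))
# 0 1.0
# 100 0.957
# 300 0.88
# -100 1.048
```

Under that assumption, a 300-point favourite only gets 88% of the usual update for winning, while a 100-point underdog gets a small bonus.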

M is simply M(e)*M(p), the product of the previous two factors.

K(n) - M times our update factor. The winning team receives a boost to their Elo of a proportion of K(n), and the losing team drops by the same amount.

New Elo - The old Elo plus (1-W(e))*K(n) for a win, or minus W(e)*K(n) for a loss.
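Putting the columns together, one row of the sheet could be sketched roughly like this. It is my reading of the steps above and not the actual spreadsheet: the function name, the sign conventions (differences taken as own minus opponent, D(e) as winner minus loser) and the example score are all assumptions.

```python
import math

UPDATE_FACTOR = 20.0  # base points per game, from the post
HFA = 65.0            # home/away modifier, from the post


def update_elo(team_elo, opp_elo, is_home, team_pts, opp_pts,
               update_factor=UPDATE_FACTOR):
    """Return the team's new Elo after one game (sketch, assumed conventions)."""
    won = team_pts > opp_pts
    lead = team_elo - opp_elo                       # raw difference, own minus opponent
    adj_lead = lead + (HFA if is_home else -HFA)    # after home/away modifier
    w_e = 1.0 / (10 ** (-adj_lead / 400.0) + 1.0)   # win expectancy W(e)
    m_p = math.log(abs(team_pts - opp_pts) + 1)     # margin multiplier M(p)
    d_e = lead if won else -lead                    # winner's Elo minus loser's
    m_e = 2.2 / (2.2 + d_e / 1000.0)                # rating multiplier M(e)
    k_n = update_factor * m_p * m_e                 # K(n)
    change = (1.0 - w_e) * k_n if won else -w_e * k_n
    return team_elo + change


# Hypothetical game: the away side leads by 16.51 Elo but loses 17-24
away_old, home_old = 1516.51, 1500.00
away_new = update_elo(away_old, home_old, False, 17, 24)
home_new = update_elo(home_old, away_old, True, 24, 17)
print(round(home_new - home_old, 2), round(away_new - away_old, 2))
```

The two changes printed at the end are equal and opposite, so each game stays zero-sum even with the multipliers applied. A tie gives M(p) = 0 and leaves both ratings untouched.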

Diff - The Elo difference after that game compared to before it.

PPS - Predicted points spread. In the NFL, a difference of 25 Elo points is worth about 1 point in the spread. I've used the same value here. Thus, this value is simply the Elo difference (including HFA) divided by 25 and rounded to the nearest 0.5. It's a rough approximation.
APS - The actual point spread after the game is concluded.
D(ps) - The difference between the predicted points spread and actual result.
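The spread columns are easy to sketch as well. The function names are mine, and the sign convention (positive means the team is favoured, or won by that many) is an assumption.

```python
def predicted_spread(adjusted_lead, elo_per_point=25.0):
    """PPS: adjusted Elo lead divided by 25, rounded to the nearest 0.5."""
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return round(adjusted_lead / elo_per_point * 2) / 2


def spread_error(predicted, team_pts, opp_pts):
    """D(ps): predicted spread minus the actual spread (APS)."""
    return predicted - (team_pts - opp_pts)


pps = predicted_spread(48.49)   # the Yeti row from Fig 1
print(pps)                      # 2.0
# Hypothetical final score of 24-17
print(spread_error(pps, 24, 17))  # -5.0
```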

P(w) - Was that team predicted to win? Based on whether E(h) is higher than 50%.
W - Did that team win?
P(ac) - Was our prediction accurate?

In terms of prediction accuracy, I've fed in all matches from S18 to S21 and got an average accuracy of 68%, including 79% accuracy in the playoffs.
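For reference, the accuracy figure is just the share of rows where P(w) matched W. A small sketch, with made-up rows rather than real NSFL results:

```python
def prediction_accuracy(rows):
    """Fraction of rows where the prediction (W(e) > 50%) matched the result.

    rows: list of (win_expectancy, won) pairs, one per team per game.
    """
    hits = sum((w_e > 0.5) == won for w_e, won in rows)
    return hits / len(rows)


# Hypothetical rows, NOT real results: two games, one row per team
rows = [(0.59, True), (0.41, False), (0.62, False), (0.38, True)]
print(prediction_accuracy(rows))  # 0.5
```

Because each game appears once per team, a correct call is counted on both rows and a wrong call is missed on both, so the percentage is the same as counting per game.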

So now you know how it all works. Let's look at some pretty graphs.

Results

First off, I have included all seasons from S21 back to S18. The more data included, the more accurate this sheet will become. All teams started with an Elo of 1500 at the start of S18 and everything else is calculated from that.

After each season is completed, I regress all teams by 33% towards the 1500 value again. That is to say, teams above 1500 move down by 33% of the difference and teams below 1500 move up by 33% of the difference. This is to account for teams drafting new players and losing vets.
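The end-of-season regression can be sketched in one line. I am assuming "33%" means a factor of exactly 0.33 (not one third), and the sample ratings are hypothetical.

```python
def regress_to_mean(elo, mean=1500.0, fraction=0.33):
    """Move a rating 33% of the way back towards 1500 between seasons."""
    return elo - fraction * (elo - mean)


# Hypothetical end-of-season ratings
for elo in (1600.0, 1500.0, 1400.0):
    print(elo, round(regress_to_mean(elo), 2))
# 1600.0 1567.0
# 1500.0 1500.0
# 1400.0 1433.0
```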

[Image: kHPu6wj.png]

Fig 2. S18 to S21 Elo Rankings

The large bumps in Fig 2 are mostly the 33% normalisation after each season. The better/worse a team is at the end of the season, the more impactful this is. We can see quite clearly the relative domination of both Wraiths and Otters over the last four seasons along with the rise of Second Line in S21. You can equally see the fall of the Outlaws over the last few seasons from a high of 1631 during S18 down to 1422 going into S22.

[Image: kvK80Je.png]

Fig 3. S21 Elo Rankings

Fig 3 shows a close up of S21 specifically, from initial week 0 rankings through to the end of the Ultimus.

Going into S22 (i.e. after the end-of-season regression) we have the following initial Elo values and rankings:

1 - Second Line - 1,623.43
2 - Otters - 1,576.06
3 - Wraiths - 1,530.21
4 - Hawks - 1,529.01
5 - Yeti - 1,516.74
6 - Copperheads - 1,514.69
7 - Hahalua - 1,500.00
8 - Sailfish - 1,500.00
9 - Butchers - 1,429.89
10 - Outlaws - 1,422.12
11 - SaberCats - 1,415.98
12 - Liberty - 1,402.34

I will look to do some more analysis with these sheets in the coming weeks and hopefully be able to do weekly predictions with predicted points differential.

Disclaimer: I am no stats expert. There may be mistakes or errors, data entry or otherwise. I spotted a number of them during my sanity checking, and this seems to give an output that matches reality somewhat closely.

YoungTB · 04-21-2020, 01:12 PM

Liberty hardstuck bronze /:

IthicaHawk · (This post was last modified: 04-21-2020, 01:50 PM by IthicaHawk.)

(04-21-2020, 10:12 AM)YoungTB Wrote:Liberty hardstuck bronze /:

It has been pretty consistent. Stuck below 1500 for S18-20, then a bit of a tumble in S21. The streak of losses in S21 cost Liberty 119.98 Elo points.

goodfortunecoffee · 04-21-2020, 02:26 PM

Beautiful! I actually started working on an elo rating exactly like this but realized how different home field advantage was compared to real life and sorta lost interest. Glad someone followed thru!

37thchamber · (This post was last modified: 04-22-2020, 05:17 AM by 37thchamber.)

Elo in and of itself is an interesting rating solution, so thumbs up from me.

Though it arguably gets less effective for more complex scoring games like football. For chess it's perfect because there are only 3 possible outcomes, and while you can technically quantify the margin of victory, it is less meaningful than margin of victory in football (for example). So there's a case to be made for adjusting the calculation with margin of victory (as is done in the soccer equivalent)

Not sure whether I would adjust the k factor for playoffs or not (again, this is done in the soccer equivalent), and it's probably best to experiment with that. Same goes for adjusting k factor based on rating (as is done by the USCF, so lower rated players don't get the same rating boost by beating a peer as a grandmaster might).

You've already addressed the home advantage issue (any tweaks to k factor may require re-evaluation here, but at least there is a good base in place), so that's pretty kickass.

Would probably be interesting to track this all the way back to season one, with expansion teams being added with a base rating of 1500 as and when they were created.

Anyway this is all a longwinded way of saying that Elo here would be particularly useful for predicting outcomes (and by extension, running a casino of some sort). I might throw something together with all results in it at some point, and see what I can cook up. Been meaning to do something like this for a while. If you wanna look at that as a larger project, hit me up on the dev/wiki discord server and we can plot something out.

I feel like it would be a nice bit of added value for the league as a whole, personally.

IthicaHawk · 04-22-2020, 05:37 AM

(04-22-2020, 02:15 AM)37thchamber Wrote:Not sure whether I would adjust the k factor for playoffs or not (again, this is done in the soccer equivalent), and it's probably best to experiment with that.

I have the ability to do this. I have experimented with making the K factor 40 for the playoff games (double the regular season value) to give more weight to those games in particular.

(04-22-2020, 02:15 AM)37thchamber Wrote:Same goes for adjusting k factor based on rating (as is done by the USCF, so lower rated players don't get the same rating boost by beating a peer as a grandmaster might).

This is interesting, but I think it would work better if there were a larger number of teams and more games. As it happens, 13 weeks of games and 10 (now 12) teams isn't a huge number, so I think it would disproportionately affect lower ranked teams. Additionally, having it _not_ scale as you suggest has the effect of slowing gain/loss of Elo for top ranked teams and bottom ranked teams. This prevents it getting out of control. In some versions of the sheet, as I worked on the factors, there was an Elo difference of over 1000 between top and bottom.

(04-22-2020, 02:15 AM)37thchamber Wrote:Would probably be interesting to track this all the way back to season one, with expansion teams being added with a base rating of 1500 as and when they were created.

There's nothing preventing this except the manual data entry taking time. For S22 both new teams have joined with 1500.

(04-22-2020, 02:15 AM)37thchamber Wrote:If you wanna look at that as a larger project, hit me up on the dev/wiki discord server and we can plot something out.

I don't think I'm on that one, so just drop me a PM if you have some thoughts.