HEADLINE for DCU fans. D.C. United will make the playoffs.
When can a playoff prediction be reliably determined?
I wanted to know if 6 games really was a good indicator of which teams would make the playoffs, and if not 6 games, then how many? I also wanted to know how stable that number was. For example, could all the playoff teams be in playoff position after 6 games but none of them after 8 games?
To answer these questions I gathered the data that I could and applied Signal Detection Theory.
These analyses use the 2011, 2012, and 2013 data. I would have loved to use more except for two things: 1) it’s time consuming and tedious, 2) I’m not sure how useful every year’s data would be. You see, from 1996 to 2006, 67% to 80% of the teams (8 of 12 or 8 of 10) made the playoffs. Since then the percentage of qualifying teams has steadily decreased, increasing the difficulty of making the playoffs. In the "modern" MLS 53% of the teams make the playoffs. That includes the teams in the single elimination "play-in" game. Adding in data from the early years to determine which teams would make the playoffs probably would not shed light on the current state of the league. That isn’t to say I wouldn’t like the data back to 2007, but I just don’t have that kind of time to make results tables for every year.
Point ties. When teams were tied on points I used the corresponding year’s tie breakers in order to rank them.
Signal Detection Theory is about finding or detecting information. It is usually applied to cognitive decision making, but I’m applying it here to something important. You can correctly detect something (HIT) or incorrectly detect something (FALSE ALARM) or correctly reject something (CORRECT REJECTION) or incorrectly miss something (MISS). Using those four categories, the statistics d’ (dee prime), beta, and A’ can all be calculated and used to determine the effectiveness of identifying playoff teams after each round of games.
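As a rough sketch of that bookkeeping, assuming we know which teams sit in a playoff position after a given round and which teams qualified at season's end, the four counts could be tallied like this in Python. The function name and the placeholder team labels (T1 to T18) are hypothetical and only there for illustration.

```python
def sdt_counts(in_position, qualified, all_teams):
    """Tally the four Signal Detection Theory outcomes for one round of games."""
    in_position = set(in_position)
    qualified = set(qualified)
    all_teams = set(all_teams)
    return {
        # in a playoff position now and made the playoffs
        "hit": len(in_position & qualified),
        # made the playoffs but not in a playoff position now
        "miss": len(qualified - in_position),
        # in a playoff position now but did not make the playoffs
        "false_alarm": len(in_position - qualified),
        # not in a playoff position now and did not make the playoffs
        "correct_rejection": len(all_teams - in_position - qualified),
    }


# Hypothetical 18-team league with 10 playoff spots (placeholder labels).
teams = [f"T{i}" for i in range(1, 19)]
qualified = teams[:10]                   # T1..T10 end up qualifying
in_position = teams[:7] + teams[10:13]   # 7 of them right, 3 wrong
counts = sdt_counts(in_position, qualified, teams)
print(counts)
```

Because the number of playoff positions is fixed, every MISS is matched by one FALSE ALARM in this setup.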
D prime is about sensitivity or being able to detect the playoff teams. It is the primary statistic of interest. The greater the d’ value the greater the sensitivity, that is to say the greater the separation between the signal (correctly identifying which teams qualify for the playoffs and which do not) and the noise (not correctly identifying the team’s positions).
For those wondering, I did calculate the betas, an assessment of bias, and beta stays near 1 after every round, indicating little to no bias. And looking at the non-parametric version of d’, A’, it too indicates good discrimination.
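For the curious, here is a minimal sketch of how the three statistics are commonly computed from the four counts. It uses the textbook formulas (d’ as the difference of z-scores, beta as a likelihood ratio, and the Pollack and Norman form of A’) and is an assumption about the calculation, not a copy of it. The function name is made up, and the example counts are just an illustration for a 19-team league.

```python
import math
from statistics import NormalDist


def sdt_stats(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta, a_prime) from the four outcome counts."""
    h = hits / (hits + misses)                              # hit rate
    f = false_alarms / (false_alarms + correct_rejections)  # false alarm rate
    if not (0 < h < 1 and 0 < f < 1):
        # z-scores are infinite at 0 or 1; a correction would be needed first
        raise ValueError("rates of exactly 0 or 1 need a correction first")
    z = NormalDist().inv_cdf
    zh, zf = z(h), z(f)
    d_prime = zh - zf                          # sensitivity
    beta = math.exp((zf ** 2 - zh ** 2) / 2)   # bias; 1 means no bias
    if h >= f:
        a_prime = 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    else:
        a_prime = 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
    return d_prime, beta, a_prime


# Illustrative counts: 8 hits, 2 misses, 2 false alarms, 7 correct rejections.
d_prime, beta, a_prime = sdt_stats(8, 2, 2, 7)
print(round(d_prime, 2), round(beta, 2), round(a_prime, 2))
```

A perfect round (all playoff teams in position) gives a hit rate of 1, which this sketch refuses to handle; some correction would be needed there and which one to use is a judgment call.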
In 2011 after 5 games the d’ never dropped below 1.0. What does that mean? After 5 games no fewer than 7 teams were correctly identified as making the playoffs. 7 of the 10 playoff teams. That’s 7 HIT. 5 CORRECT REJECTION. 3 MISS. 3 FALSE ALARM. Between games 6 and 11, 8 teams were correctly identified as playoff teams. Meaning after every team played 6 games 8 teams were in the top five of their conference (a playoff position). And after the 7th game. And after the 8th game. And after the 9th game. And after the 10th game. And after the 11th game. After the 12th round of games no fewer than 9 of the 10 playoff teams were in playoff positions for the rest of the season.
In 2012 it only took until the 4th round of games to hit the point of no return. Again the d’ of 1.0 equals 7 of 10 teams correctly identified. Then after every team played 5 games no fewer than 8 teams would be in their playoff positions for the remainder of the season. In fact, for much of the season it was 9 or 10 teams (d’ of 2.5).
2013 has the most instability, if you can call it that. As you can see, after 5 games the d’ is below that of the previous years, but by game 10 it’s back up to 1.5 (8 hits). After a brief increase it drops to a d’ of 1.0 between games 15 and 20. That 1.0 is still 7 of 10 teams correctly in playoff positions. And then for the final 14 games of the season it’s back to at least 8 of 10 teams. Is 2013 unusual, or if I had more data would I look at 2011 and 2012 as the unusual years?
Try as we might we cannot forget 2013 also had an unusually poor team. Yes, unusually. No team in 2011 or 2012 was giving away points like D.C. United was in 2013. That historically bad team possibly impacted the variability of when teams picked up their free points. Also, Dallas had their spectacular fall from grace, starting the year so well only to collapse at midseason. But those are speculative guesses.
Looking at the combined data from 2011-2013 in the above plot, an increasing series of points can be seen. Translating the d’ values into the actual number of teams correctly predicted to make the playoffs, anything at 1 equals 7 teams and anything at 1.5 equals 8 teams.
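To see where that translation comes from, here is a small sketch that converts a number of correctly placed playoff teams into a d’ value. It assumes 10 playoff spots, that every miss is matched by exactly one false alarm (the number of playoff positions is fixed), and the standard z-score formula for d’. The function name and the default league size of 19 are my assumptions; the 2011 counts quoted earlier add up to 18 teams, so the league size is a parameter.

```python
from statistics import NormalDist


def d_prime_for_hits(hits, playoff_spots=10, league_size=19):
    """d' when `hits` of the `playoff_spots` playoff teams are in position."""
    misses = playoff_spots - hits
    false_alarms = misses                        # one wrong team per missing team
    non_playoff = league_size - playoff_spots    # teams that did not qualify
    z = NormalDist().inv_cdf
    return z(hits / playoff_spots) - z(false_alarms / non_playoff)


# 10 hits is left out: a hit rate of 1 needs a correction before taking z.
for hits in (7, 8, 9):
    print(hits, round(d_prime_for_hits(hits), 2))
```

Under these assumptions 7 hits should land close to 1, 8 hits close to 1.5, and 9 hits close to 2.5, which lines up with the values quoted in this post.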
Now of course the predictive ability increases as the number of games increases. But, to me, what matters is how early and how consistently the playoff teams can be identified. After 6 games, and certainly by 10 games, eight teams are almost locked in. Again, there is very little dropping in and out from multiple teams. Teams generally find their level within the first third of the season and stay there. But you don’t have to take my word for it. Take a look at the heat map I link to below.
The above link is to a Google document with 3 sheets. Each sheet has the color-coded assessment of each MLS team after each round of games.
Red means the team was in a playoff position and made the playoffs. Blue means the team was not in a playoff position but did make the playoffs. Purple means the team was not in a playoff position and did not make the playoffs. Orange means the team was in a playoff position but did not make the playoffs.
In short, the more red and purple the better identification.
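For anyone wanting to rebuild the sheets, the color key boils down to a lookup on two yes/no questions. This is only a sketch of that key in Python; the names are hypothetical.

```python
# Key: (in a playoff position after this round?, made the playoffs?)
COLORS = {
    (True, True): "red",       # HIT
    (False, True): "blue",     # MISS
    (False, False): "purple",  # CORRECT REJECTION
    (True, False): "orange",   # FALSE ALARM
}


def cell_color(in_playoff_position, made_playoffs):
    """Color for one team's cell after one round of games."""
    return COLORS[(bool(in_playoff_position), bool(made_playoffs))]


print(cell_color(True, True))    # red
print(cell_color(False, False))  # purple
```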
Teams that qualify for the playoffs are settled fairly early in the season. Now that is in terms of playoff qualification and not final conference (or single table) ranking. The early qualified teams do not fluctuate in and out of qualification. This is to say that a team that is in a playoff position in week 6 is likely to stay there for the rest of the season. The weekly results that we see are often quite misleading. That may be a reason why so many (fans and pundits alike - the good ones at least) examine Points Per Game and not just points. This is particularly highlighted by the fact that I can only just now publish this analysis. Only now have all MLS teams played at least 10 games. A couple of teams have played 15 games. That spread makes the weekly results and rankings quite deceptive and full of fluctuation.
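Here is a quick sketch of why uneven games played makes raw points deceptive, and how Points Per Game or a "first N games" cutoff evens things out. The two teams and their results strings are invented for illustration, and the helper names are hypothetical; the scoring is the usual 3 points for a win and 1 for a draw.

```python
POINTS = {"W": 3, "D": 1, "L": 0}


def total_points(results):
    """Points from a results string such as 'WDLW'."""
    return sum(POINTS[r] for r in results)


def points_after(results, n_games):
    """Points from only the first n_games results."""
    return total_points(results[:n_games])


def points_per_game(results):
    return total_points(results) / len(results)


# Invented example: Team A has played 15 games, Team B only 10.
results = {
    "Team A": "WWDLWWDLWWLLWDW",
    "Team B": "WDWWLWDWWD",
}

for team, res in results.items():
    print(team, total_points(res), points_after(res, 10),
          round(points_per_game(res), 2))
```

In this made-up case Team A leads on raw points only because of the extra games; after 10 games each, and on Points Per Game, Team B is ahead.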
Toronto’s win over Columbus was their 10th game of the season. Toronto was the last team in the league to play 10 games. And so I ranked the teams based on their points total after each of them had played 10 games and predict that this season’s playoff teams should be:
I started the post by saying D.C. United will make the playoffs when it should actually say D.C. United should make the playoffs based on their season’s results to date. Now, so as not to use statistics blindly, I can see D.C. United may not keep up their results-getting form because it's entirely related to Espindola. That would indicate New York Red Bulls might take their spot. New York is tied on points with Houston but loses out on tie breakers. And since Dallas has already lost their main guy, if he doesn't return they may also drop out, making way for the Colorado Rapids (who are tied on points with Los Angeles).
MLS Stats after 10 games.
My regular posts look at numbers (TSR, LSR, turnover percentage, passing totals, passing completion) that are predictive of league positions in Europe. Not so much in MLS.
Using a multivariate linear regression for points or MLS rank (single table), the only predictor of merit was average goals per game. And it’s a hell of a predictor. Even in a simple bivariate regression it accounts for between 51 and 57% of the total dependent variable variance. For those not versed in regression: the average goals scored per game is a damn good predictor of how many points a team will have. Pretty simple. Teams like Seattle and RSL are averaging over 2 goals per game. D.C. United’s 1.4 goals per game is below the league’s average of 1.6.
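For readers who want to see what "accounts for X% of the variance" means, here is a bare-bones sketch of a bivariate least-squares regression and its R-squared in plain Python. The function name is hypothetical and the numbers fed into it are invented placeholders, not MLS data, so the output says nothing about the league.

```python
def simple_regression(x, y):
    """Least-squares fit of y on x. Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # share of variance in y explained by x
    return slope, intercept, r_squared


# Invented placeholder values, NOT real MLS numbers.
goals_per_game = [1.0, 1.2, 1.4, 1.6, 2.0]
points = [10, 13, 12, 16, 20]

slope, intercept, r_squared = simple_regression(goals_per_game, points)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

An R-squared between 0.51 and 0.57, as reported above, would mean that a bit more than half of the team-to-team spread in points lines up with goals per game.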
On just about every other statistic D.C. United is right at the league mean. Possession, shots per game, number of passes, completion of those passes, etc. As is every other team in the league. Except for Sporting Kansas City, but that’s not across the board. For example, SKC averages 60% possession with the MLS mean being 50%. They are far and away the league leader in possession, but in terms of shots they average 14.5 whereas the league averages 13. Los Angeles is the league leader in shots per game at 16.9. Back to SKC. They have the best Total Shot Ratio (total shots / (total shots + shots allowed)) at .64 but only average 1.5 goals per game. Seattle has the best goals per game average at 2.2 goals. But those teams are outliers, as is Toronto on the other end of the continuum. Most teams, nearly every team, are clustered right at the means. There is very little differentiation between the teams. Parity.
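The Total Shot Ratio mentioned above is simple enough to write down as a one-line function. This is just a sketch; the function name and the shot counts in the example are hypothetical and chosen only so the result comes out at .64.

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots in a team's games that the team itself took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Hypothetical counts: 64 shots taken, 36 allowed.
print(total_shot_ratio(64, 36))
```

A value of .50 means a team takes exactly as many shots as it allows, so anything well above that points to a team that controls the shot count.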
This will probably be my final stats post. Mostly, because it's too time consuming and tedious gathering the data. If I had access to the Opta data it might be different but since I'm not a blogger I doubt they'll give me the data. Anyway, season over and D.C. United made the playoffs. Now about that stadium.