Since there's no Big Ten 2012 preview this week, I thought I'd write about something else that had been on my mind. The gap in previews gives me a perfect opportunity for this, so thanks, editors! And here we go...
If you're anything like me (and there's only one of me, so stop trying*), you would want more than a supposition to give credence to a decision. I've previously given my rationale for my rating system and its various methodologies, and I've alluded to why I made those decisions. For the most part, I have made methodology decisions based on gut. However, I've made efforts to make sure that those are the right decisions, and plenty of times I have discarded bright ideas because they didn't hold up with the data. So this is my attempt to illuminate to y'all why I made those decisions. (Whether or not you read this is immaterial. It's really just to make sure that it's open to those who want to view it.) In no particular order...
*my twin is nothing like me
Adj Marg vs. Adj Eff
I've previously explained my Adj Off and Adj Def ratings. [LINKY HERE] If you're not familiar, view that link. I came up with two rating systems from those, each with its own benefits: Adj Eff (which is better for defense) and Adj Marg (which is better for offense). I said that I prefer Adj Marg, giving anecdotal evidence with Auburn's 2010 season. But anecdotes matter not. Fuck them. Here's some better evidence:
I have game scores (and thus: rankings!) from 1998-2011. (2002 and earlier don't have overtime corrections applied, but that really doesn't impact the evaluations that much.) Regardless, I'll share the data I have from 2005-2011 (the standard interval for numerical rating systems nowadays, if other internet sites are to be believed). The table shows how well each set of rankings does at "predicting" the winner of a game played in 2011 (if Team A is ranked above Team B, Team A should have beaten Team B when they played). Close Games are defined as games that ended with a MOV of < 7 pts.
The better "predictor" (really evaluator, since those are retro-dictive percentages) is highlighted in green. In years where the two ratings (Adj Eff and Adj Marg) are tied, the percentages are italicized.
Regardless of all that, pretty consistently the best rating for "predicting" game results has been Adj Marg. (and this extends back to 1998 as well, though I haven't shared that.) It's not a large difference, true. But the trend can't be denied. Even though Adj Eff may be better in 2010, I think that Adj Marg is the best from year to year and I'm sticking to it. FINAL ANSWER.
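For anyone who wants to run the same kind of check on their own rankings, here's a minimal sketch of the retro-dictive percentage described above. It is not my actual spreadsheet: the function name, the data layout, and the teams and scores are all made up for illustration.

```python
def retrodictive_accuracy(games, rank, close_margin=None):
    """Share of games in which the better-ranked team actually won.

    games: iterable of (team_a, score_a, team_b, score_b)
    rank:  dict mapping team -> rank (1 is best)
    close_margin: if set, only count games decided by fewer than this many points
    """
    correct = total = 0
    for team_a, score_a, team_b, score_b in games:
        margin = abs(score_a - score_b)
        if margin == 0:
            continue  # no winner to "predict"
        if close_margin is not None and margin >= close_margin:
            continue  # not a close game
        winner, loser = (team_a, team_b) if score_a > score_b else (team_b, team_a)
        total += 1
        if rank[winner] < rank[loser]:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical teams, ranks, and scores, purely for illustration.
rank = {"A": 1, "B": 2, "C": 3, "D": 4}
games = [
    ("A", 31, "B", 28),  # close game, better-ranked team won
    ("C", 24, "B", 21),  # close game, ranking violation
    ("A", 45, "D", 10),  # blowout, better-ranked team won
    ("B", 35, "D", 14),  # blowout, better-ranked team won
]
all_pct = retrodictive_accuracy(games, rank)
close_pct = retrodictive_accuracy(games, rank, close_margin=7)
print(f"all games: {all_pct:.1%}, close games: {close_pct:.1%}")
print(f"ranking violations: {1 - all_pct:.1%}")
```

The last line is just the flip side of the same number: whatever share of games the rankings get "right" after the fact, the rest are ranking violations.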
No rating system or poll can avoid ranking violations, much as we would like one to. A ranking violation means: Team A ranks lower than Team B, but Team A beat Team B. The number of people bitching about Clemson being ranked behind Virginia Tech last year means that many people have yet to grasp this fact. ¿Que es la fvck, right? The transitive property doesn't carry perfectly (Big Ten example: Michigan beat Minnesota beat Iowa beat Michigan) and upsets happen, so learn to live with it.
I've already noted above how my system performs at this (~80%, so ~20% ranking violation*). This is actually pretty decent (most rating systems are within the range 15%-20%). But don't take my word for it! Kenneth Massey has a beautiful webpage where he compares pretty much all semi-regarded College Football Rankings against each other [LINKY HERE]. (NOTE: large webpage with lots of numbers.)
For the systems I know well (FEI and S&P+) and in the year 2011, I stack up pretty well.
Adj Marg: 19.5%
WOO, I'M BETTER THAN BILL CONNELLY. SUCK IT BILL.
Okay, now that I got that out of my system: not quite (Bill C. seems like a wonderful person from what I've read). That is encouraging, and the single biggest tool I use for evaluating tweaks to my system is the retro-dictive percentages. Except... okay, we have to take a trip down Definition Lane. Rating systems are pretty much divided into two methods:
Evaluative: look at what a team has done, and decide what rank best fits that data (what a team "deserves"). The BCS Computer rankings are a pretty good example of this.
Predictive: look at what a team has done, and decide what rank best fits future data (how good a team actually is). KenPom [LINKY] (albeit not football) is a good example of this.
People generally look more favorably on Evaluative Rankings, but honestly: they're kind of stupid. Most legitimate systems go the Predictive Rankings route (in my opinion). It helps prevent things like Hawaii being taken seriously (64th in 2007). Unfortunately, that also leads to things like a 5-6 Arkansas being ranked 14th in 2004 (SCREW YOU HOUSTON NUTT).
More to the point, Evaluative Rankings (kind of necessarily) produce fewer Ranking Violations. Predictive Rankings don't do that so well. So saying "YAY I BEAT BILL CONNELLY" doesn't really mean anything. So did Richard Billingsley, and he's the Devil Incarnate.**
A better comparison would be how well the Adj Marg rankings do at predicting games before the fact. That takes more time and/or programming skill than I am prepared to do as of now, though I do plan on looking at how my numbers perform next season. However, I do have one tool for use.
*NOTE: So, I actually double-count every game between FBS teams. Sorry, it's just how it is. I actually don't think it matters that much at all, since the vast majority of games are between FBS teams. Also, if anything (since I'm likely to predict a win against FCS teams), this lowers my retro-dictive prediction percentage, if at all. Again, I don't think it really matters. Just a note.
**Maybe not. Maybe.
2011-12 Bowl Predictions
I do have the bowl match-ups from last year, and I can easily do stuff to see what my Adj Marg rankings would have predicted before bowl season for each game.
Okay, done! There were 35 bowl games in 2011-12, and the Adj Marg Rankings would have predicted the correct winner in 26 out of 35 match-ups. Hmm... not that great on the surface (though, I'll note that would be the 96th percentile in ESPN).
However, taking a Confidence Points route (those of you that compete in ESPN Bowl Mania know what that is) based on predicted score differential using the Adj Off, Def ratings (proportional to the Adj Marg differential) would have given 491 pts out of 630 possible. That would've given me 1st place in my pool (and 95th percentile in ESPN).
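If you've never played a confidence pool, here's a rough sketch of how that scoring works, assuming the usual scheme where each of the N games gets a unique value from 1 to N and you only collect the points when your pick wins. Ordering the games by the size of the predicted margin is the approach described above; the function name and sample numbers are hypothetical.

```python
def confidence_score(predicted_margins, actual_margins):
    """Score picks under a confidence-points scheme.

    Both lists hold margins from the same team's perspective for each game
    (positive = that team wins). The game with the smallest predicted
    |margin| gets 1 point and the largest gets N points; points are earned
    only when the predicted winner actually won.
    """
    n = len(predicted_margins)
    # Least confident pick first, most confident pick last.
    order = sorted(range(n), key=lambda i: abs(predicted_margins[i]))
    earned = 0
    for points, i in enumerate(order, start=1):
        if predicted_margins[i] * actual_margins[i] > 0:
            earned += points
    possible = n * (n + 1) // 2  # 1 + 2 + ... + N
    return earned, possible


# Made-up predicted and actual margins for four games.
predicted = [3.0, -10.5, 21.0, -1.5]
actual = [7, 3, 14, -10]
earned, possible = confidence_score(predicted, actual)
print(f"{earned} pts out of {possible} possible")
```

With 35 bowl games the maximum is 1 + 2 + ... + 35 = 630, which is where the "630 possible" comes from.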
I guess I think that's pretty decent overall, especially considering it was all made using 0's and 1's at the core. Not that there weren't errors. West Virginia and Clemson (though I predicted the correct winner) was pretty bad, based on predicted score differential. What do we say kids? That's right: FUCK CLEMSON. Here's a chart of my predictions vs. the actual scores (that point way at the top would be the Orange Bowl):
A definite trend, but kind of rough. Still, I think it's decent, all things considered. And it's not all bad! My numbers predicted the Sun Bowl between Georgia Tech and Utah to be a final score of 22.8-24.8 (in favor of Utah). Utah won 27-24 in overtime (tied at 24 going into overtime). I'm going to pat myself on the back for that one.
NOTE: Just as with the "predictor" percentages shown above, all the bowl games are double counted. That's why I'm not reporting a correlation for that plot: there's not enough data.
Expected Scoring Differential vs. Actual Scoring Differential (retro-dictive)
As I noted, an ideal evaluation tool for my rankings would be predictions prior to each week's games, so this isn't perfect. Still, I think it's worth a look. Obviously, I have the data for each game played during the season, and just as I can compare the expected winner and loser of each match-up, I can compare the expected scoring differential to the actual scoring differential (similar to the plot above, but again: these are retro-dictive projections, not predictive). Here's the plot for 2011 (all games, FBS-FBS games double-counted):
That's pretty good, right? Obviously, there are some outliers (that data point just above the "40" and just to the left of the y-axis is BYU's 10-54 defeat by Utah; that data point at (-17, 26) is West Virginia's inexplicable 23-49 loss to Syracuse. Screw the Big East), but the expectations line up fairly well with the reality (the correlation is 0.786).
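For the curious, the correlation quoted above is the ordinary Pearson kind, and a sketch of the calculation (with the double-counting baked in) looks something like this. The helper names are mine, and apart from the (-17, 26) point mentioned above, the sample pairs are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def double_counted(pairs):
    """Add each game again from the other team's perspective (both signs flipped)."""
    return pairs + [(-e, -a) for e, a in pairs]


# (expected differential, actual differential) from one team's point of view.
# Only (-17, 26) comes from the plot discussed above; the rest are made up.
pairs = [(14.0, 21), (3.5, -7), (-17.0, 26), (10.0, 3), (24.5, 31)]
expected, actual = zip(*double_counted(pairs))
r = pearson(expected, actual)
print(f"correlation: {r:.3f}")
```

Double-counting makes the cloud of points symmetric about the origin, which is why the same game can show up as an outlier on either side of the plot.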
2012 Season Projections
The formula I've used for my 2012 projections (Adj Off, Def) has been:
67% 2011 Adj Off, Def
33% 2007-10 Adj Off, Def
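In code, that blend is about as simple as it sounds. This sketch assumes the 2007-10 piece is a plain (unweighted) average of those four seasons; the function name and the ratings plugged in are hypothetical.

```python
def project_rating(rating_2011, ratings_2007_10, recent_weight=0.67):
    """Blend last season's rating with the average of the four seasons before it.

    Assumption: the 2007-10 term is a simple mean of the four seasons.
    """
    prior_avg = sum(ratings_2007_10) / len(ratings_2007_10)
    return recent_weight * rating_2011 + (1 - recent_weight) * prior_avg


# Made-up Adj Off values for one team: 2011, then 2007 through 2010.
adj_off_2012 = project_rating(30.0, [22.0, 24.0, 26.0, 24.0])
print(f"projected 2012 Adj Off: {adj_off_2012:.2f}")
```

The same function would be applied separately to Adj Off and Adj Def for each team.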
You may wonder why I put so much emphasis on last year's data. Wouldn't it be better to switch those percentages? What about a weighted average over the last five years (such that more recent data is more heavily weighted in the projections)? And why use 2007 data for 2012 projections? Will I ever find love? All good questions, most of which I've asked myself. And the answer is: that's just what works best, sorry. Also: yes, because you're a wonderful person.
Since I have game scores (and thus rankings) going back several years, I've tested the correlations for 2011 projections against the 2011 actual results. That's the only year I've run, but I think it works well enough as a test case. I've tried weighted averages (such that more recent results are more important), I've tried more or fewer years back (6 years in the past vs. 5, 4, or 3 years in the past), I've tried giving different weights to the most recent year (since 67% seemed too high to me). Of all of those possible permutations, the best correlations (~0.75) come with using the above formula. I can't really justify changing it then if everything else performs worse, even though I want to.
And I actually run into this problem a lot. I'll have an idea on how to improve the calculations, so I'll save a test sheet and see how an updated formula performs. Sometimes it does improve! Most of the time, even ideas that I think are good and should improve my numbers don't pan out. (Though I think a FUCK CLEMSON adjustment should help, I'll look into that one. Yes, I had to mention it twice.) So it's not just an evaluation process of "eh, looks good.*" I was more than a little disappointed that eliminating overtime scoring didn't have a bigger effect on the correlations than it did. C'est la vie.
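The testing described above boils down to trying a bunch of weights and keeping whichever one correlates best with what actually happened. Here's a stripped-down sketch that only varies the weight on the most recent year; the function names and every number in it are invented, and it is a stand-in for my test sheets, not a copy of them.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_recent_weight(last_year, prior_avg, actual, weights):
    """Return (weight, correlation) for the blend that best tracks `actual`."""
    results = []
    for w in weights:
        projected = [w * ly + (1 - w) * pa for ly, pa in zip(last_year, prior_avg)]
        results.append((w, pearson(projected, actual)))
    return max(results, key=lambda pair: pair[1])


# Hypothetical ratings for five teams (one list entry per team).
last_year = [28.0, 21.0, 35.0, 17.0, 24.0]   # most recent season
prior_avg = [24.0, 26.0, 30.0, 20.0, 19.0]   # average of the earlier seasons
actual = [27.0, 24.0, 31.0, 16.0, 25.0]      # what actually happened
weights = [0.33, 0.50, 0.67, 0.80, 1.00]
best_w, best_r = best_recent_weight(last_year, prior_avg, actual, weights)
print(f"best weight on last year: {best_w:.2f} (correlation {best_r:.3f})")
```

Testing a different number of years back, or a weighted average within the older seasons, would just mean swapping in a different `prior_avg` and rerunning the same comparison.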
*it's mostly a process of "eh, looks good"**
**this is also my process after not combing my hair in the morning
So... yeah. Just wanted to share that with y'all in case any of you were wondering why I run the numbers how I do. Again, if any of you have any comments, criticism, or suggestions, please let me know! Now let's get back to hating [EVIL RIVAL SCHOOL]. I suggest Notre Dame for commentator unity.