Thursday, December 26, 2013

Bring Back the BCS Computers! (TFS, part 10)

I am reading Thinking, Fast and Slow, by Daniel Kahneman. In this series I will summarize key parts of the book and supply some comments and reflections on the material.

Part III: Overconfidence
Chapters 21-24

Summary:

Intuition often leads us astray. Simple formulas and simple statistics often out-predict experts in noisy environments. People are often hostile to algorithms.

Example. Trained counselors were asked to predict the grades of college freshmen at the end of the school year. The counselors got to interview the students for 45 minutes and had access to high school grades, several aptitude test scores, and a four-page personal statement from each student. A simple regression based on only high school grades and one aptitude test did a better job of predicting the actual outcomes.

Planning Fallacy: Plans and forecasts are unrealistically close to the best-case scenario and could be improved by consulting the statistics of similar cases. (There is overoptimism on the part of planners and decision makers.)

How to mitigate the planning fallacy: Identify an appropriate reference class and obtain the statistics of the reference class. Use the statistics (not your intuition) as the baseline prediction.
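As a rough illustration of the reference-class idea above, here is a minimal Python sketch. The function names and the sample numbers are my own assumptions for illustration, not something from the book; the median is just one reasonable choice of "the statistics of the reference class."

```python
from statistics import median

def baseline_forecast(reference_outcomes):
    """Baseline prediction: the median outcome of similar past cases."""
    if not reference_outcomes:
        raise ValueError("need at least one past case in the reference class")
    return median(reference_outcomes)

def share_meeting_plan(planned, reference_outcomes):
    """Fraction of past cases that finished at or under the planned figure."""
    if not reference_outcomes:
        raise ValueError("need at least one past case in the reference class")
    hits = sum(1 for outcome in reference_outcomes if outcome <= planned)
    return hits / len(reference_outcomes)

# Hypothetical data: weeks that five similar past projects actually took.
past_weeks = [10, 14, 12, 20, 16]
my_plan = 8  # the intuitive, near-best-case estimate

print("baseline:", baseline_forecast(past_weeks))
print("share of past cases within plan:", share_meeting_plan(my_plan, past_weeks))
```

If almost no past case came in at or under your plan, that is a hint the plan is closer to a best-case scenario than to a forecast.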

End of Part III.

My Thoughts:

Chapter 21 is, hands-down, the very best chapter in the book so far. In it, Kahneman explains the importance of using checklists and rubrics. He even has an interesting explanation of why checklists are so successful: they are like simple formulas; regressions without the weighting. So I need to take back all my griping about Kahneman not emphasizing checklists and rubrics from my previous posts! He does all the talking here!

On the BCS:

Every year college football decides a national champion. Next year, instead of the champion being determined by the outcome of the game between the #1 and #2 ranked schools (as determined by a combination of computer algorithms and coaches' polls -- the BCS), the champion will be determined by a four-team, single-elimination tournament. The four teams that play will be ENTIRELY determined by an important people/coaches' poll/committee.

The goal of the change is to make the selection of the national champion less controversial. I think this move will instead make it more likely that the national champion will be controversial. Instead of choosing two teams, four have to be chosen, and now there are no defined criteria other than what the coaches feel like. I think the move to relying on the committee makes the decision process LESS structured and consistent from year to year.

One of the biggest problems people perceived with the BCS was that a "computer" was choosing which one-loss team plays in the national championship. (In many years there is only one undefeated team, so the BCS had to select which other team to play against the undefeated team.) The move to a four-team playoff chosen by coaches doesn't really solve the problem of which one-loss team(s) to include, because there are almost always more than four undefeated or one-loss teams. Instead, the decision will be less systematic and more controversial.

To me, the biggest problem with the BCS isn't that "computers" determine who plays but that the computer ranking algorithms were somewhat secret. Computers are great calculators. Things like strength of schedule and how to weight wins early in the season versus late in the season are debatable. Much better than letting coaches decide based on "feel" would be to publicly release the calculations done in the computer section of the BCS and be transparent about what they are trying to achieve. How is strength of schedule calculated? How much does an early loss matter? How much does a late loss matter? If there is a problem with the BCS computer formula, you should be able to identify WHY that problem exists, and adjust the formula from year to year.
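To make "transparent formula" concrete, here is a minimal Python sketch of what a published rating could look like. To be clear, this is NOT the BCS formula (which I don't know); the `sos_weight` parameter, the function names, and the sample results are all made up for illustration. Strength of schedule here is simply the average winning percentage of a team's opponents, counting all of each opponent's games.

```python
def win_pct(team, games):
    """Winning percentage of a team; games is a list of (winner, loser) pairs."""
    wins = sum(1 for winner, loser in games if winner == team)
    losses = sum(1 for winner, loser in games if loser == team)
    played = wins + losses
    return wins / played if played else 0.0

def opponents(team, games):
    """Every opponent the team has faced, one entry per game."""
    return [loser if winner == team else winner
            for winner, loser in games if team in (winner, loser)]

def rating(team, games, sos_weight=0.5):
    """Hypothetical rating: a blend of own record and strength of schedule."""
    opps = opponents(team, games)
    sos = sum(win_pct(o, games) for o in opps) / len(opps) if opps else 0.0
    return (1 - sos_weight) * win_pct(team, games) + sos_weight * sos

# Hypothetical results: A and D are both undefeated, but A played a harder schedule.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
for team in ["A", "D"]:
    print(team, rating(team, games))
```

The point is not this particular formula but that every debatable choice (here, `sos_weight`) is visible, so anyone can see why one undefeated team ranks above another and argue about the weight instead of about the "computer."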

At the very least, the algorithms should still be calculated and released, even if only to provide information to the coaches. But based on the results Kahneman presents, there is no guarantee coaches will take these "computers" seriously; they may go out of their way to find "broken legs" the computer missed in order to overrule the ranking -- leading to poorer outcomes.

If I had to pick the ideal system for determining the champion (and who plays in the big bowl games), I would go with a Swiss-style tournament. The schedule throughout the season would be dynamic depending on the wins and losses of the previous week; the last round of the Swiss would be the championship game between the top two teams and the bowls. The season as a whole matters. A pure end-of-year tournament with lots of entries (like the NCAA basketball tournament) would make the end of the year matter almost exclusively. Teams only have to play well enough to make it into the tournament. There would be more games where the outcome doesn't matter. Even the worst team throughout the season of all the teams in the tournament could still become the national champion by playing the best in the tournament. To me, "national champion" should take into account the accomplishments of the whole year. Many people like the decisiveness of a tournament and the excitement of upsets, but just because you have a winner from a single-elimination tournament doesn't mean you've picked the best team of the year -- the national champion.
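For anyone unfamiliar with Swiss-style scheduling, here is a minimal Python sketch of how next week's matchups could be drawn from the current standings. It is a simplification under my own assumptions: teams are paired purely by win count, ties are broken alphabetically just to keep the output deterministic, and it ignores things a real Swiss system handles, such as avoiding rematches and proper tie-breaks. The function name and records are made up.

```python
def swiss_pairings(records):
    """Pair teams with similar records for the next round.

    records maps team name -> number of wins so far.
    Returns a list of (team_a, team_b) pairs; with an odd number of
    teams, the lowest-ranked team is left unpaired (a bye).
    """
    # Rank by wins (most first), then alphabetically for a stable order.
    ranked = sorted(records, key=lambda team: (-records[team], team))
    # Pair neighbors in the ranking: 1st vs 2nd, 3rd vs 4th, and so on.
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]

# Hypothetical standings after three weeks.
standings = {"A": 3, "B": 3, "C": 2, "D": 2, "E": 1, "F": 0}
print(swiss_pairings(standings))
```

Run every week, this keeps matching the best teams against each other, so by the final round the top two records meet in what is effectively the championship game.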



Food for Thought:

0) For more on checklists, see Atul Gawande's The Checklist Manifesto.

1) When are you an expert?

2) When he first became president, George W. Bush attempted to be friendly with Putin based on the following evaluation: "I looked into his eyes and liked what I saw." What are the pros and cons of world leaders making policy decisions based on first impressions of other world leaders?

3) Kahneman gives the following procedure on how to prepare for and run an interview:
    -- Select a few traits that are prerequisites for success. Make sure you can assess them reliably by asking a few FACTUAL questions.
    -- Make a list of factual questions. Prepare ahead of time to score the answers on a scale of 1-5 by assessing what you think is a weak answer versus a strong one.
    -- Collect and score the information one trait at a time to avoid halo effects.
    -- Add up the scores for each trait.
    -- Firmly resolve to hire the person with the highest score.

What are the pros and cons of this approach?
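The interview procedure in 3) is simple enough to write down as code. Here is a minimal Python sketch; the trait names, candidate names, and scores are hypothetical, and breaking ties in favor of the first candidate listed is my own assumption (the procedure doesn't say what to do with ties).

```python
def total_score(trait_scores):
    """Add up one candidate's per-trait scores, each on the 1-5 scale."""
    for trait, score in trait_scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {trait!r} must be between 1 and 5")
    return sum(trait_scores.values())

def pick_hire(candidates):
    """Return the candidate with the highest total (first listed wins a tie)."""
    return max(candidates, key=lambda name: total_score(candidates[name]))

# Hypothetical traits, each scored separately to avoid halo effects.
candidates = {
    "Candidate A": {"reliability": 4, "technical skill": 3, "sociability": 5},
    "Candidate B": {"reliability": 5, "technical skill": 5, "sociability": 3},
}
print(pick_hire(candidates))
```

Note that there are no weights: every trait counts equally, which is the same "regression without the weighting" idea as the checklists in Chapter 21.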

4) If people were NOT subject to the planning fallacy, would there be fewer wars? Fewer small businesses? Fewer government initiatives? Would this be good or bad?

How does a competitive market mitigate the planning fallacy?

5) When was the last time something turned out much worse than you expected? Was the outcome much worse than you expected AND much worse than the typical outcome of similar cases? Could you have gotten more information at the start to better inform your decision if you had known what to look for?

1 comment:

  1. Possibly the BCS switching from a computer algorithm to a sort of voting mechanism would reduce complaints the same way democratic elections do. Because we get to vote, supposedly we are the ones determining the outcome and thus are not so entitled to complain about it. Of course this feeling of control at the individual level is really an illusion.

    I found your #4 to be a very interesting and important question.

    As you may know, I strongly and predictably suffer from the planning fallacy when I am estimating how long it will take to cook things. This results in meals that are frequently late, but something stops me from adjusting even though I am aware of the problem. Here is my current theory: I think explicitly, "If everything goes according to plan, I will be done at the target time of 6:00. Realistically, things tend not to go according to plan, though, so I *could* give myself a safety margin and aim for 5:30. But then in the best-case scenario I will actually be done half an hour early and the food will be cold by 6:00. Wait, that's not good at all! I don't want to be aiming for a 'best-case scenario' which actually sucks, even though I probably won't hit it."

    In practice I outsource correction for this persistent bias to Catherine whenever possible.
