Tuesday, December 2, 2014

More on "Why Our Memory Fails Us"

Today the New York Times published an op-ed by Daniel Simons and me, under the title "Why Our Memory Fails Us." In the article, we use the recent discovery that Neil deGrasse Tyson had been making incorrect statements about George W. Bush, based on false memories, as a way to introduce some ideas from the science of human memory, and to argue that we all need to rethink how we respond to allegations or demonstrations of false memories. "We are all fabulists, and we must all get used to it" is how we concluded.

In brief, Tyson told several audiences that President Bush said the words "Our God is the God who named the stars" in his post-9/11 speech in order to divide Americans from Muslims. Sean Davis, a writer for the website The Federalist, pointed out that Bush never said these exact words, and that the closest words he actually said were spoken after the space shuttle explosion in 2003 as part of a tribute to the astronauts who died. Davis drew a different conclusion than we did—namely that the misquotes show Tyson to be a serial fabricator—but he brought Tyson's errors to light in a series of posts at The Federalist, and he deserves credit for noticing the errors and inducing Tyson to address them.

Tyson first responded, in a Facebook note, by claiming that he really did hear Bush say those words in the 9/11 context, but he eventually admitted that this memory had to be incorrect.

All this happened in September. After reading Tyson's response, I wondered why it didn't include a simple apology to President Bush for implying that he was inciting religious division. On a whim I tweeted that Tyson should just apologize and put the matter behind him:

I had never met or communicated with Neil deGrasse Tyson, and I doubt he had any idea who I was, so I was somewhat surprised when he replied almost immediately:

A few days later, Tyson issued his apology as part of another Facebook note entitled "A Partial Anatomy of My Public Talks." Hopefully it is clear that we wrote our piece not to pick apart Tyson's errors or pile on him, but to present the affair as an example of how we can all make embarrassing mistakes based on distorted memories, and therefore why our first reaction to a case of false memory should be charitable rather than cynical. Not all mistaken claims about our past are innocent false memories, of course, but innocent mistakes of memory should be understood as the norm rather than the exception.

The final version of the op-ed that we submitted to the New York Times was over 1,900 words long; after editing, the published version is about 1,700 words. Several pieces of information, including the names of Davis and The Federalist—who did a service by bringing the matter to light—were casualties of the condensation process. (A credit to ourselves for the research finding that most people believe memory works like a video camera was also omitted.) We tried to make it clear that we deserve no credit for discovering Tyson's misquote. Our version also contained many links that were omitted from the final online version. In particular, we had included links to Davis's original Federalist article, Tyson's first reply, and Tyson's apology note, as well as several of the research articles we mentioned.

For the record, below is a list of all the links we wanted to include. Obviously there are others we could have added, but these cover what we thought were the most important points relevant to our argument about how memory works. For reasons of their own, newspapers like the Times typically allow few links to be included in online stories, and prefer links to their own content. Even our twelve turned out to be too many.

Neil deGrasse Tyson's 2008 misquotation of George W. Bush (video)

Bush's actual speech to Congress after 9/11 (transcript)

Bush's 2003 speech after the space shuttle explosion (transcript)

Sean Davis's article at The Federalist

Tyson's initial response on Facebook

Tyson's subsequent apology on Facebook

National Academy of Sciences 2014 report on eyewitness testimony

Information on false convictions based on eyewitness misidentifications from The Innocence Project (an organization to which everyone should consider donating)

Roediger and DeSoto article on confidence and accuracy in memory

Simons and Chabris article on what people believe about how memory works

Registered replication report on the verbal overshadowing effect

Daniel Greenberg's article on George W. Bush's false memory of 9/11

Monday, November 10, 2014

GAME ON — My New Column in the Wall Street Journal

For a while I’ve had the secret ambition to write a regular newspaper column. At one time I thought I could write a chess column; at other times I thought that Dan Simons and I could write a series of essays on social science and critical thinking. Last year I suggested to the Wall Street Journal a column on games. They turned me down then, but a few weeks ago I gently raised the idea again and the editors kindly said they would give it a try. So I’m excited to say that the first one is out in this past weekend’s paper (page C4, in the Review section), and also online here.

The column is about Dan Harrington's famous "squeeze play" during the final table of the 2004 World Series of Poker main event. Here's how ESPN covered the hand (you can see in the preview frame that he was making a big bluff with his six-deuce):

There were several things about this hand that I would have mentioned if I had the space. First, a couple of important details for understanding the action:

  • Greg Raymer, the ultimate winner, started the hand with about $7.9 million in chips. Josh Arieh, who finished third, had $3.9 million. Harrington had $2.3 million, the second smallest stack at the table.
  • Seven players remained in the tournament (of the starting field of about 2,500) when this hand was played. At a final table like this, the prize payouts escalate substantially with each player eliminated. This might explain why Harrington put in half of his chips rather than all of them. If he were raised, or called and then lost the hand, he would still have a few chips left to play with, and could hope to move up the payout chart if other players busted before he did.
  • David Williams, the eventual runner-up, was actually dealt the best hand of anyone: he had ace-queen in the big blind. But facing a raise, a call, and a re-raise in front of him, he chose to fold, quite reasonably assuming that at least one of the players already in the hand had him beat, and perhaps badly—e.g., holding ace-king. For reasons of space and simplicity I had to omit Williams from the account in the article. I also omitted the suits of the cards.
  • Dan Harrington is a fascinating character. He excels at chess, backgammon, and finance as well as poker, and he wrote a very popular series of books on hold'em poker with Bill Robertie (himself a chess master and two-time world backgammon champion). He won the World Series of Poker main event in 1995. After his successful squeeze play in 2004 he wound up finishing fourth. He had finished third the year before.
Some people have noted that this could not have been the very first squeeze play bluff ever in poker. And of course it wasn't. But it was, in my opinion, the most influential squeeze play. Because ESPN revealed the players' hole cards, it was verifiably a squeeze play. As I hinted in the article, without the hole card cameras, everyone watching the hand would have assumed that Harrington had a big hand when he raised. Even if Harrington had said later that he had a six-deuce, some people wouldn't have believed him, and no one could have been sure. Once ESPN showed this hand (and Harrington wrote about it in his second Harrington on Hold'em volume), every serious player became aware specifically of the squeeze play strategy, and generally of the value of re-raising "light" before the flop. And because the solid, thinking man's player Dan Harrington did it, they knew it wasn't just the move of a wild man like Stu Ungar, but a key part of a correct, balanced strategy.

Of course, the squeeze play doesn't work every time. It would have failed here if Arieh (or Raymer) really did have big hands themselves. Harrington probably had a "read" that suggested they weren't that strong, but I think this read would have been based much more on his feel for the overall flow of the game—noticing how many pots they were playing, whether they had shown down weak hands before—than on any kind of physical or verbal tell.

Two years after Harrington's squeeze play, Vanessa Selbst was a bit embarrassed on ESPN when she tried to re-squeeze a squeezer she figured she had caught in the act. At a $2,000 WSOP preliminary-event final table, she open-raised with five-deuce and drew a call followed by a raise: the exact pattern of the squeeze play. After some thought she went all-in, but the putative squeezer held pocket aces. Selbst was out in seventh place. But she didn't stop playing aggressively, and since then she has become one of the top all-time money winners and most respected players in poker. Most of the hand is shown in the video below, starting at about the 6:50 mark.

Some readers of the column asked whether I wasn't just describing a plain old bluff, the defining play of poker (at least in the popular mind). The answer is that the squeeze play is a particular kind of bluff—indeed, a kind of "pure bluff," a bluff in which your own hand has zero or close to zero chance of actually winning on its merits. (A "semi-bluff," by contrast, is a bluff made when you figure to have the worst hand at the time, but your hand has a good chance of improving to the best hand by the time all the cards are out.) What the Harrington hand showed is a particular situation in which a pure bluff is especially likely to work. Pros don't bluff randomly, or when they feel like it, or even whenever they think they have picked up a physical tell. And they especially don't bluff casually when more than one opponent has already entered the hand. Harrington's bluff was more than just a bluff: it was a demonstration of how elite players exploit their skills to pick just the right spots to bluff and get away with it.

In future columns I’ll talk about different games, hopefully with something interesting to say about each one. The next column should appear in the December 6–7 issue, and will probably concern the world chess championship. My “tryout” could end at any time, of course, but for now my column should be in that same space once per month, at a length of about 450 words. As most of you know, it’s a challenge to say something meaningful in so few words, and for me it’s a challenge just to stay within that word limit while saying anything at all. As in poker, I may need a bit of luck.

By the way, I think it's too bad that the New York Times decided to end their chess column last month. I believe, or at least hope, that there is a market for regular information on games like chess for people who don't pay so much attention via other websites and publications. I remember reading Robert Byrne's version of the Times column in the 1970s. I would get especially excited when my father came home on one of the days the column ran, so that I could grab his newspaper and check it out. Yes, it used to run at least three times per week, then two, then just on Sundays (when Dylan McClain took over with a different, and I think better, approach from Byrne's). Now it doesn't run at all. The Washington Post ended its column as well, but some major newspapers still have one (The Boston Globe and New York Post come to mind).

PS: If you liked the squeeze play column, here are some of my other pieces on games that you can read online, in reverse chronological order:

"The Science of Winning Poker" (WSJ, July 2013)

"Should Poker Be (A Tiny Bit) More Like Chess?" (this blog, August 2013)

"Chess Championship Results Show Powerful Role of Computers" (WSJ, November 2013)

"Bobby Fischer Recalled" (WSJ, March 2009)

"It's Your Move" (WSJ, October 2007)

"How Chess Became the King of Games" (WSJ, November 2006)

"The Other American Game" (WSJ, July 2005)

"A Match for All Seasons" (WSJ, December 2002)

"Checkmate for a Champion" (WSJ, November 2000)

Friday, March 28, 2014

"Data Journalism" on College ROI at FiveThirtyEight: Where's the Critical Thinking?

NOTE: See the end of this entry for important updates, including one from 11/9/15.

A website called PayScale recently published a "College ROI Report" that purports to calculate the return on investment (ROI) of earning a Bachelor's degree from each of about 900 American colleges and universities. I found out about this report from an article on Nate Silver's new FiveThirtyEight website. The article appears under a banner called "DataLab," implying that it is an example of the new "data journalism" that Silver and his site are all about. Unfortunately, the article contains approximately zero critical thinking about the meaning of the PayScale report, its data sources, and its conclusions.

PayScale did a lot of number-crunching (read all about it here), but the computation resulted in two key numbers for each institution: (1) the cost of getting an undergraduate degree, taking into account factors like financial aid and time to graduation; and (2) the expected total earnings of a graduate over the next twenty years. The first one can be figured out from public data sources. The second one came from a survey by PayScale (more on this later). The ROI for a college was calculated by subtracting #1 from #2, and then further subtracting the expected total earnings of a person who skipped college and worked for 24–26 years instead (which happens to be about $1.1 million). The table produced by PayScale thus purports to show how much you would get back—in monetary income—on the "investment" of obtaining a degree from any particular college or university.
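The arithmetic described above can be sketched in a few lines. The baseline figure of about $1.1 million is from the report as summarized here; the per-school numbers below are hypothetical, chosen only to illustrate the calculation:

```python
# Sketch of PayScale's ROI arithmetic as described above. The $1.1M
# baseline is the report's figure for 24-26 years of work without a
# degree; the school-specific inputs below are hypothetical.
def payscale_roi(cost_of_degree, grad_20yr_earnings,
                 hs_baseline_earnings=1_100_000):
    """Net 20-year ROI of a degree: expected graduate earnings, minus
    the cost of the degree, minus what a high-school graduate would
    earn over the same span."""
    return grad_20yr_earnings - cost_of_degree - hs_baseline_earnings

# Hypothetical school: $120k total cost, $2.0M median 20-year earnings.
roi = payscale_roi(120_000, 2_000_000)
print(roi)  # 780000
```

Note that the formula treats the degree as the only difference between the two earning paths, which is exactly the assumption challenged below.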

Indeed, PayScale says that "This measure is useful for high school seniors evaluating their likely financial return from attending and graduating college." But this is simply not true. As I read the FiveThirtyEight article on the PayScale report, I was waiting for them to point out the reasons why, but they never did. The only critical comments were about incorporating the effects of student debt.

What are the problems with the PayScale analysis? First of all, it only makes sense to speak of the comparative return on an investment when the investors have a choice of what to invest in. If every person could choose to attend any college (and to graduate from it and get a full-time job), or to skip college entirely, then it would be meaningful to ask which choice maximizes return. This is what we do when calculating a financial ROI: we try to figure out whether investing in stocks versus bonds, or one mutual fund versus another, or one business opportunity versus another, will be more profitable. But colleges have admissions requirements, so not everyone can go to whatever college he or she wants. Colleges select their students as much as students select their colleges. And in fact, the people who attend different colleges can be very different, and they can be even more different from the people who don't attend college at all.

This means that the Return in this "ROI" depends on much more than the Investment. It also depends on who is doing the investing. In fact, it is far from trivial to figure out the true ROI of going to Harvard versus Vanderbilt versus Wayland Baptist versus Nicholls State versus not attending college at all. To figure this out, you would have to control in the analysis for all the characteristics that make students at different colleges different from one another, and different from students who don't go to college. Factors like cognitive ability, ambition, work habits, parental income and education, where the students grew up and went to high school, what grades they got, and many others are likely to be important. In fact, those other factors could be so important that they might wind up explaining more of the variation in income between people than is explained by going to college—let alone which particular college people go to.

Even controlling for data we might be able to obtain, like the average SAT score and parental income of students who attend each college, would not completely solve the problem, because there could be factors that we can't measure that have an important effect. Only by randomly assigning students to different colleges (or to directly entering the workforce after high school) would we get an estimate of the true ROI (measured in money—which of course leaves aside all the other benefits one might get from college that don't show up in your next twenty years of paychecks).
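A toy simulation makes the selection problem concrete. In this sketch (all numbers invented), two "colleges" confer the identical causal income boost, but the selective one admits mostly high-ability students, and ability independently drives income. A naive comparison of graduates' incomes then shows a large gap that is pure selection, not school quality:

```python
# Toy simulation of selection bias in college "ROI" comparisons.
# Both colleges have the SAME true causal effect on income; the
# selective one simply admits higher-ability students. All parameters
# are made up for illustration.
import random

random.seed(0)

TRUE_COLLEGE_EFFECT = 10_000  # identical for both colleges

def simulate_student():
    ability = random.gauss(0, 1)
    # The selective college admits mostly high-ability students.
    college = "selective" if ability > 0.5 else "other"
    income = 50_000 + 30_000 * ability + TRUE_COLLEGE_EFFECT
    return college, income

students = [simulate_student() for _ in range(100_000)]
mean = lambda xs: sum(xs) / len(xs)
sel = mean([inc for c, inc in students if c == "selective"])
oth = mean([inc for c, inc in students if c == "other"])

# The naive income gap is tens of thousands of dollars even though
# the causal effect of the two colleges is identical.
print(round(sel - oth))
```

The gap here comes entirely from who enrolls, which is why the randomized-assignment thought experiment (or the dual-admit comparison described next) is needed to recover the causal effect.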

Of course this ideal experiment won't ever happen, but clever researchers have tried to approximate it by doing things like looking at students who were accepted to both a higher-ranked and a lower-ranked school, and then comparing those who enroll in the higher-ranked one to those who enroll in the lower-ranked one. Since all the students in this analysis got into both schools, the problem of different schools having different students is mitigated. (Not erased entirely, though: for example, people who deliberately attend lower-ranked schools might be doing so because of financial circumstances, or their college experience may differ because they are likely to start out above average in ability and preparation for the school they attend, as compared to those who choose higher-ranked schools.)

FiveThirtyEight said nothing about this fundamental logical problem with the entire PayScale exercise. Nor did it address the other flaws in the analysis and presentation of the data.

It could also have asked about the confidence intervals around the ROI estimates provided by PayScale. When you give only point estimates (exact values that represent just the mean or median of a distribution), and proceed to rank them, you create the appearance of a world where every distinction matters—that the school ranked #1 really has a higher ROI than #2, which is higher than #3, and so on. PayScale's methodology page says, "the 90% confidence interval on the 20 year median pay is ±5%" (but ±10% for "elite schools" and "small liberal arts schools or schools where a majority of undergraduates complete a graduate degree"). The narrowness of these intervals is a bit hard to believe, as well as their uniformity (how does every school in a category get the same confidence interval?). Why not just put the school-specific confidence intervals into the report, so that it is obvious that, for example, school #48 (Yale) is probably not significantly higher in ROI than, say, school #69 (Lehigh), but is probably lower in ROI than school #6 (Georgia Tech)?
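A back-of-the-envelope check shows how little a ±5% interval on median pay can distinguish nearby ranks. The pay figures below are hypothetical, not PayScale's actual numbers:

```python
# Two hypothetical schools a couple dozen ranks apart. With a +/-5%
# interval on 20-year median pay, their intervals can easily overlap,
# in which case the rank difference may be nothing but noise.
def interval(median_pay, pct=0.05):
    """Return the (low, high) band implied by a symmetric +/-pct interval."""
    return (median_pay * (1 - pct), median_pay * (1 + pct))

school_a = interval(1_700_000)  # hypothetical higher-ranked school
school_b = interval(1_650_000)  # hypothetical lower-ranked school

overlaps = school_a[0] < school_b[1] and school_b[0] < school_a[1]
print(overlaps)  # True: the intervals overlap
```

When intervals overlap like this, a ranked table overstates the precision of the ordering.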

It's hard to have much confidence in these confidence intervals anyhow, since we don't know how many people PayScale surveys at each college to make the income calculations (which will be the critical drivers of the variability in ROI). Many of the colleges are small; how reliable can the estimates of their graduates' earnings be? And are the surveys of college graduates unbiased with respect to what field the graduates work in? Or, for example, do engineers and teachers tend to respond to these surveys more than, say, baristas and consultants? The unemployed and under-employed are not included; this will have the effect of inflating the apparent ROI of schools whose graduates tend, for whatever reasons, not to have full-time jobs. PayScale says that non-cash compensation and investment income are not included, which might bias down the reported ROI of graduates of elite schools who go into financial careers.

Finally, perhaps FiveThirtyEight could have looked at whether the schools that stand out at either end of the distribution happen to be smaller than the ones in the middle. Ohio State, Florida State, et al. have so many students, drawn from such a broad distribution of ability and other personal traits, that they should be expected to have "ROI" values nearer to the middle of the overall distribution of universities than should small colleges, which through pure chance (having, by luck, more high- or low-income graduates) are more likely to land in the top or bottom thirds of the list. Some degree of mean reversion should therefore be expected, so PayScale's rankings will lose some predictive value for future ROIs, especially in the case of small schools.
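The small-school effect is easy to demonstrate by simulation. In this sketch (every number invented), all schools share the same true mean outcome; only the number of graduates sampled differs. The extremes of the resulting "ranking" are dominated by small schools:

```python
# Toy illustration of why small schools land at the extremes of a
# ranking built from noisy per-graduate outcomes. Every school below
# has the SAME true mean; the only difference is sample size.
# All parameters are invented.
import random

random.seed(1)

def school_roi(n_grads):
    """Average of n_grads noisy graduate outcomes (same true mean for all)."""
    return sum(random.gauss(500_000, 400_000) for _ in range(n_grads)) / n_grads

small = [("small", school_roi(25)) for _ in range(200)]
large = [("large", school_roi(2_500)) for _ in range(200)]
ranked = sorted(small + large, key=lambda s: s[1], reverse=True)

top20 = [size for size, _ in ranked[:20]]
print(top20.count("small"))  # nearly all of the top 20 are small schools
```

The small schools' averages simply have a much larger standard error, so chance alone pushes them to both the top and the bottom of the list.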

The comments I have made all concern the underlying PayScale report, but I think it is FiveThirtyEight that has not upheld the best standards of "data journalism." If that term is to have any meaning, it can't simply refer to "journalism" that consists of the passing along of other people's flawed "data" (especially when those people are producing and promoting the data for commercial purposes). Nate Silver earned his reputation, and that of his FiveThirtyEight brand, largely by calling out—and improving on—just this kind of simplistic and misleading analysis. It's sad to see his "data journalism organization" no longer criticizing superficiality, but instead promoting it.

Postscripts:

3/29/14: After I first posted this piece, I realized three things. First, I hadn't mentioned mean reversion originally, so I added it in. But it's a minor issue compared to the others. Second, I didn't make it clear that, notwithstanding what I wrote above, I am 100% in favor of more good data journalism. I agree with Nate Silver and others that journalists (and everyone!) should be more aware of the data that exist to answer questions, how to gather data that have not already been compiled, how to think about data, and so on. A great example of silly, data-ignorant journalism is the series of articles the New York Post has been running on the "epidemic" of suicides and suspicious deaths in the financial industry. The proper question to start with is whether there is an epidemic, or even a significant excess over normal variation, as opposed to a set of coincidences that would be expected to happen every so often. Perhaps there is an epidemic, but I am skeptical. The Post (and other outlets that have reported on these deaths) skip right over this crucial threshold issue. Maybe FiveThirtyEight could address it and teach its readers about the danger of jumping to conclusions after seeing nonexistent patterns in noise. Third, and finally, I should have mentioned that FiveThirtyEight has on board some people who really do know how to think seriously about data (and do it much better than I do), such as the economist Emily Oster. I hope Emily's influence will spread throughout the organization.

3/30/14: I removed text in the original version that asked whether outliers like hedge fund managers had their incomes included in PayScale's calculations. They won't have too much influence, regardless, because PayScale is reporting medians, not means. My apologies for the inadvertent error.

4/5/14: I changed the number of colleges included from 1310 to "about 900." There are 1310 entries in PayScale's table, but many colleges are listed more than once if they have different tuition options (e.g., state resident versus non-resident).

4/7/14: I added links to the Krueger & Dale (and Dale & Krueger) economics papers that tried to estimate the returns from attending more selective/elite colleges. I knew about these papers when I wrote the initial post, but had forgotten who the authors were.

Addendum, 11/9/15: In an article at washingtonpost.com, Nate Silver is quoted as saying the following when comparing his FiveThirtyEight site to Vox, one of his main competitors:
I think the best five or ten things they do are terrific, right? They have some great people working for them. I think they also have a lot of less than terrific things … I know how hard my writers and my editors work to try and get get the facts right, to not always go for the hot take that you can’t really provide evidence for, right? To avoid errors and mistakes. And so, you know, I obviously have some skin in the game where I feel like if people are taking a lot of shortcuts and things that have the sheen of being data driven and maybe aren’t very empirical and aren’t very self aware, then, yeah, I guess I get really annoyed.
I think "taking a lot of shortcuts and things that have the sheen of being data driven and aren't very empirical and aren't very self aware" is an excellent description of the FiveThirtyEight piece on PayScale's completely misleading ROI analysis. And the piece remains on the site, as far as I can tell just as it was when I wrote this entry, with no corrections or updates or qualifications of its superficial and non-self-aware reporting. But at least it wasn't published on Vox!