Sunday, September 13, 2015

No, College Lectures Are Not "Unfair"

In her recent New York Times essay "Are College Lectures Unfair?" Annie Murphy Paul, a science writer, asks "Does the college lecture discriminate? Is it biased against undergraduates who are not white, male, and affluent?" She spends the rest of the essay arguing the affirmative, claiming that "a growing body of evidence suggests that the lecture is not generic or neutral, but a specific cultural form that favors some people while discriminating against others, including women, minorities, and low-income and first-generation college students." She cites various studies that find that "active learning" and "flipped classroom" pedagogical techniques lead to the biggest improvements in performance among the groups who do the worst in traditional lecture-format classes.

Unfortunately, while it may well be true that flipping the classroom (having students watch video lectures at home and using classroom time for problem-solving and discussion) and making learning more active (increasing low-stakes quizzes, which enhance memory for concepts) are excellent ideas—and I personally think they are—the argument that the use of lectures has anything to do with fairness and discrimination is simply erroneous.

Here's why. First, if one group of students tends to perform better than another under a particular instructional regime, then it is likely that any change that improves everyone's performance (as active learning does, according to studies cited by Ms. Paul) should benefit most the group that starts out doing worst. This is a simple function of the fact that grades have maximum values, so there is less room for improvement for students who are already doing well. If this point isn't clear, imagine a hypothetical educational intervention that leads to every student knowing the answer to five particular questions on a 100-question final exam. The best-prepared students would probably have known several of those answers absent the intervention, while the least-prepared students would quite likely not have known them. Therefore, the least-prepared students might gain as much as 4 or 5% on their final exam grades, while the best-prepared students might gain only 1% or nothing at all. This would narrow the achievement gap between those groups. Any intervention that results in more learning is likely to have a similar pattern of effects.
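To put the same arithmetic in a toy form (my own illustration with made-up numbers, not a calculation from any of the studies Ms. Paul cites):

```python
# Toy illustration of the ceiling effect described above; all numbers are made up.
# Suppose an intervention teaches every student the answers to 5 particular
# questions on a 100-question final exam.
questions_taught = 5

# Hypothetical students, described only by how many of those 5 answers they
# already knew before the intervention.
already_knew = {"best-prepared student": 4, "least-prepared student": 0}

for student, known in already_knew.items():
    gain = questions_taught - known  # newly answerable questions = points gained
    print(f"{student}: +{gain} points on a 100-point exam")

# The least-prepared student gains 5 points, the best-prepared only 1, so the gap
# between them narrows even though both improved.
```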

So active learning should be good for any students who start out doing worse in general, not just for minority, low-income, or first-generation students. As Ms. Paul notes, "poor and minority students are disproportionately likely to have attended low-performing schools" and get to college "with less background knowledge." Suppose you tried active learning in a school where the students were overwhelmingly the children of white, middle-class, college-educated parents. You would still expect the lowest-performers among those students to benefit the most from the improved methods. Likewise, the best-performing minority students should get less out of improved pedagogy than the worst-performing minority students. In other words, the value of active learning for traditionally underperforming groups of students has everything to do with the fact that those students have underperformed, and nothing to do with the fact that they come from minority groups.

Now to the question of whether older pedagogical approaches "discriminate" against those minority groups, as Ms. Paul says they do. According to the American Heritage Dictionary of the English Language, to discriminate is "to make distinctions on the basis of class or category without regard to individual merit, especially to show prejudice on the basis of ethnicity, gender, or a similar social factor." Teaching a course in an old-fashioned lecture format, though it may often be less effective than teaching in flipped or active formats, and for that reason may result in lower grades for some types of students than for others, makes no distinctions between classes or categories of people, and therefore cannot be a form of discrimination. Indeed, any inferior form of instruction should lead to wider differences between students who start at different levels.

Consider the limiting case: imagine a "form of instruction" that is no instruction at all: the professor never issues a syllabus or even shows up, but still gives the final exam at the end of the semester. Students who started out with the most "background knowledge" about the topic will still know the most, and students who started out knowing the least will still know the least. All pre-existing differences between ethnic groups and any other student groups will remain unchanged. On the other end of the instructional spectrum, suppose the professor's teaching is so perfect that every student learns every bit of the material: then there will be no differences between any groups, because all students will receive grades of 100%. Note that none of this concerns what groups the students are part of—it can all be explained entirely by the artifact of high-quality instruction benefiting poorer-performing students more than better-performing students. Therefore, Ms. Paul's essay uses the words "biased" and "discriminating" incorrectly, with the pernicious effect of accusing anyone who doesn't flip their classroom or give lots of quizzes of being prejudiced against minority students.

For what it's worth, I have been truly impressed by the growing body of research on the science of learning. I think it's one of the most exciting practical achievements of cognitive psychology, and I am trying to incorporate more of it into my own teaching. But I also believe that good, engaging lectures have their place, and may be more effective in some disciplines than others. To be clear, I have no objections to any of the research Ms. Paul cites in her essay. (Indeed, I have not even read most of that research, because what's at issue here is not the scientific results, but the meaning Ms. Paul ascribes to them. I've assumed that she described all the results accurately in her essay. If the researchers somehow managed to separate the effects of minority status and baseline knowledge or performance in their analysis of the active learning effects, Ms. Paul doesn't say anything about how they did it—and since that analysis would be the logical lynchpin for her claims of bias, discrimination, and unfairness, it is negligent to ignore it.) We have learned a lot about effective teaching methods, but none of it justifies the sloppy, inflammatory claim that lectures are "biased" and "discriminate" against students from minority, low-income, or nontraditional backgrounds.

Saturday, January 10, 2015

Martin Thoresen's World Chess Championship

My third “Game On” column, "The Real Kings of Chess Are Computers," appears this weekend in The Wall Street Journal. I write about the "real world chess championship," which is known formally as the Thoresen Chess Engines Competition, or TCEC. This is a semi-annual tournament that pits almost all the top computer chess programs against one another. Since the best chess engines are now much stronger than even the best human players, a battle between the top two engines is a de facto world championship of chess-playing entities.

That battle was the Superfinal match of TCEC season 7, and it was won last month by Komodo over Stockfish (both playing on the same 16-core computer). In a digital-only extra, "Anatomy of a Computer Chess Game," I try to explain a key moment in game 14 of the match, which gave Komodo a lead it never relinquished over the remaining 50 games.

As part of the research for these pieces, I interviewed TCEC impresario and eponym Martin Thoresen by email. Below is an edited transcript of our conversation, which took place between 29 December 2014 and 2 January 2015. The questions have been re-ordered to make the flow more logical.

CHRISTOPHER CHABRIS: Let’s start with the recent Season 7 Superfinal match. What is your opinion about the result? Do you think it shows that Komodo is a “better chess player” than Stockfish, in their current versions?
MARTIN THORESEN: I think the Superfinal was very close and exciting. The draw rate was slightly higher than what I expected, but then again the engines are very close in strength so this is quite natural. I think the result shows that Komodo is the better engine on the kind of hardware that TCEC uses. And for grandmasters with powerful computers this should be something to take note of when they analyze games using chess engines.

Do you believe that TCEC features the “best chess players” in the world?
Yes, I would say any of the top programs from, say, Stage 3 onwards would pretty much crush any human player on the planet using TCEC hardware.

Do you think it is a problem to have so many draws (53 out of 64 games)? It definitely distinguishes engine-engine matches from human-human matches to have so many draws, but I agree with you that it must result partly from the players being stronger than the best humans.
Personally I don’t mind the draw rate being this high in the Superfinal; it makes it very tense. But one of the main goals of TCEC is to entertain people. Too many draws detract from that, and too many one-sided openings would lower the quality overall, even if they lowered the draw rate. I would be satisfied with a draw rate of roughly 75% in the Superfinal.

You must have watched more engine-engine games than almost anyone else. Were there any games or particular moves or positions that you thought were especially beautiful or revealing in this most recent Superfinal match?
I have not looked deeply at all the games yet, but games like #9 strike me as fascinating.

Let’s talk about some of the details of how TCEC works. Are the games played entirely on your personal computer at your home?
Yes, it’s a 16-core server I’ve built myself. It has two 8-core Intel Xeon processors and 64 GB RAM. It’s located at home here in Huddinge, a suburb of Stockholm, Sweden. I live in an apartment of about 45 square meters.

Why do the games run only one at a time? Because it all happens on one computer? Have you considered using multiple computers so that more games can happen at one time?
Yes exactly, they run only one at a time because the engines utilize all 16 cores to get maximum power, which makes it impossible to run more games. Using more computers is of course something I wish I could do, but then people need to donate more. ☺ The server cost me roughly €4000–€5000 to build. Of course it would be possible to limit each engine to say, four cores, then I could have four games running simultaneously, but then again the engines would be weaker due to the fewer cores. I want TCEC to show only the highest quality of games. Not to mention that I’d have to redesign the website to support many games at once.

How hard was it to write the code that “plays” the two engines against each other, passing moves back and forth, and so on? Do the engines provide you with an API, or do the engine authors give you a special version that corresponds to an API for your own server code? (I assume you wrote the server code yourself too, correct?)
The interface that plays the games is a small command line tool called cutechess-cli, but somewhat modified for TCEC by Jeremy Bernstein after my instructions. I have not coded this tool. Cutechess is simply a UCI/Xboard interface tool that “runs” the engines in accordance with the UCI or Xboard specifications. Basically all chess engines comply with the UCI or Xboard protocols for I/O requests (time control, time left, the move it makes, etc.). Using this tool does not give you a chessboard to view the action like a GUI (Fritz, Arena, SCID, etc.) so ironically I can’t actually watch the game on the server—all I see is a bunch of text.
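For readers curious what those text-only exchanges look like, here is a minimal sketch of driving a single UCI engine by hand (my own illustration, not TCEC's or cutechess-cli's actual code; the engine path is an assumption, and a real match harness adds clocks, adjudication, and PGN logging on top of this):

```python
import subprocess

ENGINE_PATH = "./stockfish"  # hypothetical path to any UCI-compliant engine binary

# Start the engine and talk to it over plain text on stdin/stdout.
engine = subprocess.Popen([ENGINE_PATH], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True, bufsize=1)

def send(command):
    engine.stdin.write(command + "\n")
    engine.stdin.flush()

def read_until(prefix):
    # Read engine output line by line until a line starts with the given prefix.
    while True:
        line = engine.stdout.readline().strip()
        if line.startswith(prefix):
            return line

send("uci")                           # announce the protocol
read_until("uciok")
send("isready")                       # handshake before starting a game
read_until("readyok")
send("ucinewgame")
send("position startpos moves e2e4")  # current game state
send("go movetime 5000")              # think for five seconds
print(read_until("bestmove"))         # e.g. "bestmove e7e5 ponder g1f3"
send("quit")
```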

Who developed the software to broadcast the games to the internet? As someone who followed the latest Superfinal and browsed the archives quite a bit, I can say that it has a very nice interface.
There are two parts of TCEC. One is the website which shows the games, the other is the server on which the games are played. These two are not run on the same machine (for obvious performance reasons), so the server uploads the PGN to the website each minute. The website is designed by me and it has had different designs in previous seasons. The core technology on which it is built is the free JavaScript chess viewer called pgn4web.
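In other words, the playing server and the public site are coupled by nothing more than a PGN file pushed across the network each minute. A minimal sketch of that loop might look like this (the paths, host name, and use of scp are my assumptions for illustration; TCEC's actual tooling may differ):

```python
import subprocess
import time

LOCAL_PGN = "/srv/tcec/current_event.pgn"               # hypothetical: written by the match tool
REMOTE_DEST = "user@webhost.example:/var/www/live.pgn"  # hypothetical web-host destination

# Push the latest PGN to the separate web host roughly once a minute; the site
# then renders it in the browser with a JavaScript viewer such as pgn4web.
while True:
    subprocess.run(["scp", LOCAL_PGN, REMOTE_DEST], check=False)
    time.sleep(60)
```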

How much money would you estimate you have personally spent, and how much total has been spent, to run the TCEC since it started, and season 7 specifically?
I have spent a lot of money. I am not quite sure how much, but I would estimate €6000–€7000 since TCEC started (hardware upgrades, power bills, etc.).

How many hours do you spend on it out of your own life?
For Season 7 I didn’t really code anything new for the website compared to Season 6, so I didn’t spend much time preparing this time around. But when I made the new (current) website for Season 6, I started right after Season 5 finished and coded for almost 3 months straight, sometimes as much as 4–6 hours a day. That left little sleep considering I had (and still have) a full-time job as well. But when a season is running, my attention goes mostly to moderating the chat and making sure the hardware runs as it should. So anything from 0–4 hours per day during a season.

Are there any major engines that did not participate over the past few seasons? If so, do you know why they declined?
I pick the engines myself, but there was the case of HIARCS for Season 6, where the programmer Mark Uniacke told me to withdraw it. I only did it because I did not buy his program—he sent it to me for free for Season 5. But if I had bought it myself, I would have included it. Other than HIARCS there have not really been any similar cases in TCEC history. Now and then the question of why Fritz does not participate pops up, but that has a simple answer: It does not come in a form that supports UCI or Xboard—it has a native protocol built into the Fritz GUI which makes it unusable. 

If I understand correctly, your goal is to include every major engine, and the only reasons they could be left out is (a) their authors explicitly withdraw them, or (b) they aren’t compatible with the required protocols. Do I have that right? And that HIARCS and Fritz are the only major engines not participating?
Yes, every major engine that is not a direct clone. The whole clone debate is a hot topic in most computer chess forums. So your (a) and (b) are both correct. HIARCS was not a part of Season 7 for the same reason as it was not a part of Season 6.

Has there been any recent criticism of the TCEC from chess engine developers that were not included (Fritz), or sat out (HIARCS), or others?
No, there has not.

How strong a chess player are you? Do you play in tournaments, a club, or online?
I am not very strong. I don’t even have a rating. I would estimate my strength at around 1500 FIDE on a good day.

Can you tell me a bit about yourself?
I am 33 years old and living with my dog. (For now!) I am currently working as an IT consultant and for the past 1.5 years I’ve worked for Microsoft as part of their international Bing search engine MEPM team. I have no formal education apart from what would equal high school in the U.S. Everything I’ve done so far is self-taught.

How many other people help regularly in organizing and running the TCEC? Are they all volunteers?
Nelson Hernandez is in charge of the openings, assisted by Adam Hair and international master Erik Kislik. Jeremy Bernstein has helped me with the cutechess-cli customization. Paolo Casaschi (author of pgn4web) has also helped me with some specific inquiries I’ve had about JavaScript code. They are all volunteers. ☺

How did the idea for the TCEC come to you?
Basically it started after I left the computer chess ranking list (CCRL) after a couple of years of being a member. I was tired of just running computer chess engine games for statistics—I wanted to slow down the time control and watch the games. Obviously, the idea of a live broadcast wasn’t new, and in the beginning it was very simple, just a plain website with moves and not much else. It has now evolved with a more advanced website that I think is kind of intuitive and nice to use and gives TCEC a kind of unique platform.

Why is there so little time between TCEC seasons? Why not one season per year, more like the human world championship? Do the engines change enough between seasons for such frequent seasons to be meaningful?
The rhythm the past few years has been roughly two seasons per year. One season takes 3–4 months, so basically you can watch TCEC for half a year per year. It is definitely debatable whether this is useful or meaningful, but that’s just how it has been. Of course, this might change in the future. I have no other good answer. ☺

What are your plans for the future of TCEC, short-term and long-term?
Short-term would be to take a (well deserved) break. ☺ Long-term would be to be recognized by some big company to “get the ball rolling.”

Are you planning any changes in the format or rules for Season 8?
There might be changes for Season 8. Nothing is decided yet.

Regarding rules, while following the Superfinal games I noticed that some games were declared drawn by the rules when there seemed to be a lot of life left in the position—for example, the final position of game 18, which human grandmasters might play on for either side. Do you think this rule might be revised?
I don’t think the TCEC Draw Rule or TCEC Win Rule will be changed. They have been there from the start (slightly modified since the beginning) and no one is really complaining. As for the particular example with game 18, both engines are 100% certain that this is a draw (both show 0.00) so even if we humans think it looks chaotic, the engines simply have it all calculated way in advance.

I noticed that endgame tablebases were not used in the Superfinal, and this must have resulted in some incorrect evaluations. For example, as I was watching one game, I saw that one engine’s principal variation ended in a KRB-vs-KNN position, which is a general win for the stronger side, but the evaluation was not close to indicating a forced win. Do you think that could have helped cause more draws to happen?
That is correct, tablebases were disabled for all engines for the whole of Season 7. Previously they had been available, but some fans wanted them disabled so I figured they would have their wish fulfilled for Season 7. What tablebases do is to basically help the engines find the correct way into a winning endgame—or, in the worst case, prevent a loss. It shouldn’t affect the draw rate overall since it would even out in the end. But the point is that without tablebases, the engines can only rely on their own strength in the endgame and the path for getting there.

Have you thought of inviting strong players to comment on the games live, as happens in the top human-versus-human tournaments and matches? Is it too expensive?
We’ve had some discussions, but nothing concrete yet. It could probably be something to do for the Superfinal if the required money could be arranged.

Have you approached any major companies like Intel, AMD, or Microsoft about sponsoring the event or making it much bigger in scope/publicity?
Not in a while. Back when I did, I got no reply or acknowledgment whatsoever.

Do you have data on how many people in total looked at the latest Superfinal on the site, and any other rough numbers on chat commenters, etc.?
There were approximately 26,000 unique visitors there during the Superfinal. From memory, the number of users in the chat peaked at roughly 600 at one point during the match.

Do you think that the chess world should pay more attention to TCEC in particular, and to engine-versus-engine games in general? They are rarely quoted in discussions of opening theory, or of the best games, best moves, or most interesting positions. Do you have an opinion about why this is?
I think they should. There are so many beautiful games coming out of TCEC that can blow one’s mind. Why we see little reference to engine-versus-engine games is hard to say, but my guess is that it is related to the fact that a chess engine is basically an A.I., so people might have a hard time admitting that “a robot” can play even more beautiful chess than humans.

What intrigues me most about TCEC may be the fact that it is a very personal project for you, yet it has attained a measure of worldwide respect and fame without having a big sponsor or lots of money involved.
This project is of course very personal. Anton Mihailov contacted me prior to Season 5 and we have continued our cooperation since. To have a hobby acknowledged like that is of course very nice. With that said, if Intel or AMD or any other big company would be interested in sponsoring TCEC I would definitely be interested in having a talk with them too. Bottom line is: Most people regard TCEC as the official “world computer chess championship.” And I don’t think they are wrong about that! ☺

My thanks to Martin Thoresen, grandmaster Larry Kaufman (of the Komodo team), international master Erik Kislik (who made the final selection of openings for the match), and everyone else who answered my questions for these pieces. I am looking forward to Season 8 of TCEC!

Tuesday, December 2, 2014

More on "Why Our Memory Fails Us"

Today the New York Times published an op-ed by Daniel Simons and myself, under the title "Why Our Memory Fails Us." In the article, we use the recent discovery that Neil deGrasse Tyson was making incorrect statements about George W. Bush based on false memories as a way to introduce some ideas from the science of human memory, and to argue that we all need to rethink how we respond to allegations or demonstrations of false memories. "We are all fabulists, and we must all get used to it" is how we concluded.

In brief, Tyson told several audiences that President Bush said the words "Our God is the God who named the stars" in his post-9/11 speech in order to divide Americans from Muslims. Sean Davis, a writer for the website The Federalist, pointed out that Bush never said these exact words, and that the closest words he actually said were spoken after the space shuttle explosion in 2003 as part of a tribute to the astronauts who died. Davis drew a different conclusion than we did—namely that the misquotes show Tyson to be a serial fabricator—but he brought Tyson's errors to light in a series of posts at The Federalist, and he deserves credit for noticing the errors and inducing Tyson to address them.

Tyson first responded, in a Facebook note, by claiming that he really did hear Bush say those words in the 9/11 context, but he eventually admitted that this memory had to be incorrect.

All this happened in September. After reading Tyson's response, I wondered why it didn't include a simple apology to President Bush for implying that he was inciting religious division. On a whim I tweeted that Tyson should just apologize and put the matter behind him:

I had never met or communicated with Neil deGrasse Tyson, and I doubt he had any idea who I was, so it was somewhat to my surprise that he replied almost immediately:

A few days later, Tyson issued his apology as part of another Facebook note entitled "A Partial Anatomy of My Public Talks." Hopefully it is clear that we wrote our piece not to pick apart Tyson's errors or pile on him, but to present the affair as an example of how we can all make embarrassing mistakes based on distorted memories, and therefore why our first reaction to a case of false memory should be charitable rather than cynical. Not all mistaken claims about our past are innocent false memories, of course, but innocent mistakes of memory should be understood as the norm rather than the exception.

The final version of the op-ed that we submitted to the New York Times was over 1900 words long; after editing, the published version is about 1700 words. Several pieces of information, including the names of Davis and The Federalist—who did a service by bringing the matter to light—were casualties of the condensation process. (A credit to ourselves for the research finding that most people believe memory works like a video camera was also omitted.) We tried to make it clear that we deserve no credit for discovering Tyson's misquote. Our version also included many links that were omitted from the final online version. In particular, we had included links to Davis's original Federalist article, Tyson's first reply, and Tyson's apology note, as well as several of the research articles we mentioned.

For the record, below is a list of all the links we wanted to include. Obviously there are others we could have added, but these cover what we thought were the most important points relevant to our argument about how memory works. For reasons of their own, newspapers like the Times typically allow few links to be included in online stories, and prefer links to their own content. Even our twelve turned out to be too many.

Neil deGrasse Tyson's 2008 misquotation of George W. Bush (video)

Bush's actual speech to Congress after 9/11 (transcript)

Bush's 2003 speech after the space shuttle explosion (transcript)

Sean Davis's article at The Federalist

Tyson's initial response on Facebook

Tyson's subsequent apology on Facebook

National Academy of Sciences 2014 report on eyewitness testimony

Information on false convictions based on eyewitness misidentifications from The Innocence Project (an organization to which everyone should consider donating)

Roediger and DeSoto article on confidence and accuracy in memory

Simons and Chabris article on what people believe about how memory works

Registered replication report on the verbal overshadowing effect

Daniel Greenberg's article on George W. Bush's false memory of 9/11

Monday, November 10, 2014

GAME ON — My New Column in the Wall Street Journal

For a while I’ve had the secret ambition to write a regular newspaper column. At one time I thought I could write a chess column; at other times I thought that Dan Simons and I could write a series of essays on social science and critical thinking. Last year I suggested to the Wall Street Journal a column on games. They turned me down then, but a few weeks ago I gently raised the idea again and the editors kindly said they would give it a try. So I’m excited to say that the first one is out in this past weekend’s paper (page C4, in the Review section), and also online here.

The column is about Dan Harrington's famous "squeeze play" during the final table of the 2004 World Series of Poker main event. Here's how ESPN covered the hand (you can see in the preview frame that he was making a big bluff with his six-deuce):

There were several things about this hand that I would have mentioned if I had the space. First, a couple of important details for understanding the action:

  • Greg Raymer, the ultimate winner, started the hand with about $7.9 million in chips. Josh Arieh, who finished third, had $3.9 million. Harrington had $2.3 million, the second smallest stack at the table.
  • Seven players remained in the tournament (of the starting field of about 2500) when this hand was played. At a final table like this, the prize payouts escalate substantially with each player eliminated. This might explain why Harrington put in half of his chips, rather than all of them. In case he got raised or called and lost the hand, he would still have a bit left to play with, and could hope to move up the payout chart if other players busted before he did.
  • David Williams, the eventual runner-up, was actually dealt the best hand of anyone: he had ace-queen in the big blind. But facing a raise, a call, and a re-raise in front of him, he chose to fold, quite reasonably assuming that at least one of the players already in the hand would have had him beat, and perhaps badly—e.g., holding ace-king. For reasons of space and simplicity I had to omit Williams from the account in the article. I also omitted the suits of the cards.
  • Dan Harrington is a fascinating character. He excels at chess, backgammon, and finance as well as poker, and he wrote a very popular series of books on hold'em poker with Bill Robertie (himself a chess master and two-time world backgammon champion). He won the World Series of Poker main event in 1995. After his successful squeeze play in 2004 he wound up finishing fourth. He had finished third the year before.

Some people have noted that this could not have been the very first squeeze play bluff ever in poker. And of course it wasn't. But it was, in my opinion, the most influential squeeze play. Because ESPN revealed the players' hole cards, it was verifiably a squeeze play. As I hinted in the article, without the hole card cameras, everyone watching the hand would have assumed that Harrington had a big hand when he raised. Even if Harrington had said later that he had a six-deuce, some people wouldn't have believed him, and no one could have been sure. Once ESPN showed this hand (and Harrington wrote about it in his second Harrington on Hold'em volume), every serious player became aware specifically of the squeeze play strategy, and generally of the value of re-raising "light" before the flop. And because the solid, thinking man's player Dan Harrington did it, they knew it wasn't just the move of a wild man like Stu Ungar, but a key part of a correct, balanced strategy.

Of course, the squeeze play doesn't work every time. It would have failed here if Arieh (or Raymer) really did have big hands themselves. Harrington probably had a "read" that suggested they weren't that strong, but I think this read would have been based much more on his feel for the overall flow of the game—noticing how many pots they were playing, whether they had shown down weak hands before—than on any kind of physical or verbal tell.

Two years after Harrington's squeeze play, Vanessa Selbst was a bit embarrassed on ESPN when she tried to re-squeeze a squeezer she figured she had caught in the act. At a $2000 WSOP preliminary event final table, she open-raised with five-deuce, and drew a call followed by a raise: the exact pattern of the squeeze play. After some thought she went all-in, but the putative squeezer held pocket aces. Selbst was out in 7th place. But she didn't stop playing aggressively, and since then she has become one of the top all-time money winners and most respected players in poker. Most of the hand is shown in the video below, starting at about the 6:50 mark.

Some readers of the column asked whether I wasn't just describing a plain old bluff, the defining play of poker (at least in the popular mind). The answer is that the squeeze play is a particular kind of bluff—indeed, a kind of "pure bluff," which is a bluff in which your own hand has zero or close to zero chance of actually winning on its merits. (A "semi-bluff," by contrast, is a bluff when you figure to have the worst hand at the time of the bluff, but your hand has a good chance of improving to be the best hand by the time all the cards are out.) What the Harrington hand showed is a particular situation in which a pure bluff is especially likely to work. Pros don't bluff randomly, or when they feel like it, or even when they think they have picked up a physical tell. And they especially don't bluff casually when more than one opponent has already entered the hand. Harrington's bluff was more than just a bluff: It was a demonstration of how elite players exploit their skills to pick just the right spots to bluff and get away with it.
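To put a little arithmetic behind "picking the right spots": a pure bluff only has to succeed often enough to cover what it risks. (This is standard poker math, offered as my own aside rather than anything from the column; the chip counts below are made up.)

```python
# Break-even folding frequency for a pure bluff: if called, the hand essentially
# never wins, so the opponents must fold often enough to pay for the chips risked.
def breakeven_fold_rate(pot, risk):
    return risk / (risk + pot)

# Hypothetical spot: raising 1,200,000 in chips to win a pot of 800,000.
print(breakeven_fold_rate(pot=800_000, risk=1_200_000))  # 0.6 -> needs folds 60% of the time
```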

In future columns I’ll talk about different games, hopefully with something interesting to say about each one. The next column should appear in the December 6–7 issue, and will probably concern the world chess championship. My “tryout” could end at any time, of course, but for now my column should be in that same space once per month, at a length of about 450 words. As most of you know, it’s a challenge to say something meaningful in so few words, and for me it’s a challenge just to stay within that word limit while saying anything at all. As in poker, I may need a bit of luck.

By the way, I think it's too bad that the New York Times decided to end their chess column last month. I believe, or at least hope, that there is a market for regular information on games like chess for people who don't follow them closely via other websites and publications. I remember reading Robert Byrne's version of the Times column in the 1970s. I would get especially excited when my father came home on one of the days the column ran, so that I could grab his newspaper and check it out. Yes, it used to run at least three times per week, then two, then just on Sundays (when Dylan McClain took over with a different, and I think better, approach from Byrne's). Now it doesn't run at all. The Washington Post ended its column as well, but some major newspapers still have one (The Boston Globe and New York Post come to mind).

PS: If you liked the squeeze play column, here are some of my other pieces on games that you can read online, in reverse chronological order:

"The Science of Winning Poker" (WSJ, July 2013)

"Should Poker Be (A Tiny Bit) More Like Chess?" (this blog, August 2013)

"Chess Championship Results Show Powerful Role of Computers" (WSJ, November 2013)

"Bobby Fischer Recalled" (WSJ, March 2009)

"It's Your Move" (WSJ, October 2007)

"How Chess Became the King of Games" (WSJ, November 2006)

"The Other American Game" (WSJ, July 2005)

"A Match for All Seasons" (WSJ, December 2002)

"Checkmate for a Champion" (WSJ, November 2000)

Friday, March 28, 2014

"Data Journalism" on College ROI at FiveThirtyEight: Where's the Critical Thinking?

A website called PayScale recently published a "College ROI Report" that purports to calculate the return on investment (ROI) of earning a Bachelor's degree from each of about 900 American colleges and universities. I found out about this report from an article on Nate Silver's new FiveThirtyEight website. The article appears under a banner called "DataLab," implying that it is an example of the new "data journalism" that Silver and his site are all about. Unfortunately, the article contains approximately zero critical thinking about the meaning of the PayScale report, its data sources, and its conclusions.

PayScale did a lot of number-crunching (read all about it here), but the computation resulted in two key numbers for each institution: (1) the cost of getting an undergraduate degree, taking into account factors like financial aid and time to graduation; and (2) the expected total earnings of a graduate over the next twenty years. The first one can be figured out from public data sources. The second one came from a survey by PayScale (more on this later). The ROI for a college was calculated by subtracting #1 from #2, and then further subtracting the expected total earnings of a person who skipped college and worked for 24–26 years instead (which happens to be about $1.1 million). The table produced by PayScale thus purports to show how much you would get back—in monetary income—on the "investment" of obtaining a degree from any particular college or university.
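To make the recipe concrete, here is the calculation in miniature (my own sketch with made-up school numbers; the roughly $1.1 million figure for skipping college is the one described above):

```python
# PayScale's ROI recipe as described above, with illustrative numbers.
def payscale_roi(grad_20yr_earnings, cost_of_degree, no_college_earnings=1_100_000):
    # (2) expected 20-year earnings of a graduate, minus (1) the cost of the degree,
    # minus what someone who skipped college would earn over 24-26 years (~$1.1M).
    return grad_20yr_earnings - cost_of_degree - no_college_earnings

# Hypothetical school: $1.9M expected 20-year earnings, $120K net cost of degree.
print(payscale_roi(grad_20yr_earnings=1_900_000, cost_of_degree=120_000))  # 680000
```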

Indeed, PayScale says that "This measure is useful for high school seniors evaluating their likely financial return from attending and graduating college." But this is simply not true. As I read the FiveThirtyEight article on the PayScale report, I was waiting for them to point out the reasons why, but they never did. The only critical comments were about incorporating the effects of student debt.

What are the problems with the PayScale analysis? First of all, it only makes sense to speak of the comparative return on an investment when the investors have a choice of what to invest in. If every person could choose to attend any college (and to graduate from it and get a full-time job), or to skip college entirely, then it would be meaningful to ask which choice maximizes return. This is what we do when calculating a financial ROI: we try to figure out whether investing in stocks versus bonds, or one mutual fund versus another, or one business opportunity versus another, will be more profitable. But colleges have admissions requirements, so not everyone can go to whatever college he or she wants. Colleges select their students as much as students select their colleges. And in fact, the people who attend different colleges can be very different, and they can be even more different from the people who don't attend college at all.

This means that the Return in this "ROI" depends on much more than the Investment. It also depends on who is doing the investing. In fact, it is far from trivial to figure out the true ROI of going to Harvard versus Vanderbilt versus Wayland Baptist versus Nicholls State versus not attending college at all. To figure this out, you would have to control in the analysis for all the characteristics that make students at different colleges different from one another, and different from students who don't go to college. Factors like cognitive ability, ambition, work habits, parental income and education, where the students grew up and went to high school, what grades they got, and many others are likely to be important. In fact, those other factors could be so important that they might wind up explaining more of the variation in income between people than is explained by going to college—let alone which particular college people go to.

Even controlling for data we might be able to obtain, like the average SAT score and parental income of students who attend each college, would not completely solve the problem, because there could be factors that we can't measure that have an important effect. Only by randomly assigning students to different colleges (or to directly entering the workforce after high school) would we get an estimate of the true ROI (measured in money—which of course leaves aside all the other benefits one might get from college that don't show up in your next twenty years of paychecks).

Of course this ideal experiment won't ever happen, but clever researchers have tried to approximate it by doing things like looking at students who were accepted to both a higher-ranked and a lower-ranked school, and then comparing those who enroll in the higher-ranked one to those who enroll in the lower-ranked one. Since all the students in this analysis got into both schools, the problem of different schools having different students is mitigated. (Not erased entirely, though: for example, people who deliberately attend lower-ranked schools might be doing so because of financial circumstances, or their college experience may differ because they are likely to start out above average in ability and preparation for the school they attend, as compared to those who choose higher-ranked schools.)
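A toy version of that matched-admits comparison (synthetic data I made up, not the researchers' actual analysis) shows the basic idea:

```python
import pandas as pd

# Each row is a student who was admitted to BOTH a higher- and a lower-ranked
# school; we compare later earnings by where they actually enrolled. Because
# everyone cleared the same admissions bar, selection by the schools is roughly
# held constant, though the confounds noted above still remain.
students = pd.DataFrame({
    "enrolled": ["higher", "lower", "higher", "lower", "higher", "lower"],
    "earnings": [95_000, 92_000, 88_000, 90_000, 101_000, 97_000],  # made-up values
})

print(students.groupby("enrolled")["earnings"].mean())
```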

FiveThirtyEight said nothing about this fundamental logical problem with the entire PayScale exercise. Nor did it address the other flaws in the analysis and presentation of the data.

It could have also asked about the confidence intervals around the ROI estimates provided by PayScale. When you give only point estimates (exact values that represent just the mean or median of a distribution), and proceed to rank them, you create the appearance of a world where every distinction matters—that the school ranked #1 really has a higher ROI than #2, which is higher than #3, and so on. PayScale's methodology page says, "the 90% confidence interval on the 20 year median pay is ±5%" (but 10% for "elite schools" and "small liberal arts schools or schools where a majority of undergraduates complete a graduate degree"). The narrowness of these intervals is a bit hard to believe, as well as their uniformity (how does every school in a category get the same confidence interval?). Why not just put the school-specific confidence intervals into the report, so that it is obvious that, for example, school #48 (Yale) is probably not significantly higher in ROI than, say, school #69 (Lehigh), but is probably lower in ROI than school #6 (Georgia Tech)?
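Even taking PayScale's stated intervals at face value, a quick sketch (with entirely hypothetical median-pay figures) shows how much a ranking blurs once the intervals are drawn in:

```python
# Hypothetical 20-year median pay figures for three schools, each with PayScale's
# stated +/-5% 90% confidence interval. The numbers are made up for illustration.
schools = {"School A": 1_450_000, "School B": 1_430_000, "School C": 1_750_000}

for name, median_pay in schools.items():
    low, high = 0.95 * median_pay, 1.05 * median_pay
    print(f"{name}: {median_pay:,} (interval {low:,.0f} to {high:,.0f})")

# A's and B's intervals overlap almost completely, so their ranks are not
# meaningfully different; C's interval sits clearly above both.
```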

It's hard to have much confidence in these confidence intervals anyhow, since we don't know how many people PayScale surveys at each college to make the income calculations (which will be the critical drivers of the variability in ROI). Many of the colleges are small; how reliable can the estimates of what their graduates will earn be? And are the surveys of college graduates unbiased with respect to what field the graduates work in? Or, for example, do engineers and teachers tend to respond to these surveys more than, say, baristas and consultants? The unemployed and under-employed are not included; this will have the effect of inflating the apparent ROI of schools whose graduates tend, for whatever reasons, not to have full-time jobs. PayScale says that non-cash compensation and investment income are not included, which might bias down the reported ROI of graduates of elite schools who go into financial careers.

Finally, perhaps FiveThirtyEight could have looked at whether the schools that stand out at either end of the distribution happen to be smaller than the ones in the middle. Ohio State, Florida State, et al. have so many students, drawn from such a broad distribution of ability and other personal traits, that they should be expected to have "ROI" values nearer to the middle of the overall distribution of universities than should small colleges, which through pure chance (having, by luck, more high- or low-income graduates) are more likely to land in the top or bottom thirds of the list. Some degree of mean reversion may be expected, so the rankings of PayScale will lose some predictive value for future ROIs, especially in the case of small schools.
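A quick simulation (entirely made-up numbers, just to illustrate the size effect) shows why small schools should dominate the extremes of such a ranking:

```python
import random

random.seed(1)

def school_average(n_graduates):
    # Average "earnings" of one school's graduates, drawn from the SAME
    # underlying distribution regardless of school size (all values made up).
    return sum(random.gauss(60_000, 25_000) for _ in range(n_graduates)) / n_graduates

small_colleges = [school_average(50) for _ in range(200)]
big_universities = [school_average(5_000) for _ in range(200)]

print("spread across small colleges:  ", round(max(small_colleges) - min(small_colleges)))
print("spread across big universities:", round(max(big_universities) - min(big_universities)))

# The small colleges show a far wider spread purely by chance, so they are the ones
# most likely to land at the top or bottom of the ranking -- and the most likely to
# regress toward the middle next time.
```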

The comments I have made all concern the underlying PayScale report, but I think it is FiveThirtyEight that has not upheld the best standards of "data journalism." If that term is to have any meaning, it can't simply refer to "journalism" that consists of the passing along of other people's flawed "data" (especially when those people are producing and promoting the data for commercial purposes). Nate Silver earned his reputation, and that of his FiveThirtyEight brand, largely by calling out—and improving on—just this kind of simplistic and misleading analysis. It's sad to see his "data journalism organization" no longer criticizing superficiality, but instead promoting it.

Postscripts:

3/29/14: After I first posted this piece, I realized three things. First, I hadn't mentioned mean reversion originally, so I added it in. But it's a minor issue compared to the others. Second, I didn't make it clear that notwithstanding what I wrote above, I am 100% in favor of more good data journalism. I agree with Nate Silver and others that journalists (and everyone!) should be more aware of the data that exists to answer questions, how to gather data that has not already been compiled, how to think about data, and so on. A great example of silly data-ignorant journalism is the series of articles the New York Post has been running on the "epidemic" of suicides and suspicious deaths in the financial industry. The proper question to start with is whether there is an epidemic, or even a significant excess over normal variation, as opposed to a set of coincidences that would be expected to happen every so often. Perhaps there is an epidemic, but I am skeptical. The Post (and other outlets that have reported on these deaths) skip right over this crucial threshold issue. Maybe FiveThirtyEight could address it and teach its readers about the danger of jumping to conclusions after seeing nonexistent patterns in noise. Third, and finally, I should have mentioned that FiveThirtyEight has on board some people who really do know how to think seriously about data (and do it much better than I do), such as the economist Emily Oster. I hope Emily's influence will spread throughout the organization.

3/30/14: I removed text in the original version that asked whether outliers like hedge fund managers had their incomes included in PayScale's calculations. They won't have too much influence, regardless, because PayScale is reporting medians, not means. My apologies for the inadvertent error.

4/5/14: I changed the number of colleges included from 1310 to "about 900." There are 1310 entries in PayScale's table, but many colleges are listed more than once if they have different tuition options (e.g. state resident versus non-resident).

4/7/14: I added links to the Krueger & Dale (and Dale & Krueger) economics papers that tried to estimate the returns from attending more selective/elite colleges. I knew about these papers when I wrote the initial post, but had forgotten who the authors were.

Friday, October 4, 2013

Why Malcolm Gladwell Matters (And Why That's Unfortunate)

Malcolm Gladwell, the New Yorker writer and perennial bestselling author, has a new book out. It's called David and Goliath: Misfits, Underdogs, and the Art of Battling Giants. I reviewed it (PDF) in last weekend's edition of The Wall Street Journal. (Other reviews have appeared in The Atlantic, The New York Times, The Guardian, and The Millions, to name a few.) Even though the WSJ editors kindly gave me about 2500 words to go into depth about the book, there were many things I did not have space to discuss or elaborate on. This post contains some additional thoughts about Malcolm Gladwell, David and Goliath, the general modus operandi of his writing, and how he and others conceive of what he is doing.

I noticed some interesting reactions to my review. Some people said I was a jealous hater. One even implied that as a cognitive scientist (rather than a neuroscientist) I somehow lacked the capacity or credibility to criticize anyone's logic or adherence to evidence. A more serious response, of which I saw several instances, came from people who said in essence "Why do you take Gladwell so seriously—it's obvious he is just an entertainer." For example, here's Jason Kottke:
I enjoy Gladwell's writing and am able to take it with the proper portion of salt ... I read (and write about) most pop science as science fiction: good for thinking about things in novel ways but not so great for basing your cancer treatment on. 
The Freakonomics blog reviewer said much the same thing:
... critics have primarily focused on whether the argument they think Gladwell is making is valid. I am going to argue that this approach misses the fact that the stories Gladwell tells are simply well worth reading.
I say good for you to everyone who doesn't take Gladwell seriously. But the reason I take him seriously is that I take him and his publisher at their word. On their face, many of the assertions and conclusions in Gladwell's books are clearly meant to describe lawful regularities about the way human mental life and the human social world work. And this has always been the case with his writing.

In The Tipping Point (2000), Gladwell wrote of sociological regularities and even coined new ones, like "The Law of the Few." Calling patterns of behavior "laws" is a basic way of signaling that they are robust empirical regularities. Laws of human behavior aren't as mathematically precise as laws of physics, but asserting one is about the strongest claim that can be made in social science. To say something is a law is to say that it applies with (near) universality and can be used to predict, in advance, with a fair degree of certainty, what will happen in a situation. It says this is truth you can believe in, and act on to your benefit.

A blurb from the publisher of David and Goliath avers: "The author of Outliers explores the hidden rules governing relationships between the mighty and the weak, upending prevailing wisdom as he goes." A hidden rule is a counterintuitive, causal mechanism behind the workings of the world. If you say you are exploring hidden rules that govern relationships, you are promising to explicate social science. But we don't have to take the publisher's word for it. Here's the author himself, in the book, stating one of his theses:
The fact of being an underdog changes people in ways that we often fail to appreciate. It opens doors, and creates opportunities and educates and permits things that might otherwise have seemed unthinkable.
The emphasis on changes is in the original (at least in the version of the quote I saw on Gladwell's Facebook page). In an excerpt published in The Guardian, he wrote, "If you take away the gift of reading, you create the gift of listening." I added the emphasis on create to highlight the fact that Gladwell is here claiming a causal rule about the mind and brain, namely that having dyslexia causes one to become a better listener (something he says made superlawyer David Boies so successful).

I've gone on at length with these examples because I think they also run counter to another point I have seen made about Gladwell's writings recently: That he does nothing more than restate the obvious or banal. I couldn't disagree more here. Indeed, to his credit, what he writes about is the opposite of trivial. If Gladwell is right in his claims, we have all been acting unethically by watching professional football, and the sport will go the way of dogfighting, or at best boxing. If he is right about basketball, thousands of teams have been employing bad strategies for no good reason. If he is right about dyslexia, the world would literally be a worse place if everyone were able to learn how to read with ease, because we would lose the geniuses that dyslexia (and other "desirable difficulties") create. If he was right about how beliefs and fads spread through social networks in The Tipping Point, consumer marketing would have changed greatly in the years since. Actually, it did: firms spent great effort trying to find "influentials" and buy their influence, even though there was never good causal evidence that this would work. (See Duncan Watts's brilliant book Everything is Obvious, Once You Know the Answer—reviewed here—to understand why.) If Gladwell is right, also in The Tipping Point, about how much news anchors can influence our votes by deploying their smiles for and against their preferred candidates, then democracy as we know it is a charade (and not for the reasons usually given, but for the completely unsupported reason that subliminal persuaders can create any electoral results they want). And so on. These ideas are far from obvious, self-evident, or trivial. They do have the property of engaging a hindsight bias, of triggering a pleasurable rush of counterintuition, of seeming correct once you have learned about them. But an idea that people feel like they already knew is much different from an idea people really did know all along.

Janet Maslin's New York Times review of David and Goliath begins by succinctly stating the value proposition that Gladwell's work offers to his readers:
The world becomes less complicated with a Malcolm Gladwell book in hand. Mr. Gladwell raises questions — should David have won his fight with Goliath? — that are reassuringly clear even before they are answered. His answers are just tricky enough to suggest that the reader has learned something, regardless of whether that’s true.
(I would only add that the world becomes not just less complicated but better, which leaves the reader a little bit happier about life.) In a recent interview with The Guardian, Gladwell as much as agreed: "If my books appear to a reader to be oversimplified, then you shouldn't read them: you're not the audience!"

I don't think the main flaw is oversimplification (though that is a problem: Einstein was right when he—supposedly—advised that things be made as simple as possible, but no simpler). As I wrote in my own review, the main flaw is a lack of logic and proper evidence in the argumentation. But consider what Gladwell's quote means. He is saying that if you understand his topics enough to see what he is doing wrong, then you are not the reader he wants. At a stroke he has said that anyone equipped to properly review his work should not be reading it. How convenient! Those who are left are only those who do not think the material is oversimplified.

Who are those people? They are the readers who will take Gladwell's laws, rules, and causal theories seriously; they will tweet them to the world, preach them to their underlings and colleagues, write them up in their own books and articles (David Brooks relied on Gladwell's claims more than once in his last book), and let them infiltrate their own decision-making processes. These are the people who will learn to trust their guts (Blink), search out and lavish attention and money on fictitious "influencers" (The Tipping Point), celebrate neurological problems rather than treat them (David and Goliath), and fail to pay attention to talent and potential because they think personal triumph results just from luck and hard work (Outliers). It doesn't matter if these are misreadings or imprecise readings of what Gladwell is saying in these books—they are common readings, and I think they are more common among exactly those readers Gladwell says are his audience.

Not backing down, Gladwell said on the Brian Lehrer show that he really doesn't care about logic, evidence, and truth—or, rather, that he thinks discussions of the concerns of "academic research" in the sciences (i.e., logic, evidence, and truth) are "inaccessible" to his lowly readers:
I am a story-teller, and I look to academic research … for ways of augmenting story-telling. The reason I don’t do things their way is because their way has a cost: it makes their writing inaccessible. If you are someone who has as their goal ... to reach a lay audience ... you can't do it their way.
In this and another quote, from his interview in The Telegraph, about what readers "are indifferent to," the condescension and arrogance are in full view:
And as I’ve written more books I’ve realised there are certain things that writers and critics prize, and readers don’t. So we’re obsessed with things like coherence, consistency, neatness of argument. Readers are indifferent to those things. 
Note, incidentally, that he mentions coherence, consistency, and neatness. But not correctness, or proper evidence. Perhaps he thinks that these are highfalutin cares for writers and critics, or perhaps he is some kind of postmodernist for whom they don't even exist in any cognizable form. In any case, I do not agree with Gladwell's implication that accuracy and logic are incompatible with entertainment. If anyone could make accurate and logical discussion of science entertaining, it is Malcolm Gladwell.

Perhaps ... perhaps I am the one who is naive, but I was honestly very surprised by these quotes. I had thought Gladwell was inadvertently misunderstanding the science he was writing about, and making sincere mistakes in the service of coming up with ever more "Gladwellian" insights to serve his audience. But according to his own account, he knows exactly what he is doing, and not only that, he thinks it is the right thing to do. Is there no sense of ethics that requires more fidelity to truth, especially when your audience is so vast—and, by your own admission, so benighted—as to need oversimplification and to be unmoved by little things like consistency and coherence? I think a higher ethic of communication should apply here, not a lower standard.

This brings me back to the question of why Gladwell matters so much. Why am I, an academic who is supposed to be keeping his head down and toiling away on inaccessible stuff, spending so much time reading his interviews, reviewing his book, and writing this blog post? What Malcolm Gladwell says matters because, whether academics like it or not, he is incredibly influential.

As Gladwell himself might put it: "We tend to think that people who write popular books don't have much influence. But we are wrong." Sure, Gladwell has huge sales figures and is said to command big speaking fees, and his TED talks are among the most watched. But James Patterson has huge sales too, and he isn't driving public opinion or belief. I know Gladwell has influence for multiple reasons. One is that even highly-educated people in leadership positions in academia—a field where I have experience—are sometimes more familiar with and more likely to cite Gladwell's writings than those of the top scholars in their own fields, even when those top scholars have put their ideas into trade-book form like Gladwell does.

Another data point: David and Goliath has only been out for a few days, but already there's an article online about its "business lessons." A sample assertion:
Gladwell proves that not only do many successful people have dyslexia, but that they have become successful in large part because of having to deal with their difficulty. Those diagnosed with dyslexia are forced to explore other activities and learn new skills that they may have otherwise pursued. 
Of course this is nonsense: there is no "proof" of anything in this book, much less a proof that dyslexia causes success. I wonder whether the author of this article even knows what proper evidence for these assertions would look like, or whether he realizes that assertions of this kind cannot be "proved" at all.

One final indicator of Malcolm Gladwell's influence (and I'll be upfront and say this is an utterly non-scientific and imprecise methodology) suggests why he matters. I Googled the phrases "Malcolm Gladwell proved" and "Malcolm Gladwell showed" and compared the results to those for "Steven Pinker proved" and "Steven Pinker showed" (adding in the results of redoing the Pinker searches with the incorrect "Stephen"). I chose Steven Pinker not simply because he is an academic, but because he has published a lot of bestselling books and widely read essays and is considered a leading public intellectual, like Gladwell. Pinker is surely much more influential than most other academics. It just so happens that he published a critical review of Gladwell's previous book, but that too is an indicator that Pinker chooses to engage the public rather than just his professional colleagues. The results, in total number of hits:

Gladwell: proved 5300, showed 19200 = 24500 total
Pinker: proved 9, showed 625 = 634 total

So the total influence ratio as measured by this crude technique is 24500/634, or over 38-to-1 in favor of Gladwell. I wasn't expecting it to be nearly this high myself. (Interestingly, those "influenced" by Pinker are only 9/634, or 1.4% likely to think he "proved" something as opposed to the arguably more correct "showed" it. Gladwell's influencees are 5300/24500 or 21.6% likely to think their influencer "proved" something.) Refining the searches, adding "according to Gladwell" versus "according to Pinker" and so on will change the numbers, but I doubt enough corrections will significantly redress a 38:1 difference.
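
For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the ratio and the "proved" percentages were computed. The hit counts are simply hard-coded from the searches described above; nothing here queries Google.

```python
# Raw Google hit counts from the phrase searches described above
gladwell = {"proved": 5300, "showed": 19200}
pinker = {"proved": 9, "showed": 625}  # includes the incorrect "Stephen Pinker" spelling

gladwell_total = sum(gladwell.values())   # 24500
pinker_total = sum(pinker.values())       # 634

# Crude "influence ratio" and each writer's share of "proved" hits
print(f"Influence ratio: {gladwell_total / pinker_total:.1f} to 1")           # ~38.6 to 1
print(f"Gladwell 'proved' share: {gladwell['proved'] / gladwell_total:.1%}")  # ~21.6%
print(f"Pinker 'proved' share: {pinker['proved'] / pinker_total:.1%}")        # ~1.4%
```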

When someone with this much influence on what people seem to really believe (as indexed by my dashed-off method) says that he is merely a storyteller who uses research to "augment" the stories, placing the stories first and the science in a supporting role rather than the other way around, he is essentially placing his work in the category of inspirational books like The Secret. As Dan Simons and I noted in a New York Times essay, such books sprinkle in references and allusions to science as a rhetorical strategy. Accessorizing an otherwise inconsistent or incoherent story-based argument with pieces of science is profitable because references to science serve as touchpoints that help readers maintain their default instinct to believe what they are being told: when readers see "science," they can suppress any skepticism that might be bubbling up in response to the inconsistencies and contradictions.

In his Telegraph interview, Gladwell again played down the seriousness of his own ideas: "The mistake is to think these books are ends in themselves. My books are gateway drugs – they lead you to the hard stuff." And David and Goliath does cite scholarly works, both books and journal articles, as well as journalism, in its footnotes and endnotes. But I wonder how many of its readers will follow those links, as compared to the number who will take its categorical claims at face value. And of those who do follow the links, how many will realize that many of the most important links are missing?

This leads to my last topic, the psychology experiment Gladwell deploys in David and Goliath to explain what he means by "desirable difficulties." The difficulties he talks about are serious challenges, like dyslexia or the death of a parent during one's childhood. But the experiment is a 40-person study on Princeton students who solved three mathematical reasoning problems presented in either a normal typeface or a difficult-to-read typeface. Counterintuitively, the group that read in a difficult typeface scored higher on the reasoning problems than the group that read in a normal typeface.

In my review, I criticized Gladwell for describing this experiment at length without also mentioning that a replication attempt with a much larger and more representative sample of subjects did not find an advantage for difficult typefaces. One of the original study's authors wrote to me to argue that his effect is robust when the test questions are at an appropriate level of difficulty for the participants in the experiment, and that his effect has in fact been replicated “conceptually” by other researchers. However, I cannot find any successful direct replications—repetitions of the experiment that use the same methods and get the same results—and direct replication is the evidence that I believe is most relevant.

This may be an interesting controversy for cognitive psychologists, but it's not the point here. The point is that Gladwell says absolutely nothing about the controversy over whether this effect is reliable. All he does is cite the original 2007 study of 40 subjects and rest his case. Even those who have been hooked by his prose and look to the endnotes of this chapter for a new fix will find no sources for the "hard stuff" (e.g., the true state of the science of "desirable difficulty") that he claims to be promoting. And if the hard stuff has value, why does Gladwell not wade into it himself and let it inform his writing? When discussing the question of how to pick the right college, why not discuss the intriguing research that debates whether going to an elite school really adds economic value (over going to a lesser-ranked school) for those who get admitted to both? Or, when discussing dyslexia, instead of claiming it is a gift to society, how about devoting the space to a serious consideration of the hypothesis that this kind of early-life difficulty jars the course of development, adding uncertainty (increasing the chances of both success and failure, though probably not in equal proportions) rather than directionality? There was so much more he could have done with these fascinating and important topics.

But at least the difficulty of finding a simple experiment to serve as a metaphor might have jarred Gladwell into realizing that the connection between the typeface effect, however robust it might turn out to be, and the effect of a neurological condition or the loss of a parent is in fact just metaphorical. There is no relevant nexus between reading hard-to-read type and losing a parent at an early age, and pretending there is one just loosens the threads of logic to the point of breaking. But perhaps Gladwell already knows this. After all, in his Telegraph interview, he said readers don't care about things like consistency and coherence; only critics and writers do.

I can certainly think of one gifted writer with a huge audience who doesn't seem to care that much. I think the effect is the propagation of a lot of wrong beliefs among a vast audience of influential people. And that's unfortunate.

Tuesday, October 1, 2013

The Part Before the Colon: Is There a Trend Toward Cleverer Journal Article Titles?

I joined the Society for Personality and Social Psychology last year, even though I am not a social psychologist, because I had to in order to give an invited talk at a pre-conference session of the annual SPSP meeting, which was held in New Orleans. I had a good time, despite having a bad headache during most of my visit. Social psychologists give lots of interesting talks, they tend to be social, and they also dress better than cognitive psychologists and neuroscientists. It was also fun to see which ones made a visit to the casino across the street from the conference hotel.

As an SPSP member, I now receive their flagship journal every month: Personality and Social Psychology Bulletin (PSPB—academics love to refer to journals with acronyms). One of the best parts of the journal, to a non-specialist like me, is the article titles. In psychology, as in many areas of science, there are different strategies for a good title. One is to concisely state the main finding of the paper or the main theoretical claim (occasionally formulated as a question rather than a statement). Another is to precede that kind of title with a clever quip, allusion, pun, or other phrase that grabs attention and orients the (potential) reader towards some aspect of the research you want to emphasize or that makes the work stand out. That is the part before the colon.

An example of this latter strategy is the 1999 article that Dan Simons and I published in Perception. The title was "Gorillas in our midst: Sustained inattentional blindness for dynamic events." (Thanks to M.J. Wraga, a fellow postdoc in the Harvard psychology department at the time, for suggesting the part before the colon.) If you are a real black belt in journal article writing, you can be like Dan Gilbert and combine a statement of the main finding and a clever quip into one phrase, as in his wonderful 1993 article (with two co-authors) "You Can't Not Believe Everything You Read." If there were a best title award, this would surely be in the running. At least it's one of my favorites.

I think all kinds of titles can be good, if they are done well. There seems to be a trend toward more clever titles, at least during my time in psychology and social science. Consider the latest issue of PSPB (volume 39, number 10). Here are the article titles, just the parts before the colon:

1. "Show Me the Money"
2. Losing One's Cool
3. Changing Me to Keep You
4. Never Let Them See You Cry
5. Gender Bias in Leader Evaluations
6. Getting It On Versus Getting It Over With
7. The Things You Do For Me
8. "I Know Your Pain"
9. How Large Are Actor and Partner Effects of Personality on Relationship Satisfaction?
10. Touch as an Interpersonal Emotion Regulation Process in Couples' Daily Lives

I classify seven out of ten articles (all but #5, #9, and #10) as following the clever title strategy. That seems like a lot more than I used to see. To hastily test this intuition, I looked at the tables of contents for the same journal 10, 20, and 30 volumes ago, using issue 10 in 2003 and the final issue in 1993 and 1983 (since there were fewer than ten issues per volume then). There seems to have been a sharp increase (the tally is sketched in code after the list):

2013: 70%   (7 out of 10)
2003: 10%   (1 out of 10: "The Good, the Bad, and the Healthy")
1993: 0%     (0 out of 11)
1983: 17%   (2 out of 12: "You Just Can't Count on Things Any More" and "Lonely at the Top")
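
Those percentages are nothing more than clever-title counts divided by total articles per issue. For completeness, here is a minimal Python sketch of the tally; the counts are my own by-eye classifications from above, not anything scraped from the journal.

```python
# Hasty tally of "clever" pre-colon titles per PSPB issue
# (year: (clever titles, total articles), classified by eye above)
issues = {2013: (7, 10), 2003: (1, 10), 1993: (0, 11), 1983: (2, 12)}

for year, (clever, total) in sorted(issues.items(), reverse=True):
    print(f"{year}: {clever / total:.0%}  ({clever} out of {total})")
```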

Coincidentally, I received the latest issue of Clinical Psychological Science (volume 1, number 4; TOC apparently not online yet) today as well. It also has ten articles, and none of them have clever parts before the colon in their titles. Maybe clinical psychologists and their subject matter just aren't as funny.

Of course, this is hardly a serious statistical analysis of the phenomenon; the quippy titles might have just coalesced at random in this particular issue, or this journal might have editors who encourage this kind of title. I should also say that I perceive the trend to exist in other areas besides social psychology. But I have heard it argued that this trend toward cleverer titles (if it really exists!) is a deleterious one: it puts pressure on authors to come up with clever titles, and it leads reviewers, editors, and journalists to expect them, so it may distort the entire research endeavor toward work that can be summed up not just in the proverbial "25 words or less" but in the much stricter standard of "10 very clever words or less." I have no strong belief as to whether all this is happening, or in what fields of study, but perhaps it's something to think about.

If someone does the research and writes a journal article on this, they are welcome to use the title "In 25 Words or Less: The Effect of Trends Toward Clever Pre-Colon Article Titles on the Content and Quality of Research." Just make sure to cite this blog entry, or come up with a catchier title yourself.

PS: I am fully prepared to be told that someone else has already said all this, or even done the research relating title catchiness to citation counts or other metrics. I have anticipated this in my other article, "Leap Before You Look: The Surprising Value of Writing Blog Entries Without Doing Your Research First."