Thursday, September 8, 2016

Why Colleges And Universities Should Not Disinvite Speakers

Since 2001, more than two hundred people who have been invited to speak on American college and university campuses have been “disinvited” before they could give their talks. It is easy to find news accounts of these events—or non-events. The most recent may be the NYU administration's decision this week to cancel a talk by Nobel prizewinner James Watson, on strategies for curing cancer, six days before it was scheduled to happen, because of student complaints about statements Watson had made on other topics in the past. It is much harder to find stories about academic leaders who rejected demands for disinvitation and clearly explained why. If I were a college president, and a campaign to disinvite a speaker arose on my campus, here is the letter I would write.

To The Campus Community:

Recently, an organization on our campus announced that a particular person would be speaking here at a future date. Many students, some faculty, and a few alumni of our institution have publicly objected to the invitation of this speaker. Some have demanded that his invitation be rescinded, so that he will not be able to use our “platform,” and the imprimatur of our college will not be attached to the controversial things he was expected to say—or to anything he has said or written in the past. It has been alleged that the speaker will make some students feel uncomfortable or unsafe, that his beliefs are repugnant, and that his ideas are not rational or evidence-based.

I have heard these demands, and listened to the arguments of their supporters. I am writing to say that I do not agree with them, to announce that the speaker’s talk will go forward as planned, and to explain why.

First, let’s be clear that this is not a matter of “freedom of speech” in the legal sense. The First Amendment to the Constitution, as interpreted by the courts, says that governments may not abridge freedom of speech, but it puts no restrictions on private institutions like our college. Disinviting someone is unprofessional and rude, but we have the right to disinvite—or to not invite in the first place—anyone we please.

Some will say that if I do not disinvite this speaker, I am therefore supporting him. This is not the case. There is a clear logical distinction between endorsing a person’s claims and beliefs, and giving him an opportunity to express those claims and beliefs. There are many speakers who have come to our campus with whom I disagree, but I did not block them.

In fact, neither my nor anyone else’s personal preferences should have anything to do with this question. To see why, we must consider what our college, or any college, is here for. An institution of higher education is organized around the concept of learning. Learning is why we are all part of this community. Students are here to learn about the arts, sciences, and other disciplines they pursue. Professors are here to learn as well: to learn entirely new things about the natural, social, and humanistic worlds, and to learn how to teach more effectively. The staff and administration are here to make these endeavors possible.

Learning doesn’t just mean going to class, doing homework, and taking exams. If a group of students or faculty members are so interested in hearing, debating, and engaging with the ideas of a person from outside our community that they decide to invite him here and organize and attend an event, I cannot rebuke them. In fact, I congratulate them, for they are engaged in an act of learning that goes beyond what is strictly required of them. They are spending their personal time and energy furthering the central purpose of our institution.

The reputation of our college and the value of the degrees we confer will not be affected by the speakers we host, but they will suffer if we acquire a reputation for stifling unpopular views. A college does not need to “manage” its “brand” like a for-profit company does. All colleges stand for excellence in scholarship; that is the only brand that matters, and disinviting speakers and suppressing thoughts will only cheapen it.

To those who say the speaker may make them feel unsafe, I must point out that higher education is not designed to make people safe. Instead, it is our society’s designated “safe space” for disruptive intellectual activity. It’s a space that has been created and set apart specifically for the incubation of knowledge, by both students and faculty. Ideas that may seem dangerous or repugnant can be expressed here—even if nowhere else—so that they can be analyzed, discussed, and understood as dispassionately as possible. Many of humanity’s greatest achievements originated as ideas that were suppressed from the public sphere. Some, like the theory of evolution by natural selection, equal rights for women and minorities, trade unions, democracy, and even the right to free speech and expression, are still seen as dangerous decades and centuries later.

If you are against this speaker coming here, please also consider this: Some members of our community—some of your friends and colleagues—do want him to visit. By asking me to disinvite him, you are implicitly claiming that your concerns and preferences are more important than those of the people who invited him. Are you really sure that you are so right and they are so wrong? Psychologists have found that people tend to be overconfident in their beliefs, and poor at taking the perspective of others. That might be the case here.

A decision by me to bar this speaker would have far-reaching negative repercussions. It would make everyone in our community think twice before they staged a provocative event or invited a controversial speaker. Cancelling this invitation would not only prevent this person from talking; it would reduce the expression of views like his in the future, and probably chill speech by anyone who could be regarded as controversial. And it would set a precedent that future leaders in higher education may point to if they feel pressured to do the same. All of this would be antithetical to our common purpose—and our institution's social function—of learning and discovery.

Note that it’s especially important for us to be open to viewpoints not already well-represented among our faculty. The professors here are a diverse group, but many studies have shown that professors tend to be more politically left-wing than the population at large. Even the most conscientious instructor may inadvertently slant his teaching and assignments towards his own political viewpoint. Of course, this applies more in the social sciences and humanities than in math or physics, but it does happen. Giving campus organizations wide latitude to invite the speakers they wish helps to increase the range of thoughts that are aired and discussed here.

If you feel that this speaker’s talk might upset you, I offer this advice: Go. Yes, go to the talk, listen to it, record it—if the speaker and hosts give permission—and think about it. Expose yourself to ideas that trouble you, because avoiding sources of anxiety is not the best way to cope with them.

But don’t try to interrupt or shout the speaker down. Take this golden opportunity to train yourself to respond to speech that upsets you by analyzing it, looking up its sources, developing reasoned counterarguments, and considering why people agree with it and whether it might not be as contemptible as you have been told. These are the skills that all members of our community are committed to building.

In fact, if you’re committed to everything this speaker is against, then you should definitely listen to him. John Stuart Mill wrote, “He who knows only his own side of the case, knows little of that.” When you never encounter people who vigorously argue for positions you don’t agree with, you may come to believe that those arguments don’t have merit—or don’t even exist. The argument you imagine your opponents making is probably weaker and easier to dismiss than the argument they would actually make if they had the chance. So listen to the other side's case in order to strengthen your own. In other words, know thy enemy.

Of course, you don’t have to listen to speakers you disagree with. That’s the beauty of our system: We are all committed to the broad goal of learning, but we are never forced to attend to people we can’t stand. If you want to protest this speaker, do so outside the venue, and do not block anyone from attending. Hand out fliers or arrange for other speakers to present counterarguments or different ideas. As Justice Louis Brandeis said, “If there be time to expose through discussion the falsehood and fallacies, to avert the evil by the process of education, the remedy to be applied is more speech, not enforced silence.”

As the leader of this community of scholars, I would be doing the opposite of my duty were I to force silence on this or any other speaker. Therefore, I decline the requests to disinvite him. And I encourage all campus groups and organizations to invite the speakers they want to hear, knowing that I will respect and support your efforts to learn and engage with their ideas.

Sincerely,

Your College President

Sunday, January 24, 2016

Confusion About Correlation and Causation ... in a Research Methods Textbook?!

Every so often, textbook publishers send me free copies of their books. Usually these are books for courses I teach, but sometimes they aren't. This week I received a copy of Discovering the Scientist Within: Research Methods in Psychology from Worth Publishers, a new title in its first edition. I don't teach a research methods course, but I flipped through the table of contents anyhow, and I noticed an entry called "Research Spotlight: The Upside to Video-Game Play" on page 31. Since the question of how playing video games might affect cognition and behavior is a controversial one, I was curious to see what the authors had to say about it in a research methods context. Unfortunately, their discussion has some problems.

First, they claim that "there are some real advantages to playing video games," citing a finding that "more time spent playing video games coincided with greater visual-spatial skills." Stating that playing video games has advantages is a statement of causality. If people who played video games just happened to have greater visual-spatial skills (maybe because their visual-spatial skills were greater than those of non-gamers to start, or because their visual-spatial skills were improving faster than those of non-gamers), there would be no "advantage" to the game-playing. The abstract of the underlying paper by Jackson et al. (2011) makes no mention of random assignment of participants to different amounts of video game play, so there's no justification for inferring causality. (Additionally, it notes that the video-game players had lower GPAs.)

Second, they say that "it just so happens that surgeons benefit from video-game playing as well," citing a study by Rosser et al. (2007) that found that "surgeons who played video games for more than 3 hours a week made 37% fewer errors and were 27% faster in laparoscopic surgery and suturing drills compared to surgeons who never played video games." This is followed by speculation as to the mechanism by which game playing could cause these differences. However, the evidence of causation here is even weaker than for the study of children cited above. It's not even a longitudinal study—it's just a cross-sectional finding of an association between video game play and performance on (computerized) tests of surgical skill. Again, one need only read as far as the Rosser et al. abstract to find the statement "Video game skill correlates with laparoscopic surgical skills." There is no evidence of causality, but the textbook authors have said that surgeons "benefit" from playing video games.
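The confound at issue here is easy to demonstrate with a toy simulation (entirely hypothetical, with made-up numbers not drawn from either study): give a simulated population a baseline ability that drives both how much they play video games and how well they score on a skill test, with game play itself having zero causal effect, and a strong cross-sectional correlation still appears.

```python
import random

random.seed(1)

# Hypothetical model: baseline visual-spatial ability influences BOTH how
# much a person plays video games AND how well they score on a skill test.
# Playing has no causal effect on the score in this simulation.
n = 10_000
ability = [random.gauss(0, 1) for _ in range(n)]
hours = [max(0.0, 2 + a + random.gauss(0, 1)) for a in ability]  # weekly play time
score = [a + random.gauss(0, 1) for a in ability]                # skill-test score


def corr(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# A clearly positive correlation emerges even though, by construction,
# game play does nothing to improve the score.
r = corr(hours, score)
```

A cross-sectional study of this population would find that heavy players outscore non-players, yet "prescribing" more game play to anyone would accomplish nothing—exactly the inferential gap the textbook glosses over.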

Finally, the caption below the stock photo reinforces the thrust of the boxed text by asking "If video games can make you a better surgeon, what other areas of your life could playing video games improve?" As they say in courtroom dramas, "Objection! Assumes facts not in evidence."

Sure, this is a run-of-the-mill mistake that laypeople make all the time: mistaking evidence of correlation (video game playing co-occurring with increased spatial skill or surgical proficiency) for evidence of causation (playing video games making your spatial skills and surgical proficiency better than they were before). But this is a textbook on research methods in psychology. If the authors of such books have the proverbial "one job to do," it is teaching their readers what conclusions can be drawn from what kinds of evidence. That's what education in research methods is all about: learning to design research studies that have the power to permit certain inferences, and learning which inferences can and cannot logically follow from which designs. You can think of analogies to other fields—a nutrition book reversing the properties of carbohydrates and fat? An algebra textbook getting the quadratic formula wrong? A history book that mistakes the Declaration of Independence for the Constitution? Correlation versus causation is not a nuance or side issue; it's at the heart of the behavioral science enterprise.

The authors of Discovering the Scientist Within must understand the distinction between correlation and causation, and I am sure they can generate the plausible alternative (non-causal) explanations for these video game results that I mention above. I know this because on page 30, in the paragraph immediately before the "Research Spotlight" box, they write, "often there is not a set direction of how one thing influences another ... News coverage, such as in cases of school shootings, often portrays playing video games as the cause of aggressive behavior. Yet it is equally likely that aggressive individuals gravitate toward violent video games" [emphasis added].

The fact that mistakes like this can turn up in a book meant to educate its readers to avoid them is remarkable, and I think it goes to show just how difficult sound causal inference is for the human mind. As Daniel Simons and I argued in The Invisible Gorilla, human beings are susceptible to an "illusion of cause" that leads us to jump to particular causal conclusions in all kinds of situations where the evidence we have doesn't logically justify them—indeed, where other explanations are equally or even more likely, and where the assumption of causality can get us into big trouble. The ease with which we can generate mechanisms to explain a particular causal inference can contribute to the illusion. For example, being aware of the "neural plasticity" concept could make it seem more likely that intensive cognitive work (e.g., video-gaming) might "train" some more fundamental underlying cognitive capacity (e.g., spatial skill) or transfer to some other practical task (e.g., surgical proficiency). None of us, not even psychology professors who write textbooks on research methods, are immune to these fundamental thinking pitfalls.

Hopefully the second edition of Discovering the Scientist Within will correct these correlation/causation errors, as well as any other issues that may lurk in the text. My quick flip-through picked up one more passage the authors might want to think about rewriting:
... the author Malcolm Gladwell, a self-described "cover band for psychology," is known for his ability to summarize and synthesize psychological findings so that the general public can benefit from the exciting advances in knowledge that psychological researchers have made.
Accompanying this sentence on page 41 is a photo of Gladwell's book Outliers. As many readers of this blog will know, I don't agree that the general public is benefitting from Malcolm Gladwell's writing, precisely because he doesn't summarize and synthesize as well as people think he does. Since correlation, causation, and statistical thinking are among the things Gladwell has difficulty with, it doesn't seem like a research methods textbook should be endorsing his work.

Monday, November 2, 2015

Why Phones Need a Driving Mode: Questions and Answers

Last Friday, The Wall Street Journal published "A Simple Solution for Distracted Driving," an essay that I wrote with Daniel Simons. We argued that all smartphones should come equipped with a Driving Mode—an easily activated or default setting that would prevent users from engaging in the most distracting activities while they were driving their cars. In such a short piece, we were only able to sketch the outlines of this idea. Below are some further elaborations, based on comments we saw or received, organized as a set of questions and answers. (If you want to hear me talk about Driving Mode for a few minutes, you can listen to my radio interview this morning on The Financial Exchange.)

My phone already has Driving Mode.
As we noted, some phones already have features like the one we are proposing, and some of them are similar to our version. However, the vast majority of phones do not have the "robust" driving mode that we advocate. By robust we mean a driving mode that: (1) eliminates all sources of significant distraction, including non-emergency communications; (2) permits full use of GPS and navigation; and (3) sends automatic responses to anyone who tries to contact you while you're driving, and holds the incoming messages without showing them to you until you exit driving mode. Here's an interesting blog post along similar lines to our idea, with more implementation details (we weren't aware of this post when we wrote our essay).

Isn't what you propose the same as Apple Car Play?
No. From what we can tell about Car Play, it's an iOS 9 feature that integrates your phone with a screen built into your car. And it permits the user to do a lot more than our driving mode would. Car Play lets you make and take calls, send and receive messages, etc. Sure, it lets you do this hands-free, but those activities are still very cognitively absorbing even if you don't need your hands to do them.

What about AT&T's "DriveMode" app? (Or similar apps.)
Apps that implement something resembling our robust driving mode are great, and people should use them. AT&T's DriveMode app has several nice features, but lacks some of the ones we think are important. What we'd like to see is a universal driving mode that is fully integrated with the operating system and the hardware of the phone, so that it can have full control over all communication, app use, etc. Third-party apps may not have sufficient control to really implement a robust driving mode, or to stay functional when the operating system or hardware change.

I just put my phone in my purse/briefcase or turn it off, so I don't need Driving Mode.
If you can maintain the discipline to do this, good for you! But this prevents you from using your phone for GPS and navigation, features that probably provide more value than they cost in potential distraction. For users who need those features, or who forget or can't force themselves to put their phones far away while driving, a Driving Mode would help. And some of those less disciplined users are going to be behind the wheel of their own cars while you are on the road.

There are already public service campaigns, laws, and other efforts against distracted driving.
Very true. In an earlier draft of our essay, we mentioned some of these. We aren't against them (except when they are based on erroneous assumptions, such as the idea that hands-free technology will solve the problems of limited attention); we just think that it's so tempting to use a phone while driving, and also so dangerous, that industry can do more to help customers exert self-control.

I like my car's head-up display feature. It feels as though I can read the information on the windshield while keeping my eyes on the road and not driving any worse.
Unfortunately, there is a lot of research on attention in general and head-up displays in particular, and it all generally concludes that this feeling is an illusion. Like the feeling we have that we can talk on the phone (or do even more) while driving, or the feeling people have that they can drive just fine when they are drunk, it reflects a mismatch between the signals our brain uses to monitor its own performance levels, and the reality of those performance levels. Often they line up, but when it comes to knowing how well we are paying attention, we can be way off.

Won't a driving mode that activates based on GPS-detected speed stop passengers from using their phones? And what about people using mass transportation?
We were well aware of these issues, but in a short essay we didn't have space to address them. And we aren't sure ourselves of the optimal solutions. But we are cognitive psychologists, not mobile operating system designers. We are sure that the geniuses at Apple and Google can come up with clever answers, perhaps working with car makers and phone service providers. Meanwhile, here are a couple of our thoughts:

  • The key principle is that a phone should have an intelligent default behavior, without preventing users from doing things outside the default. If a phone enters driving mode automatically above 10 mph, perhaps it would require just two taps to exit the mode. Passengers or mass-transit riders could do this quickly and easily when a prompt popped up on their screen, but drivers would be less likely to. Some would, of course, but many wouldn't. Many users stick with default settings and never learn how to change them, or learn how but don't bother. The default is not just an arbitrary factory setting: it is also interpreted as a recommendation or a social norm.
  • Perhaps this feature could be deactivated by a driver before he starts driving, so that driving mode wouldn't start. This simply shifts the burden of action from those who want to be in driving mode to those who don't. Again, there is no God-given standard setting for what features should be available on a phone at what times: making people affirmatively decide to enable distractions seems just as sensible, if not more so, than making them affirmatively decide to disable them.
  • Even if there is no automatic activation for Driving Mode, its mere existence, combined with the ease of initiating it, should help. Even if only 10% of drivers would turn it on, that would be a win.

Driving Mode is a further step in the infantilization of people by governments and elites. Adults should be aware of what they are and aren't capable of doing, and should be free to choose how to use their phones.
Speaking for myself only, I sympathize greatly with this point of view. I wish people were more aware of their mental capabilities, and I try to educate people about that. I worry quite a bit about the impulse to regulate or forbid behavior that people don't like. Regulations are often put in place on speculation but rarely repealed when their costs turn out to outweigh their benefits. But it seems much more infantilizing to regulate, say, what words people can use, or what Halloween costumes they can wear, or what subjects they can research or study, than to regulate how distracted they can make themselves while driving a car at high speeds. We already regulate many aspects of driving for safety reasons: we restrict speeds, require turn signals, encourage seatbelt use, paint lanes on roads, put up stoplights, and so on. Even if you have perfect self-control and don't need a driving mode, you might agree that other people on the road could benefit from being less distracted, and thus you would benefit too.
One way to look at the situation is this: The internet and the smartphone have brought countless benefits to everyone. Society is much, much better off with them than without. Compared to the gains we have made in staying connected, having knowledge at our fingertips, and even just being better entertained, the loss from a slightly restrictive driving mode is a very small price to pay. No higher a price, I would think, than what you lose by not being able to drive 70 miles per hour on empty local roads at night, which is a restriction everyone accepts as sensible. While I admire the behavioral technology of the "nudge," which has the power to make people better off without reducing their real options, I also worry that it can be used in inappropriate ways—to push people toward choices that are not in their own true best interests. This, however, is not one of those cases. Eliminating distractions while driving should be in everyone's interests. And finally, note that we are not proposing any new laws or regulations; indeed, regulating the features of smartphones sounds (to me) like a futile exercise. We are only urging the phone industry to think about how to make their products safer and better than they already are.


Sunday, September 13, 2015

No, College Lectures Are Not "Unfair"

In her recent New York Times essay "Are College Lectures Unfair?" Annie Murphy Paul, a science writer, asks "Does the college lecture discriminate? Is it biased against undergraduates who are not white, male, and affluent?" She spends the rest of the essay arguing the affirmative, claiming that "a growing body of evidence suggests that the lecture is not generic or neutral, but a specific cultural form that favors some people while discriminating against others, including women, minorities, and low-income and first-generation college students." She cites various studies that find that "active learning" and "flipped classroom" pedagogical techniques lead to the biggest improvements in performance among the groups who do the worst in traditional lecture-format classes.

Unfortunately, while it may well be true that flipping the classroom (having students watch video lectures at home and using classroom time for problem-solving and discussion) and making learning more active (increasing low-stakes quizzes, which enhance memory for concepts) are excellent ideas—and I personally think they are—the argument that the use of lectures has anything to do with fairness and discrimination is simply erroneous.

Here's why. First, if one group of students tends to perform better than another under a particular instructional regime, then it is likely that any change that improves everyone's performance (as active learning does, according to studies cited by Ms. Paul) should benefit most the group that starts out doing worst. This is a simple function of the fact that grades have maximum values, so there is less room for improvement for students who are already doing well. If this point isn't clear, imagine a hypothetical educational intervention that leads to every student knowing the answer to five particular questions on a 100-question final exam. The best-prepared students would probably have known several of those answers absent the intervention, while the least-prepared students would have good chances of not not having known them. Therefore, the least-prepared students might gain as much as 4 or 5% on their final exam grades, but the best-prepared students might gain as little as 1 or 0%. This would narrow the achievement gap between those groups. Any intervention that results in more learning is likely to have a similar pattern of effects.

So active learning should be good for any students who start out doing worse in general, not just for minority, low-income, or first-generation students. As Ms. Paul notes, "poor and minority students are disproportionately likely to have attended low-performing schools" and get to college "with less background knowledge." Suppose you tried active learning in a school where the students were overwhelmingly the children of white, middle-class, college-educated parents. You would still expect the lowest-performers among those students to benefit the most from the improved methods. Likewise, the best-performing minority students should get less out of improved pedagogy than the worst-performing minority students. In other words, the value of active learning for traditionally underperforming groups of students has everything to do with the fact that those students have underperformed, and nothing to do with the fact that they come from minority groups.

Now to the question of whether older pedagogical approaches "discriminate" against those minority groups, as Ms. Paul says they do. According to the American Heritage Dictionary of the English Language, to discriminate is "to make distinctions on the basis of class or category without regard to individual merit, especially to show prejudice on the basis of ethnicity, gender, or a similar social factor." Teaching a course in an old-fashioned lecture format, though it may often be less effective than teaching in flipped or active formats, and for that reason may result in lower grades for some types of students than for others, makes no distinctions between classes or categories of people, and therefore cannot be a form of discrimination. Indeed, any inferior form of instruction should lead to wider differences between students who start at different levels.

Consider the limiting case: imagine a "form of instruction" that is no instruction at all: the professor never issues a syllabus or even shows up, but still gives the final exam at the end of the semester. Students who started out with the most "background knowledge" about the topic will still know the most, and students who started out knowing the least will still know the least. All pre-existing differences between ethnic groups and any other student groups will remain unchanged. On the other end of the instructional spectrum, suppose the professor's teaching is so perfect that every student learns every bit of the material: then there will be no differences between any groups, because all students will receive grades of 100%. Note that none of this concerns what groups the students are part of—it can all be explained entirely by the artifact of high-quality instruction benefiting poorer-performing students more than better-performing students. Therefore, Ms. Paul's essay uses the words "biased" and "discriminating" incorrectly, with the pernicious effect of accusing anyone who doesn't flip their classroom or give lots of quizzes of being prejudiced against minority students.

For what it's worth, I have been truly impressed by the growing body of research on the science of learning. I think it's one of the most exciting practical achievements of cognitive psychology, and I am trying to incorporate more of it into my own teaching. But I also believe that good, engaging lectures have their place, and may be more effective in some disciplines than others. To be clear, I have no objections to any of the research Ms. Paul cites in her essay. (Indeed, I have not even read most of that research, because what's at issue here is not the scientific results, but the meaning Ms. Paul ascribes to them. I've assumed that she described all the results accurately in her essay. If the researchers somehow managed to separate the effects of minority status and baseline knowledge or performance in their analysis of the active learning effects, Ms. Paul doesn't say anything about how they did it—and since that analysis would be the logical lynchpin for her claims of bias, discrimination, and unfairness, it is negligent to ignore it.) We have learned a lot about effective teaching methods, but none of it justifies the sloppy, inflammatory claim that lectures are "biased" and "discriminate" against students from minority, low-income, or nontraditional backgrounds.

Saturday, January 10, 2015

Martin Thoresen's World Chess Championship

My third “Game On” column, "The Real Kings of Chess Are Computers," appears this weekend in The Wall Street Journal. I write about the "real world chess championship," which is known formally as the Thoresen Chess Engines Competition, or TCEC. This is a semi-annual tournament that pits almost all the top computer chess programs against one another. Since the best chess engines are now much stronger than even the best human players, a battle between the top two engines is a de facto world championship of chess-playing entities.

That battle was the Superfinal match of TCEC season 7, and it was won last month by Komodo over Stockfish (both playing the same 16 core computer). In a digital-only extra, "Anatomy of a Computer Chess Game," I try to explain a key moment in game 14 of the match, which gave Komodo a lead it never relinquished over the remaining 50 games.

As part of the research for these pieces, I interviewed TCEC impresario and eponym Martin Thoresen by email. Below is an edited transcript of our conversation, which took place between 29 December 2014 and 2 January 2015. The questions have been re-ordered to make the flow more logical.

CHRISTOPHER CHABRIS: Let’s start with the recent Season 7 Superfinal match. What is your opinion about the result? Do you think it shows that Komodo is a “better chess player” than Stockfish, in their current versions?
MARTIN THORESEN: I think the Superfinal was very close and exciting. The draw rate was slightly higher than what I expected, but then again the engines are very close in strength so this is quite natural. I think the result shows that Komodo is the better engine on the kind of hardware that TCEC uses. And for grandmasters with powerful computers this should be something to take note of when they analyze games using chess engines.

Do you believe that TCEC features the “best chess players” in the world?
Yes, I would say any of the top programs of say, Stage 3 and onwards would pretty much crush any human player on the planet using TCEC hardware.

Do you think it is a problem to have so many draws (53 out of 64 games)? Having so many draws definitely distinguishes engine-engine matches from human-human matches, but I agree with you that it must result partly from the players being stronger than the best humans.
Personally I don’t mind the draw rate being this high in the Superfinal; it makes it very tense. But one of the main goals of TCEC is to entertain people. Too many draws detract from that, and too many one-sided openings would lower the overall quality, even if they lowered the draw rate. I would be satisfied with a draw rate of roughly 75% in the Superfinal.

You must have watched more engine-engine games than almost anyone else. Were there any games or particular moves or positions that you thought were especially beautiful or revealing in this most recent Superfinal match?
I have not looked deeply at all the games yet, but games like #9 strike me as fascinating.

Let’s talk about some of the details of how TCEC works. Are the games played entirely on your personal computer at your home?
Yes, it’s a 16-core server I’ve built myself. It has two 8-core Intel Xeon processors and 64 GB RAM. It’s located at home here in Huddinge, a suburb of Stockholm, Sweden. I live in an apartment of about 45 square meters.

Why do the games run only one at a time? Because it all happens on one computer? Have you considered using multiple computers so that more games can happen at one time?
Yes exactly, they run only one at a time because the engines utilize all 16 cores to get maximum power, which makes it impossible to run more games. Using more computers is of course something I wish I could do, but then people need to donate more. ☺ The server cost me roughly €4000–€5000 to build. Of course it would be possible to limit each engine to say, four cores, then I could have four games running simultaneously, but then again the engines would be weaker due to the fewer cores. I want TCEC to show only the highest quality of games. Not to mention that I’d have to redesign the website to support many games at once.

How hard was it to write the code that “plays” the two engines against each other, passing moves back and forth, and so on? Do the engines provide you with an API, or do the engine authors give you a special version that corresponds to an API for your own server code? (I assume you wrote the server code yourself too, correct?)
The interface that plays the games is a small command line tool called cutechess-cli, somewhat modified for TCEC by Jeremy Bernstein according to my instructions. I have not coded this tool myself. Cutechess is simply a UCI/Xboard interface tool that “runs” the engines in accordance with the UCI or Xboard specifications. Basically all chess engines comply with the UCI or Xboard protocols for I/O requests (time control, time left, the move the engine makes, etc.). Using this tool does not give you a chessboard to view the action like a GUI (Fritz, Arena, SCID, etc.), so ironically I can’t actually watch the game on the server—all I see is a bunch of text.
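For readers unfamiliar with UCI, the exchange he describes looks roughly like the following sketch. The engine name and numbers are made up, and the engine's replies are canned strings here so the protocol flow is visible without running a real binary; a tool like cutechess-cli conducts this conversation over each engine process's standard input and output.

```python
# Sketch of the UCI conversation a referee tool conducts with an engine.
# These commands would be written to the engine's stdin...
commands = [
    "uci",                           # handshake; engine replies "uciok"
    "isready",                       # engine replies "readyok"
    "position startpos moves e2e4",  # set up the current position
    "go movetime 1000",              # think for one second
]

# ...and replies like these (faked here) would come back on its stdout:
engine_output = [
    "id name ExampleEngine",
    "uciok",
    "readyok",
    "info depth 12 score cp 31 pv e7e5",
    "bestmove e7e5",
]

def parse_bestmove(lines):
    """Find the move the engine finally chose."""
    for line in lines:
        if line.startswith("bestmove"):
            return line.split()[1]
    return None

print(parse_bestmove(engine_output))
```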

Who developed the software to broadcast the games to the internet? As someone who followed the latest Superfinal and browsed the archives quite a bit, I can say that it has a very nice interface.
There are two parts of TCEC. One is the website which shows the games, the other is the server on which the games are played. These two are not run on the same machine (for obvious performance reasons), so the server uploads the PGN to the website each minute. The website is designed by me and it has had different designs in previous seasons. The core technology on which it is built is the free JavaScript chess viewer called pgn4web.
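The PGN being pushed is plain text in a standard format, which is why a JavaScript viewer can render it directly in the browser. A minimal sketch of building such a snapshot (the names and moves are just examples, not TCEC's actual code):

```python
# Minimal PGN snapshot builder; a result of "*" means the game is
# still in progress, which is what a live viewer such as pgn4web expects.
def to_pgn(white, black, moves, result="*"):
    tags = [f'[White "{white}"]', f'[Black "{black}"]', f'[Result "{result}"]']
    # Number White's moves ("1. e4"), leave Black's replies bare ("e5").
    body = " ".join(f"{i // 2 + 1}. {m}" if i % 2 == 0 else m
                    for i, m in enumerate(moves))
    return "\n".join(tags) + "\n\n" + body + " " + result + "\n"

print(to_pgn("Komodo", "Stockfish", ["e4", "e5", "Nf3"]))
```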

How much money would you estimate you have personally spent, and how much total has been spent, to run the TCEC since it started, and season 7 specifically?
I have spent a lot of money. I am not quite sure how much, but I would estimate €6000–€7000 since TCEC started (hardware upgrades, power bills, etc.).

How many hours do you spend on it out of your own life?
For Season 7 I didn’t really code anything new for the website compared to Season 6, so I didn’t spend much time preparing this time around. But when I made the new (current) website for Season 6, I started right after Season 5 finished and coded for almost 3 months straight, sometimes as much as 4–6 hours a day. That left little time for sleep, considering I had (and still have) a full-time job as well. When a season is running, my attention goes mostly to moderating the chat and making sure the hardware runs as it should. So everything from 0–4 hours per day during a season.

Are there any major engines that did not participate over the past few seasons? If so, do you know why they declined?
I pick the engines myself, but there was the case of HIARCS for Season 6, where the programmer Mark Uniacke told me to withdraw it. I only did it because I did not buy his program—he sent it to me for free for Season 5. But if I had bought it myself, I would have included it. Other than HIARCS there have not really been any similar cases in TCEC history. Now and then the question of why Fritz does not participate pops up, but that has a simple answer: It does not come in a form that supports UCI or Xboard—it has a native protocol built into the Fritz GUI which makes it unusable. 

If I understand correctly, your goal is to include every major engine, and the only reasons they could be left out are (a) their authors explicitly withdraw them, or (b) they aren’t compatible with the required protocols. Do I have that right? And that HIARCS and Fritz are the only major engines not participating?
Yes, every major engine that is not a direct clone. The whole clone debate is a hot topic in most computer chess forums. So your (a) and (b) are both correct. HIARCS was not a part of Season 7 for the same reason as it was not a part of Season 6.
 
Has there been any recent criticism of the TCEC from chess engine developers that were not included (Fritz), or sat out (HIARCS), or others?
No, there has not.
 
How strong a chess player are you? Do you play in tournaments, a club, or online?
I am not very strong. I don’t even have a rating. I would estimate my strength at around 1500 FIDE on a good day.
 
Can you tell me a bit about yourself?
I am 33 years old and living with my dog. (For now!) I am currently working as an IT consultant and for the past 1.5 years I’ve worked for Microsoft as part of their international Bing search engine MEPM team. I have no formal education apart from what would equal high school in the U.S. Everything I’ve done so far is self-taught.
 
How many other people help regularly in organizing and running the TCEC? Are they all volunteers?
Nelson Hernandez is in charge of the openings, assisted by Adam Hair and international master Erik Kislik. Jeremy Bernstein has helped me with the cutechess-cli customization. Paolo Casaschi (author of pgn4web) has also helped me with some specific inquiries I’ve had about JavaScript code. They are all volunteers. ☺

How did the idea for the TCEC come to you?
Basically it started after I left the computer chess ranking list (CCRL) after a couple of years of being a member. I was tired of just running computer chess engine games for statistics—I wanted to slow down the time control and watch the games. Obviously, the idea of a live broadcast wasn’t new, and in the beginning it was very simple, just a plain website with moves and not much else. It has now evolved with a more advanced website that I think is kind of intuitive and nice to use and gives TCEC a kind of unique platform.
 
Why is there so little time between TCEC seasons? Why not one season per year, more like the human world championship? Do the engines change enough between seasons for such frequent seasons to be meaningful?
The rhythm the past few years has been roughly two seasons per year. One season takes 3–4 months, so basically you can watch TCEC for half a year per year. It is definitely debatable whether this is useful or meaningful, but that’s just how it has been. Of course, this might change in the future. I have no other good answer. ☺

What are your plans for the future of TCEC, short-term and long-term?
Short-term would be to take a (well deserved) break. ☺ Long-term would be to be recognized by some big company to “get the ball rolling.”

Are you planning any changes in the format or rules for Season 8?
There might be changes for Season 8. Nothing is decided yet.
 
Regarding rules, while following the Superfinal games I noticed that some games were declared drawn by the rules when there seemed to be a lot of life left in the position—for example, the final position of game 18, which human grandmasters might play on for either side. Do you think this rule might be revised?
I don’t think the TCEC Draw Rule or TCEC Win Rule will be changed. They have been there from the start (slightly modified since the beginning) and no one is really complaining. As for the particular example with game 18, both engines are 100% certain that this is a draw (both show 0.00) so even if we humans think it looks chaotic, the engines simply have it all calculated way in advance.
 
I noticed that endgame tablebases were not used in the Superfinal, and this must have resulted in some incorrect evaluations. For example, as I was watching one game, I saw that one engine’s principal variation ended in a KRB-vs-KNN position, which is a general win for the stronger side, but the evaluation was not close to indicating a forced win. Do you think that could have helped cause more draws to happen?
That is correct, tablebases were disabled for all engines for the whole of Season 7. Previously they had been available, but some fans wanted them disabled so I figured they would have their wish fulfilled for Season 7. What tablebases do is basically help the engines find the correct way into a winning endgame—or in the worst-case scenario, prevent a loss. It shouldn’t affect the draw rate overall since it would even out in the end. But the point is that without tablebases, the engines can only rely on their own strength in the endgame and the path for getting there.

Have you thought of inviting strong players to comment on the games live, as happens in the top human-versus-human tournaments and matches? Is it too expensive?
We’ve had some discussions, but nothing concrete yet. It could probably be something to do for the Superfinal if the required money could be arranged.

Have you approached any major companies like Intel, AMD, or Microsoft about sponsoring the event or making it much bigger in scope/publicity?
Not in a while. Back when I did, I got no reply or acknowledgment whatsoever.

Do you have data on how many people in total looked at the latest Superfinal on tcec.chessdom.com, and any other rough numbers on chat commenters, etc.? 
There were approximately 26,000 unique visitors there during the Superfinal. From memory, the number of users in the chat peaked at roughly 600 at one point during the match.

Do you think that the chess world should pay more attention to TCEC in particular, and to engine-versus-engine games in general? They are rarely quoted in discussions of opening theory, or of the best games, best moves, or most interesting positions. Do you have an opinion about why this is?
I think they should. There are so many beautiful games coming out of TCEC that can blow one’s mind. Why we see so little reference to engine-versus-engine games is hard to say, but my guess is that it is related to the fact that a chess engine is basically an A.I., so people might have a hard time admitting that “a robot” can play even more beautiful chess than humans.
 
What intrigues me most about TCEC may be the fact that it is a very personal project for you, yet it has attained a measure of worldwide respect and fame without having a big sponsor or lots of money involved.
This project is of course very personal. Anton Mihailov of chessdom.com contacted me prior to Season 5 and we have continued our cooperation since. To have a hobby being acknowledged like that is of course very nice. With that said, if Intel or AMD or any other big company would be interested in sponsoring TCEC I would definitely be interested in having a talk with them too. Bottom line is: Most people regard TCEC as the official “world computer chess championship.” And I don’t think they are wrong about that! ☺

My thanks to Martin Thoresen, grandmaster Larry Kaufman (of the Komodo team), international master Erik Kislik (who made the final selection of openings for the match), and everyone else who answered my questions for these pieces. I am looking forward to Season 8 of TCEC!

Tuesday, December 2, 2014

More on "Why Our Memory Fails Us"

Today the New York Times published an op-ed by Daniel Simons and me, under the title "Why Our Memory Fails Us." In the article, we use the recent discovery that Neil deGrasse Tyson was making incorrect statements about George W. Bush based on false memories as a way to introduce some ideas from the science of human memory, and to argue that we all need to rethink how we respond to allegations or demonstrations of false memories. "We are all fabulists, and we must all get used to it" is how we concluded.

In brief, Tyson told several audiences that President Bush said the words "Our God is the God who named the stars" in his post-9/11 speech in order to divide Americans from Muslims. Sean Davis, a writer for the website The Federalist, pointed out that Bush never said these exact words, and that the closest words he actually said were spoken after the space shuttle explosion in 2003 as part of a tribute to the astronauts who died. Davis drew a different conclusion than we did—namely that the misquotes show Tyson to be a serial fabricator—but he brought Tyson's errors to light in a series of posts at The Federalist, and he deserves credit for noticing the errors and inducing Tyson to address them.

Tyson first responded, in a Facebook note, by claiming that he really did hear Bush say those words in the 9/11 context, but he eventually admitted that this memory had to be incorrect.

All this happened in September. After reading Tyson's response, I wondered why it didn't include a simple apology to President Bush for implying that he was inciting religious division. On a whim I tweeted that Tyson should just apologize and put the matter behind him:




I had never met or communicated with Neil deGrasse Tyson, and I doubt he had any idea who I was, so it was somewhat to my surprise that he replied almost immediately:


A few days later, Tyson issued his apology as part of another Facebook note entitled "A Partial Anatomy of My Public Talks." Hopefully it is clear that we wrote our piece not to pick apart Tyson's errors or pile on him, but to present the affair as an example of how we can all make embarrassing mistakes based on distorted memories, and therefore why our first reaction to a case of false memory should be charitable rather than cynical. Not all mistaken claims about our past are innocent false memories, of course, but innocent mistakes of memory should be understood as the norm rather than the exception.

The final version of the op-ed that we submitted to the New York Times was over 1900 words long; after editing, the published version is about 1700 words. Several pieces of information, including the names of Davis and The Federalist—who did a service by bringing the matter to light—were casualties of the condensation process. (A credit to ourselves for the research finding that most people believe memory works like a video camera was also omitted.) We tried to make it clear that we deserve no credit for discovering Tyson's misquote. Our version also contained many links that were omitted from the final online version. In particular, we had included links to Davis's original Federalist article, Tyson's first reply, and Tyson's apology note, as well as several of the research articles we mentioned.

For the record, below is a list of all the links we wanted to include. Obviously there are others we could have added, but these cover what we thought were the most important points relevant to our argument about how memory works. For reasons of their own, newspapers like the Times typically allow few links to be included in online stories, and prefer links to their own content. Even our twelve turned out to be too many.

Neil deGrasse Tyson's 2008 misquotation of George W. Bush (video)

Bush's actual speech to Congress after 9/11 (transcript)

Bush's 2003 speech after the space shuttle explosion (transcript)

Sean Davis's article at The Federalist

Tyson's initial response on Facebook

Tyson's subsequent apology on Facebook

National Academy of Sciences 2014 report on eyewitness testimony

Information on false convictions based on eyewitness misidentifications from The Innocence Project (an organization to which everyone should consider donating)

Roediger and DeSoto article on confidence and accuracy in memory

Simons and Chabris article on what people believe about how memory works

Registered replication report on the verbal overshadowing effect

Daniel Greenberg's article on George W. Bush's false memory of 9/11

Monday, November 10, 2014

GAME ON — My New Column in the Wall Street Journal

For a while I’ve had the secret ambition to write a regular newspaper column. At one time I thought I could write a chess column; at other times I thought that Dan Simons and I could write a series of essays on social science and critical thinking. Last year I suggested to the Wall Street Journal a column on games. They turned me down then, but a few weeks ago I gently raised the idea again and the editors kindly said they would give it a try. So I’m excited to say that the first one is out in this past weekend’s paper (page C4, in the Review section), and also online here.

The column is about Dan Harrington's famous "squeeze play" during the final table of the 2004 World Series of Poker main event. Here's how ESPN covered the hand (you can see in the preview frame that he was making a big bluff with his six-deuce):


There were several things about this hand that I would have mentioned if I had the space. First, a couple of important details for understanding the action:

  • Greg Raymer, the ultimate winner, started the hand with about $7.9 million in chips. Josh Arieh, who finished third, had $3.9 million. Harrington had $2.3 million, the second smallest stack at the table.
  • Seven players remained in the tournament (of the starting field of about 2500) when this hand was played. At a final table like this, the prize payouts escalate substantially with each player eliminated. This might explain why Harrington put in half of his chips, rather than all of them. In case he got raised or called and lost the hand, he would still have a bit left to play with, and could hope to move up the payout chart if other players busted before he did.
  • David Williams, the eventual runner-up, was actually dealt the best hand of anyone: he had ace-queen in the big blind. But facing a raise, a call, and a re-raise in front of him, he chose to fold, quite reasonably assuming that at least one of the players already in the hand would have had him beat, and perhaps badly—e.g., holding ace-king. For reasons of space and simplicity I had to omit Williams from the account in the article. I also omitted the suits of the cards.
  • Dan Harrington is a fascinating character. He excels at chess, backgammon, and finance as well as poker, and he wrote a very popular series of books on hold'em poker with Bill Robertie (himself a chess master and two-time world backgammon champion). He won the World Series of Poker main event in 1995. After his successful squeeze play in 2004 he wound up finishing fourth. He had finished third the year before.
Some people have noted that this could not have been the very first squeeze play bluff ever in poker. And of course it wasn't. But it was, in my opinion, the most influential squeeze play. Because ESPN revealed the players' hole cards, it was verifiably a squeeze play. As I hinted in the article, without the hole card cameras, everyone watching the hand would have assumed that Harrington had a big hand when he raised. Even if Harrington had said later that he had a six-deuce, some people wouldn't have believed him, and no one could have been sure. Once ESPN showed this hand (and Harrington wrote about it in his second Harrington on Hold'em volume), every serious player became aware specifically of the squeeze play strategy, and generally of the value of re-raising "light" before the flop. And because the solid, thinking man's player Dan Harrington did it, they knew it wasn't just the move of a wild man like Stu Ungar, but a key part of a correct, balanced strategy.

Of course, the squeeze play doesn't work every time. It would have failed here if Arieh (or Raymer) really did have big hands themselves. Harrington probably had a "read" that suggested they weren't that strong, but I think this read would have been based much more on his feel for the overall flow of the game—noticing how many pots they were playing, whether they had shown down weak hands before—than on any kind of physical or verbal tell.

Two years after Harrington's squeeze play, Vanessa Selbst was a bit embarrassed on ESPN when she tried to re-squeeze a squeezer she figured she had caught in the act. At a $2000 WSOP preliminary event final table, she open-raised with five-deuce, and drew a call followed by a raise: the exact pattern of the squeeze play. After some thought she went all-in, but the putative squeezer held pocket aces. Selbst was out in 7th place. But she didn't stop playing aggressively, and since then she has become one of the top all-time money winners and most respected players in poker. Most of the hand is shown in the video below, starting at about the 6:50 mark.


Some readers of the column asked whether I wasn't just describing a plain old bluff, the defining play of poker (at least in the popular mind). The answer is that the squeeze play is a particular kind of bluff—indeed, a kind of "pure bluff," a bluff in which your own hand has zero or close to zero chance of actually winning on its merits. (A "semi-bluff," by contrast, is a bluff made when you figure to have the worst hand at the time of the bluff, but your hand has a good chance of improving to the best hand by the time all the cards are out.) What the Harrington hand showed is a particular situation in which a pure bluff is especially likely to work. Pros don't bluff randomly, or when they feel like it, or even when they think they have picked up a physical tell. And they especially don't bluff casually when more than one opponent has already entered the hand. Harrington's bluff was more than just a bluff: It was a demonstration of how elite players exploit their skills to pick just the right spots to bluff and get away with it.
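The arithmetic behind "picking the right spot" is simple. A pure bluff risks some amount R to win the pot P already on the table, so it breaks even when the opponents fold with probability R / (R + P); the better the spot, the more the actual folding frequency exceeds that threshold. A sketch with round, purely illustrative numbers (not the actual chip counts from the Harrington hand):

```python
# Break-even folding frequency for a pure bluff: risk R to win pot P.
# Below this folding frequency, a zero-equity bluff loses money.
def breakeven_fold_freq(risk, pot):
    return risk / (risk + pot)

# Illustrative only: risking 1.2M in chips to win a 0.8M pot means the
# bluff must succeed 60% of the time just to break even.
print(breakeven_fold_freq(1_200_000, 800_000))
```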

In future columns I’ll talk about different games, hopefully with something interesting to say about each one. The next column should appear in the December 6–7 issue, and will probably concern the world chess championship. My “tryout” could end at any time, of course, but for now my column should be in that same space once per month, at a length of about 450 words. As most of you know, it’s a challenge to say something meaningful in so few words, and for me it’s a challenge just to stay within that word limit while saying anything at all. As in poker, I may need a bit of luck.

By the way, I think it's too bad that the New York Times decided to end their chess column last month. I believe, or at least hope, that there is a market for regular information on games like chess for people who don't pay so much attention via other websites and publications. I remember reading Robert Byrne's version of the Times column in the 1970s. I would get especially excited when my father came home on one of the days the column ran, so that I could grab his newspaper and check it out. Yes, it used to run at least three times per week, then two, then just on Sundays (when Dylan McClain took over with a different, and I think better, approach than Byrne's). Now it doesn't run at all. The Washington Post ended its column as well, but some major newspapers still have one (The Boston Globe and New York Post come to mind).

PS: If you liked the squeeze play column, here are some of my other pieces on games that you can read online, in reverse chronological order:

"The Science of Winning Poker" (WSJ, July 2013)

"Should Poker Be (A Tiny Bit) More Like Chess?" (this blog, August 2013)

"Chess Championship Results Show Powerful Role of Computers" (WSJ, November 2013)

"Bobby Fischer Recalled" (WSJ, March 2009)

"It's Your Move" (WSJ, October 2007)

"How Chess Became the King of Games" (WSJ, November 2006)

"The Other American Game" (WSJ, July 2005)

"A Match for All Seasons" (WSJ, December 2002)

"Checkmate for a Champion" (WSJ, November 2000)