Friday, March 28, 2014

"Data Journalism" on College ROI at FiveThirtyEight: Where's theCritical Thinking?

A website called PayScale recently published a "College ROI Report" that purports to calculate the return on investment (ROI) of earning a Bachelor's degree from each of about 900 American colleges and universities. I found out about this report from an article on Nate Silver's new FiveThirtyEight website. The article appears under a banner called "DataLab," implying that it is an example of the new "data journalism" that Silver and his site are all about. Unfortunately, the article contains approximately zero critical thinking about the meaning of the PayScale report, its data sources, and its conclusions.

PayScale did a lot of number-crunching (read all about it here), but the computation resulted in two key numbers for each institution: (1) the cost of getting an undergraduate degree, taking into account factors like financial aid and time to graduation; and (2) the expected total earnings of a graduate over the next twenty years. The first one can be figured out from public data sources. The second one came from a survey by PayScale (more on this later). The ROI for a college was calculated by subtracting #1 from #2, and then further subtracting the expected total earnings of a person who skipped college and worked for 24–26 years instead (which happens to be about $1.1 million). The table produced by PayScale thus purports to show how much you would get back—in monetary income—on the "investment" of obtaining a degree from any particular college or university.

Indeed, PayScale says that "This measure is useful for high school seniors evaluating their likely financial return from attending and graduating college." But this is simply not true. As I read the FiveThirtyEight article on the PayScale report, I was waiting for them to point out the reasons why, but they never did. The only critical comments were about incorporating the effects of student debt.

What are the problems with the PayScale analysis? First of all, it only makes sense to speak of the comparative return on an investment when the investors have a choice of what to invest in. If every person could choose to attend any college (and to graduate from it and get a full-time job), or to skip college entirely, then it would be meaningful to ask which choice maximizes return. This is what we do when calculating a financial ROI: we try to figure out whether investing in stocks versus bonds, or one mutual fund versus another, or one business opportunity versus another, will be more profitable. But colleges have admissions requirements, so not everyone can go to whatever college he or she wants. Colleges select their students as much as students select their colleges. And in fact, the people who attend different colleges can be very different, and they can be even more different from the people who don't attend college at all.

This means that the Return in this "ROI" depends on much more than the Investment. It also depends on who is doing the investing. In fact, it is far from trivial to figure out the true ROI of going to Harvard versus Vanderbilt versus Wayland Baptist versus Nicholls State versus not attending college at all. To figure this out, you would have to control in the analysis for all the characteristics that make students at different colleges different from one another, and different from students who don't go to college. Factors like cognitive ability, ambition, work habits, parental income and education, where the students grew up and went to high school, what grades they got, and many others are likely to be important. In fact, those other factors could be so important that they might wind up explaining more of the variation in income between people than is explained by going to college—let alone which particular college people go to.

Even controlling for data we might be able to obtain, like the average SAT score and parental income of students who attend each college, would not completely solve the problem, because there could be factors that we can't measure that have an important effect. Only by randomly assigning students to different colleges (or to directly entering the workforce after high school) would we get an estimate of the true ROI (measured in money—which of course leaves aside all the other benefits one might get from college that don't show up in your next twenty years of paychecks).

Of course this ideal experiment won't ever happen, but clever researchers have tried to approximate it by doing things like looking at students who were accepted to both a higher-ranked and a lower-ranked school, and then comparing those who enroll in the higher-ranked one to those who enroll in the lower-ranked one. Since all the students in this analysis got into both schools, the problem of different schools having different students is mitigated. (Not erased entirely, though: for example, people who deliberately attend lower-ranked schools might be doing so because of financial circumstances, or their college experience may differ because they are likely to start out above average in ability and preparation for the school they attend, as compared to those who choose higher-ranked schools.)

FiveThirtyEight said nothing about this fundamental logical problem with the entire PayScale exercise. Nor did it address the other flaws in the analysis and presentation of the data.

It could have also asked about the confidence intervals around the ROI estimates provided by PayScale. When you give only point estimates (exact values that represent just the mean or median of a distribution), and proceed to rank them, you create the appearance of a world where every distinction matters—that the school ranked #1 really has a higher ROI than #2, which is higher than #3, and so on. PayScale's methdology page says, "the 90% confidence interval on the 20 year median pay is ±5%" (but 10% for "elite schools" and "small liberal arts schools or schools where a majority of undergraduates complete a graduate degree"). The narrowness of these intervals is a bit hard to believe, as well as their uniformity (how does every school in a category get the same confidence interval?). Why not just put the school-specific confidence intervals into the report, so that it is obvious that, for example, school #48 (Yale) is probably not significantly higher in ROI than, say, school #69 (Lehigh), but is probably lower in ROI than school #6 (Georgia Tech)?

It's hard to have much confidence in these confidence intervals anyhow, since we don't know how many people PayScale surveys at each college to make the income calculations (which will be the critical drivers of the variability in ROI). Many of the colleges are small; how reliable can the estimates of what their graduates will earn be? And are the surveys of college graduates unbiased with respect to what field the graduates work in? Or, for example, do engineers and teachers tend to respond to these surveys more than, say, baristas and consultants? The unemployed and under-employed are not included; this will have the effect of inflating the apparent ROI of schools whose graduates tend, for whatever reasons, not to have full-time jobs. Payscale says that non-cash compensation and investment income are not included, which might bias down the reported ROI of graduates of elite schools who go into financial careers.

Finally, perhaps FiveThirtyEight could have looked at whether the schools that stand out at either end of the distribution happen to be smaller than the ones in the middle. Ohio State, Florida State, et al. have so many students, drawn from such a broad distribution of ability and other personal traits, that they should be expected to have "ROI" values nearer to the middle of the overall distribution of universities than should small colleges, which through pure chance (having, by luck, more high- or low-income graduates) are more likely to land in the top or bottom thirds of the list. Some degree of mean reversion may be expected, so the rankings of PayScale will lose some predictive value for future ROIs, especially in the case of small schools.

The comments I have made all concern the underlying PayScale report, but I think it is FiveThirtyEight that has not upheld the best standards of "data journalism." If that term is to have any meaning, it can't simply refer to "journalism" that consists of the passing along of other people's flawed "data" (especially when those people are producing and promoting the data for commercial purposes). Nate Silver earned his reputation, and that of his FiveThirtyEight brand, largely by calling out—and improving on—just this kind of simplistic and misleading analysis. It's sad to see his "data journalism organization" no longer criticizing superficiality, but instead promoting it.

Postscripts: 3/29/14: After I first posted this piece, I realized three things. First, I hadn't mentioned mean reversion originally, so I added it in. But it's a minor issue compared to the others. Second, I didn't make it clear that notwithstanding what I wrote above, I am 100% in favor of more good data journalism. I agree with Nate Silver and others that journalists (and everyone!) should be more aware of the data that exists to answer questions, how to gather data that has not already been compiled, how to think about data, and so on. A great example of silly data-ignorant journalism is the series of articles the New York Post has been running on the "epidemic" of suicides and suspicious deaths in the financial industry. The proper question to start with is whether there is an epidemic, or even a significant excess over normal variation, as opposed to  a set of coincidences that would be expected to happen every so often. Perhaps there is an epidemic, but I am skeptical. The Post (and other outlets that have reported on these deaths) skip right over this crucial threshold issue. Maybe FiveThirtyEight could address it and teach its readers about the danger of jumping to conclusions after seeing nonexistent patterns in noise. Third, and finally, I should have mentioned that FiveThirtyEight has on board some people who really do know how to think seriously about data (and do it much better than I do), such as the economist Emily Oster. I hope Emily's influence will spread throughout the organization. 3/30/14: I removed text in the original version that asked whether outliers like hedge fund managers had their incomes included in PayScale's calculations. They won't have too much influence, regardless, because PayScale is reporting medians, not means. My apologies for the inadvertent error. 4/5/14: I changed the number of colleges included from 1310 to "about 900." There are 1310 entries in Payscale's table, but many colleges are listed more than once if they have different tuition options (e.g. state resident versus non-resident). 4/7/14: I added links to the Krueger & Dale (and Dale & Krueger) economics papers that tried to estimate the returns from attending more selective/elite colleges. I knew about these papers when I wrote the initial post, but had forgotten who the authors were.

1 comment:

    Work from home theory is fast gaining popularity because of the freedom and flexibility that comes with it. Since one is not bound by fixed working hours, they can schedule their work at the time when they feel most productive and convenient to them. Women & Men benefit a lot from this concept of work since they can balance their home and work perfectly. People mostly find that in this situation, their productivity is higher and stress levels lower. Those who like isolation and a tranquil work environment also tend to prefer this way of working. Today, with the kind of communication networks available, millions of people worldwide are considering this option.

    Women & Men who want to be independent but cannot afford to leave their responsibilities at home aside will benefit a lot from this concept of work. It makes it easier to maintain a healthy balance between home and work. The family doesn't get neglected and you can get your work done too. You can thus effectively juggle home responsibilities with your career. Working from home is definitely a viable option but it also needs a lot of hard work and discipline. You have to make a time schedule for yourself and stick to it. There will be a time frame of course for any job you take up and you have to fulfill that project within that time frame.

    There are many things that can be done working from home. A few of them is listed below that will give you a general idea about the benefits of this concept.

    This is the most common and highly preferred job that Women & Men like doing. Since in today's competitive world both the parents have to work they need a secure place to leave behind their children who will take care of them and parents can also relax without being worried all the time. In this job you don't require any degree or qualifications. You only have to know how to take care of children. Parents are happy to pay handsome salary and you can also earn a lot without putting too much of an effort.

    For those who have a garden or an open space at your disposal and are also interested in gardening can go for this method of earning money. If given proper time and efforts nursery business can flourish very well and you will earn handsomely. But just as all jobs establishing it will be a bit difficult but the end results are outstanding.

    Freelance can be in different wings. Either you can be a freelance reporter or a freelance photographer. You can also do designing or be in the advertising field doing project on your own. Being independent and working independently will depend on your field of work and the availability of its worth in the market. If you like doing jewellery designing you can do that at home totally independently. You can also work on freelancing as a marketing executive working from home. Wanna know more, email us on and we will send you information on how you can actually work as a marketing freelancer.

    Internet related work
    This is a very vast field and here sky is the limit. All you need is a computer and Internet facility. Whatever field you are into work at home is perfect match in the software field. You can match your time according to your convenience and complete whatever projects you get. To learn more about how to work from home, contact us today on workfromhome.otr@gmail.comand our team will get you started on some excellent work from home projects.

    Diet food
    Since now a days Women & Men are more conscious of the food that they eat hence they prefer to have homemade low cal food and if you can start supplying low cal food to various offices then it will be a very good source of income and not too much of efforts. You can hire a few ladies who will help you out and this can be a good business.

    Thus think over this concept and go ahead.