Big Data versus the Faculty (and Close Reading)

A colleague of mine went on a public tirade this fall against the use of numbers of citations in decisions on tenure and promotion. Hers may have been an intentionally self-serving rant, but she has a point. The amount of attention a paper or book brings—or even its usefulness to other scholars—has little to do with its ultimate value within a field. After all, attention comes from a number of factors beyond the intrinsic value of the work, including reputation and venue. It’s safer to cite Franco Moretti (the creator of ‘distant reading’) than a graduate student making a similar point but in an obscure journal. And an article that lends itself to citation is not necessarily the one that will be read down the line. It may be constructed more for citation than for contribution, dipping into current fads for immediate attention.

This fundamental truth comes into focus right now because of new reliance on ‘big data’ in employment decisions. As Martin Kich, referencing David Hughes of Rutgers, writes, “Use of a proprietary database that purports to show the publications, citations, books and grants awarded to a professor provides far too limited a perspective on faculty achievement.” It also leaves too much room for error—but let’s leave that aside and concentrate on the first point. This is a limited tool at best, and of limited utility.

It has long bothered me that hiring, tenure and promotion committees often rely on numbers and venues instead of on the work of the candidates itself. Few people actually read the scholarship of those they are judging. They simply read about it. In this sense, the move to ‘big data’ is nothing more than an extension of what too many of us have been doing for years, exacerbating the problems that arise from our inability to judge scholars on their work alone—for whatever reason (sometimes it’s pressures of time, sometimes the number of candidates is simply overwhelming and sometimes it’s simply laziness).

The rise of ‘close reading’ and then Moretti’s ‘distant reading’ provides a useful parallel to what is going on now with the ‘big data’ fascination. Close reading came about, in part, in response to a politicized historicism in literary studies that threatened to reduce the importance of the text itself. It provided a tool for textual analysis that insists on concentration on the words themselves. Sometimes, admittedly, its proponents went too far, privileging text over context and deriding older views that treated the author as the ‘oracle’ for the text. I remember Stephen King (this was 30 years ago) reflecting this attitude, saying he had no more primacy in speaking of his books than he did about his fingernail clippings. That they came from his own hand had no significance to subsequent examination.

Distant reading arose from the new ease, through digital means, of using concordances and other aggregators of data about texts and words. Another tool, it fits well with close reading, for each has a distinct function and even philosophy about textual studies. Unlike close reading, distant reading tells one little about the pleasures or intricacies of individual texts. Instead, it points out patterns and trends. Distant reading may have arisen, partially, in response to an over-reliance on close reading, but it is not in competition with it. Its uses are distinct.

In both close reading and distant reading, however, the author disappears, or is at least reduced in significance. Neither method, therefore, is sufficient on its own for bringing deep understanding to specific texts (they are all, after all, author-connected), though both can contribute to the process of reaching such understanding. Use of them does not replace the standard reader experience, a dynamic that includes text, author, context of composition, reader and context of consumption. When we look only at the details within ‘the four corners of the page,’ we miss the text as an act of communication. When we look only at the text within an aggregate or against some sort of numerical standard, we also ignore the attempted communication.

The problem with over-reliance on any tool, then, is apparent in usage of both close reading and distant reading. Neither should be the sole basis for decision-making; both are valuable, but only within a range of other considerations—and with full access to the information generated so that it can be competently assessed. In both cases, however, the totality of the research behind the data is available to any interested party.

The Rutgers University contract for ‘big data’ on faculty scholarship with Academic Analytics denies faculty access to that data, further limiting its usefulness. This is a problem, for it is faculty who are responsible for hiring, retention, tenure and promotion. Just providing them numbers tells them little about the candidates, just as distant reading can never tell us much about individual books. The information, if accurate and appropriate (there is reason to believe that the data coming from Academic Analytics is not always either), can be useful, but in no way can it be the basis of judgment unless there is full access to the information and a complete description of the means of its generation.

Another trouble with the Rutgers deal with Academic Analytics is that it costs the university over $100,000 a year. Having spent the money, the university is probably loath not to give its data primacy. The same can be true of advocates of close reading and distant reading. When they have invested a great deal in either, scholars tend to elevate the one or the other. Thing is, no tool is intrinsically better than another; a saw does not have primacy over a hammer. Unfortunately, when Rutgers buys an expensive hammer, the university wants it used, whether or not it is a well-designed and well-constructed hammer and whether or not it is the appropriate tool in the first place. This means that other means of decision-making will be reduced in stature, and also that the administration, which bought the service, will likely use its data as a primary tool, even when not appropriate.

Close reading of a candidate’s file is not enough, nor is reliance on the distant reading of ‘big data.’ Like teaching itself, personnel decisions cannot be reduced to either texts or numbers. They concern people and their interactions—and the contexts of their activities.

Rutgers, though, may not be wasting its money. The subtext of its decision to rely on the data from Academic Analytics may be that shared governance and its implications are no longer to be relied upon, that hiring and firing should be determined by data alone—and this is data controlled and accessed by administrators, not faculty. To Rutgers administrators, that may seem just fine. To me, it’s scary.