Prepared by the NCTE Task Force on Writing Assessment
The following annotated bibliography on machine scoring and evaluation of essay-length writing is based on the 2012 published bibliography in the Journal of Writing Assessment 5 (compiled by Richard Haswell, Whitney Donnelly, Vicki Hester, Peggy O’Neill, and Ellen Schendel).
The bibliography was compiled by reviewing recent scholarship on machine scoring of essays, also referred to as automated essay scoring (AES), using databases such as ERIC and CompPile. Entries were selected for their attention to machine scoring of essays and publication in peer-reviewed venues (with exceptions noted). We also endeavored to cover the breadth of the issues addressed in the research without being overly redundant. We avoided publications that were very narrowly focused on highly technical aspects of assessment. The earliest research — such as Ellis Page’s 1966 piece in Phi Delta Kappan, “The Imminence of Essay Grading by Computer” — is not included because many more recent entries provide a review of the early development of machine scoring.
The bibliography is organized by publication date, with the most recent entries appearing first. Entries that have been excerpted from the published JWA bibliography are indicated by an asterisk.
Klobucar, Andrew, Deane, Paul, Elliot, Norbert, Ramineni, Chaitanya, Deess, Perry & Rudniy, Alex. (2012). Automated essay scoring and the search for valid writing assessment. In Charles Bazerman et al. (Eds.) International Advances in Writing Research: Cultures, Places, Measures (pp. 103-119). Fort Collins, CO: WAC Clearinghouse & Parlor Press.
This chapter reports on an ETS and New Jersey Institute of Technology research collaboration that used Criterion, an integrated instruction and assessment system that includes automated essay scoring. The purpose of the research was “to explore ways in which automated essay scoring might fit within a larger ecology as one among a family of assessment techniques supporting the development of digitally enhanced literacy” (p. 105). The study used scores from multiple writing measures, including the SAT-W, beginning-of-semester impromptu essays scored by Criterion, an essay written over an extended timeline and scored by faculty, end-of-semester portfolios, and course grades. The researchers compare the scores and conclude that, when embedded in a course, AES can be used as “an early warning system for instructors and their students.” The authors also note concerns that over-reliance on AES could result in a fixation on error and surface features such as length.
Perelman, Les. (2012). Construct validity, length, score, and time in holistically graded writing assessments: The case against automated essay scoring (AES). In Charles Bazerman et al. (Eds.) International Advances in Writing Research: Cultures, Places, Measures (pp. 121-150). Fort Collins, CO: WAC Clearinghouse & Parlor Press.
An accessible critique of the writing tasks (the timed impromptu) and the automated essay scoring process. The author argues that while “the whole enterprise of automated essay scoring claims various kinds of construct validity, the measures it employs substantially fail to represent any reasonable real-world construct of writing ability” (p. 121). He explains how length affects scoring: for short impromptus, length correlates with scores, but once more time is given to write and subjects are known in advance, the influence of length on scores diminishes. He also explains how AES differs from holistic scoring: although both produce a single number, the AES number is generated from a set of analytical measures. These individual measures (e.g., word length, sentence length, grammar, and mechanics) are not the same as the construct that AES purports to measure (writing ability). The AES program discussed is primarily the ETS e-rater 2.0 system because ETS has been more transparent about it than other AES developers. Perelman draws on his own research into AES, many ETS technical reports, and peer-reviewed research in making his argument.