
The Robots Are Coming! (To Grade Your Essay)

May 27, 2014 | Jon Radebaugh

As a consultant in education systems and software, I’ve noticed that a majority of new applications in this burgeoning field ($500 million in startup seed money in the first quarter of 2014 alone!)[1] focus on STEM, which stands for “Science, Technology, Engineering and Math.” This is perhaps because these applications are created by software engineers who have a built-in bias toward those subjects, along with beer pong and foosball. Another explanation is that building software for the humanities is really hard, perhaps one of the hardest challenges in computer science.

Despite this neglect and the hype around STEM, most educators, as well as thoughtful engineers and scientists, agree that reading and writing are essential education for a future that involves creativity and collaboration in tackling the big challenges we face. The Massachusetts Institute of Technology (MIT), for one, seems to recognize this: “…the world’s problems are never tidily confined to the laboratory or spreadsheet,” says Deborah K. Fitzgerald, MIT professor.[2]

It is the untidiness of the humanities that proves difficult to program. The challenge is nowhere more evident than in essay graders, software designed to automatically assess and grade responses to essay questions. Another MIT professor, Les Perelman, who has campaigned for years against essay graders,[3] recently worked with students to create a software application called the “Basic Automatic B.S. Essay Language Generator,” or “Babel Generator” for short. Babel can generate gibberish essays that will successfully fool the most sophisticated essay graders, a sort of reverse Turing test.[4]

In a surreal life-imitates-art moment, Ray Kurzweil, the science fiction author and futurist, was hired as a VP of Engineering at Google.[5] Kurzweil, in addition to his forays into predicting the future, founded an education software company[6] and is an expert in machine learning and artificial intelligence. His mission in his new position is nothing less than to “bring natural language understanding to Google.”[7] Whether or not such a machine achieves consciousness, as Kurzweil predicts,[8] we may at last get a truly effective essay grader.

Don’t be too quick to throw out our essay-grading robot babies with the bathwater. Most in the field seem to agree that the essay graders available today can be useful when used in the right context (i.e., not high stakes) or as an additional tool for human graders. If you are a student in a free massive open online course (MOOC), perhaps a free grade of your essay answer by a robot is better than nothing. What’s more, even short of intelligent machines capable of high-stakes essay grading, there is doubtless a whole host of useful software applications yet to be developed to assist teachers and students in the humanities.

Some of the most sophisticated essay graders in use today, like Vantage Learning’s IntelliMetric, which is used to score the GMAT and MCAT exams,[9] rely on well-defined “rubrics” to guide the machine. Rubrics define in detail the goals of the test question. The richer the rubric, the better the results you will get from a grader, human or machine. For such a process to be fair, a rubric must be widely shared and well understood by both teacher and student. Rubrics represent a common understanding, the “semantics,”[10] of what we wish to test. Short of the singularity, we must get to work defining and applying those semantics. One can imagine highly useful evolutionary software designed to assist a teacher in applying a sophisticated, shared writing rubric, and the student in understanding its application.
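To make the idea of a rubric as shared “semantics” a little more concrete, here is a minimal Python sketch of a rubric expressed as data that both a teacher and a student could read. Everything in it is an assumption made for illustration: the `Criterion` class, the `weighted_score` and `feedback` functions, and the criteria and weights themselves are invented here. It is not how IntelliMetric works, and it does not follow the QTI or Common Core formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One row of a hypothetical writing rubric."""
    name: str
    description: str
    weight: float   # relative importance of this criterion
    max_level: int  # highest level a grader can award, e.g. 4


def weighted_score(rubric, levels):
    """Return a 0-100 score from the level a grader awarded per criterion.

    `levels` maps criterion name -> awarded level (0..max_level).
    """
    total_weight = sum(c.weight for c in rubric)
    score = 0.0
    for c in rubric:
        level = levels[c.name]
        if not 0 <= level <= c.max_level:
            raise ValueError(f"{c.name}: level {level} outside 0..{c.max_level}")
        score += c.weight * level / c.max_level
    return round(100 * score / total_weight, 1)


def feedback(rubric, levels):
    """List the criteria where the essay fell short, phrased for the student."""
    return [
        f"{c.name}: {levels[c.name]}/{c.max_level} ({c.description})"
        for c in rubric
        if levels[c.name] < c.max_level
    ]


# Illustrative rubric; the criteria and weights are made up for this sketch.
RUBRIC = [
    Criterion("claim", "states a clear, arguable claim", 2.0, 4),
    Criterion("evidence", "supports the claim with relevant evidence", 2.0, 4),
    Criterion("organization", "orders ideas logically", 1.0, 4),
]

levels = {"claim": 4, "evidence": 2, "organization": 3}
print(weighted_score(RUBRIC, levels))  # 75.0
for line in feedback(RUBRIC, levels):
    print(line)
```

The judgment about which level an essay earns still comes from a grader, human or machine. The sketch only shows that once the rubric is explicit, the same definition can drive both the score and the explanation the student sees.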

In concert with the rapid investment in learning software, we are seeing investment by the learning community in defining standards to enable interoperability of applications,[11] analytics,[12] and discovery of learning resources.[13] IMS Global’s XML standard for “Question and Test Interoperability” (QTI) provides a way to define rubrics, while standards like the Common Core provide very specific, well-defined rubrics for different styles of writing.[14] It will be fascinating to see these standards and others emerge in learning software designed to teach the humanities. What I find almost as interesting as the heady world of artificial intelligence is the metadata scaffolding beneath the magic of today. At Flatirons, we’ve always specialized in the scaffolding that makes the magic possible.

Flatirons is active in the learning assessment space, and we have specific research under way in several areas. In subsequent blog posts I’ll discuss our work in closed-loop assessments, analytics in assessments and the real-world use of the QTI standard.

[13] See the Learning Resource Metadata Initiative (LRMI) recently accepted by

