Some problems in using numbers to represent the writing styles of Shakespeare and his contemporaries
Abstract
The quantitative study of writing styles, sometimes called stylometry or computational stylistics, has in the past two decades been enhanced by the widespread availability of large digital textual corpora and of easy-to-use software tools that lower the technical barriers to participation in this field. For the study of early modern drama, the raw text datasets ProQuest One Literature (formerly Literature Online, or LION) and Early English Books Online (EEBO) make it easy to compare Shakespeare’s writing with that of his contemporaries. The result has been a boom in quantitative studies of early modern drama. Certain aspects of language, such as authorial preferences for particular words and phrases, are especially easy to quantify. But the quantitative analysis of language brings problems that are easily overlooked, because language is a more complex subject than it first appears. This essay surveys four kinds of problems that can distort our perspective when we start using numbers to represent writing styles.