Mapping the linguistic topography of Sophocles’ plays: what Natural Language Processing can teach us about Sophoclean drama
Benjamin Folit-Weinberg, A.G. Leventis Postdoctoral Research Fellow (Institute for Greece, Rome, and the Classical Tradition & Department of Classics & Ancient History, University of Bristol) and Justus Schollmeyer, Data Scientist & Programmer
Scholars have long recognized that Sophocles, the great 5th Century B.C.E. tragedian, repeats thematically important words in his plays and that studying these repetitions can offer fundamental insights into his work. At present, however, identifying these repetitions is time-consuming and unsystematic, and the significance of specific repetitions is not always clear. Our project applies Natural Language Processing (NLP) and data visualization techniques to help scholars of Sophocles both identify linguistic patterns more efficiently and rigorously and interpret the significance of these patterns more insightfully.
Seed Corn funding provided by the Jean Golding Institute allowed us to create a feasibility prototype for an NLP and data visualization tool with several functions. The first function is heuristic and identifies the words or word families that appear most frequently in each of the seven fully extant plays of Sophocles. The second function is analytical and calculates how frequently a given word or word family is used in a specific play by Sophocles compared to the remaining six plays. The third function is hermeneutic and depicts the distribution of selected words within a specific play (see diagram below); the chart will ultimately include various overlays that demarcate units of the play and articulate relationships between uses of key words.
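To make the three functions concrete, here is a minimal Python sketch of the kind of computation each one involves. It is an illustration under stated assumptions, not the prototype's actual code: it assumes each play has already been tokenized and lemmatized into a list of word-family labels, and the play names, lemmas, and function names (`top_lemmas`, `relative_frequency`, `positions`) are hypothetical placeholders rather than real data or counts from Sophocles.

```python
from collections import Counter

# Hypothetical toy input: each play is a list of already-lemmatized tokens.
# These lists are placeholders, NOT real counts from Sophocles' plays.
plays = {
    "play_a": ["hodos", "theos", "hodos", "polis", "hodos", "theos"],
    "play_b": ["polis", "theos", "polis", "nomos"],
    "play_c": ["nomos", "hodos", "theos", "nomos"],
}


def top_lemmas(tokens, n=3):
    """Function 1 (heuristic): the n most frequent lemmas in one play."""
    return Counter(tokens).most_common(n)


def relative_frequency(lemma, target, corpus):
    """Function 2 (analytical): rate of a lemma in one play vs. the others.

    Returns (rate_in_target, rate_in_rest), each as occurrences per token.
    """
    target_tokens = corpus[target]
    rest_tokens = [
        tok for name, toks in corpus.items() if name != target for tok in toks
    ]
    rate_target = target_tokens.count(lemma) / len(target_tokens)
    rate_rest = rest_tokens.count(lemma) / len(rest_tokens) if rest_tokens else 0.0
    return rate_target, rate_rest


def positions(lemma, tokens):
    """Function 3 (hermeneutic): token offsets where the lemma occurs."""
    return [i for i, tok in enumerate(tokens) if tok == lemma]


print(top_lemmas(plays["play_a"], n=2))
print(relative_frequency("hodos", "play_a", plays))
print(positions("hodos", plays["play_a"]))
```

The offsets returned by `positions` are the raw material for a distribution chart: plotted along the length of a play, they show where uses of a word cluster or thin out. A real pipeline would also need a lemmatizer for Ancient Greek and line-level rather than token-level positions, both of which are left out of this sketch.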
The successful development of this feasibility prototype has enabled us to apply for further funding to develop the tool. Our goal is to make it available as a common good to anyone with an internet connection, regardless of institutional affiliation or programming literacy. We are also exploring whether the tool can be scaled up to cover the entire 5th Century Athenian dramatic corpus and other corpora of texts from Greco-Roman antiquity.
For further information, please contact firstname.lastname@example.org