From n-grams to n-sets: A Fuzzy-Logic-Based Approach to Shakespearian Authorship Attribution.
This thesis surveys the principles of Fuzzy Logic as they have been applied in the last three decades in the micro-electronic field and, in the context of resolving problems of authorship verification and attribution shows how these principles can assist with the detection of stylistic similarities or dissimilarities of an anonymous, disputed play to an author’s general or patterns-based known style. The main stylistic markers are the counts of semantic sets of 100 individual words-tokens and an index of counts of these words’ frequencies (a cosine index), as found in the first extract of approximately 10,000 words of each of 27 well attributed Shakespearian plays. Based on these markers, their geometrical representation, fuzzy modelling and on thee ground of Set Theory and Boolean Algebra, in the core part of this thesis three Mamdani (Type-1) genre-based Fuzzy Expert Systems were built for the detection of degrees (measured on a scale from 0 to 1) of Shakespearianness of disputed and, probably, co-authored plays of the early modern English period. Each of these three expert systems is composed of seven input and two output variables that are associated through a set of approximately 30 to 40 rules. There is a detailed description of the properties of the three expert systems’ inference mechanisms and the various experimentation phases. There is also an indicative graphical analysis of the phases of the experimentation and a thorough explanation of terms, such as partial truths membership, approximate reasoning and output centroids on an X-axis of a two-dimensional space. Throughout the thesis there is an extensive demonstration of various Fuzzy Logic techniques, including Sugeno-ANFIS (adaptive neuro-fuzzy inference system), with which the style of Shakespeare can be modelled in order to compare it with well attributed plays of other authors or plays that are not included in the strict Shakespearian canon of the selected 27 well-attributed, sole authored plays. In addition, other relevant issues of stylometric concern are discussed, such as the investigation and classification of known ‘problem’ and disputed plays through holistic classifiers (irrespective of genre). The results of the experimentation advocate the use of this novel, automated and computer simulation-based method of classification in the stylometric field for various purposes. In fact, the three models have succeeded in detecting the low Shakespearianness of non Shakespearian plays and the results they provided for anonymous, disputed plays are in conformance with the general evidence of historical scholarship. Therefore, the original contribution of this thesis is to define fully functional automated fuzzy classifiers of Shakespearianness. The result of this discovery is that we now know that the principles of fuzzy modelling can be applied for the creation of Fuzzy Expert Stylistic Classifiers and the concomitant detection of degrees of similarity of a play under scrutiny with the general or patterns-based known style of a specific author (in our case, Shakespeare). Furthermore, this thesis shows that, given certain premises, counts of words’ frequencies and counts of semantic sets of words can be employed satisfactorily for stylistic discrimination.
- PhD