Computational authorship studies are an increasingly popular topic for research among specialists in both computer science and the humanities

The field can be considered a form of style-based document authentication (Echtheitskritik), which has valuable applications that extend well beyond the domain of literary analysis to, for instance, the forensic sciences. According to Stamatatos’s 2009 survey of the field, ‘[t]he main idea behind statistically or computationally-supported authorship attribution is that by measuring some textual features we can distinguish between texts written by different authors.’ [22: E. Stamatatos, ‘A survey’ (n. 14, above) 538.] This basic assumption implies that it should be possible to assess, for any new unseen document, whether or not it was written by one of the authors for whom we have texts available. Nowadays computational authorship studies are often considered a subfield of stylometry in the digital humanities, the broader computational study of the writing style of texts. [23: D. Holmes, ‘The evolution of stylometry in humanities scholarship’, LLC 13 (1998) 111–17.]
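
To make the notion of ‘measuring some textual features’ concrete, the following is a minimal sketch in Python. It assumes, purely for illustration, that the features are the relative frequencies of a handful of English function words; the word list and the function names `tokenize`, `feature_vector` and `manhattan_distance` are hypothetical and do not come from the studies cited here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: a few English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "it"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def feature_vector(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def manhattan_distance(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

Two texts can then be compared by the distance between their vectors; the working assumption is that texts by the same author tend to lie closer together than texts by different authors.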

While stylometry has a rich history, dating back to at least the nineteenth century, it is clear that it received its most important impetus only in the past two or three decades, stimulated by the rise of (personal) computing and the increased availability of large bodies of text in electronic form. Apart from the influential, yet more conventional, statistical analyses carried out by pioneers such as Mosteller and Wallace or John Burrows well before the 1990s, a prominent strategy in authorship studies has been to approach the attribution of anonymous texts as a ‘text categorization’ problem. [24: Mosteller and Wallace, Inference and disputed authorship (n. 4, above) and J. Burrows, Computation into criticism: a study of Jane Austen’s novels (Oxford 1987).] Heavily influenced by parallel research in computer science, the idea was to optimize a statistical classifier on example texts by a number of available candidate authors, much like a spam filter nowadays is still trained on manually annotated emails to learn how to distinguish between ‘junk’ email and normal messages. [25: F. Sebastiani, ‘Machine learning in automated text categorisation’, ACM Computing Surveys 34 (2002) 1–47.] After training such a classifier on this example data, it could then be used to categorize or classify an anonymous text as belonging to one of the training authors’ oeuvres.
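
The text categorization setup can be sketched as follows. This is an illustrative nearest-centroid classifier in plain Python, not the classifier used in any of the studies cited; the vocabulary and the names `features`, `train` and `attribute` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical feature set; real studies use far larger vocabularies.
VOCABULARY = ["the", "of", "and", "to", "in", "a"]


def features(text):
    """Relative frequency of each vocabulary word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in VOCABULARY]


def train(examples):
    """Map each candidate author to the mean feature vector of their texts.

    `examples` is a dict of the form {author: [text, text, ...]}.
    """
    centroids = {}
    for author, texts in examples.items():
        vectors = [features(text) for text in texts]
        centroids[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def distance(u, v):
    """Manhattan distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def attribute(text, centroids):
    """Closed-set attribution: always return the nearest candidate author."""
    vector = features(text)
    return min(centroids, key=lambda author: distance(vector, centroids[author]))
```

Note that `attribute` always names one of the training authors, however poor the match: it has no way of saying that none of the candidates wrote the text.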

It resembles a police lineup, in which the correct author of an anonymous text has to be singled out from a series of available candidate authors for whom reference or ‘training’ material is available

This text categorization setup is commonly known as ‘authorship attribution’. [26: The following paragraph heavily draws on M. Koppel and Y. Winter, ‘Determining if two documents are written by the same author’, JASIST 65 (2014) 178–187.] For a number of years, practitioners of stylometry have had to acknowledge the limitations of authorship attribution, because it necessarily assumes that the correct target author is indeed included in the set of candidates. In many real-world cases, this problematic assumption cannot possibly be made, because the set of relevant candidates is difficult or impossible to establish beforehand. Because of this, the setup of authorship verification has recently been introduced as a new framework: here, the task is to verify whether or not an anonymous document was written by one or several of a series of candidate authors. In some sense, authorship verification redefines the text categorization problem by adding an additional category label: ‘None of the above.’
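
The extra ‘None of the above’ label can be illustrated with a simple rejection threshold. The sketch below is only one naive way of opening up the candidate set and is not the method of Koppel and Winter; the function name `verify`, the precomputed feature vectors and the `threshold` value are all hypothetical, and in practice such a threshold would have to be calibrated on held-out texts of known authorship.

```python
NONE_OF_THE_ABOVE = "None of the above"


def manhattan(u, v):
    """Manhattan distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def verify(text_vector, centroids, threshold):
    """Open-set decision over precomputed feature vectors.

    `centroids` maps each candidate author to a reference vector.
    The nearest candidate is returned only if it lies within
    `threshold`; otherwise the text is assigned to no candidate.
    """
    distances = {
        author: manhattan(text_vector, centroid)
        for author, centroid in centroids.items()
    }
    best = min(distances, key=distances.get)
    if distances[best] > threshold:
        return NONE_OF_THE_ABOVE
    return best


# Usage with made-up two-dimensional vectors.
candidates = {"A": [0.5, 0.1], "B": [0.1, 0.5]}
print(verify([0.48, 0.12], candidates, threshold=0.1))
print(verify([0.30, 0.30], candidates, threshold=0.1))
```

With these made-up numbers the first text falls well within the threshold of candidate A, while the second is too far from both candidates and is rejected.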

In the present context, it should be emphasized that the problem posed by the HA is a ‘vanilla’ example of a problem in authorship verification: while the collection indeed contains a number of (auto-)attributions, the veracity of all of these has been questioned in previous scholarship

Verification is hence an increasingly common experimental setup in authorship studies, and is the topic of a dedicated track in the PAN competition, an annual competition on finding computational solutions to issues in present-day textual forensics, mostly related to the detection of plagiarism, authorship, and social software misuse (such as grooming or Wikipedia vandalism). [27: The competition’s website is […]. The most recent survey of an authorship verification track is: E. Stamatatos et al., ‘Overview of the author identification task at PAN 2015’ in Working Notes Papers of the CLEF 2015 Evaluation Labs, ed. L. Cappellato et al. (2015).] Generally speaking, authorship verification is a more generic problem than authorship attribution (i.e. every attribution problem could, in principle, be cast as a verification problem), but it has also proven to be more challenging. In our experiments, we have therefore attempted to radically minimize any assumptions on our part as to the authorial provenance of the texts in the HA. For each piece of text analysed below, we propose to independently assess the probability that it was written by one of the (alleged) individual authors identified in the corpus.
