Markup

TEI markup

In 2013, a TEI P5 markup schema for emigrant letters, produced by Peter Stadler and his team at the Universität Paderborn, was uploaded to GitHub. The schema and related files are available from: https://github.com/peterstadler/Emigrant-Letters.

Building on work carried out for the DEM project, Emma Moreton developed TEI templates for 'Document/Text' (capturing information relating to the transcription history and provenance of a letter), 'Person' (capturing sociobiographic information relating to the author/recipient) and 'Place' (capturing information relating to the locations of the participants involved in the act of communication). These templates are just one way of organising information about the emigrant letter; they offer a starting point for interconnecting resources and for interdisciplinary research. There is still a lot more to do! The templates can be accessed from:

Moreton, E. (2016) Ph.D. thesis: The emigrant letter digitised: markup and analysis. Birmingham: University of Birmingham. Available from: http://etheses.bham.ac.uk/6416/
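To give a feel for how the three kinds of record can sit together in one file, here is a minimal sketch in Python. It is not a reproduction of Moreton's templates or of the Paderborn schema: the function build_letter_skeleton, the identifier pattern and the sample values are all hypothetical, and the output has not been validated against any schema. It uses only standard TEI P5 element names (teiHeader, listPerson, listPlace, correspDesc) and the Python standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialise TEI elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return "{%s}%s" % (TEI_NS, tag)


def build_letter_skeleton(letter_id, author, recipient, sent_from):
    """Build a bare TEI header linking a letter to person and place records.

    Hypothetical helper for illustration; the structure is an assumption,
    not the layout of the published templates.
    """
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))

    # 'Document/Text': bibliographic description of the letter.
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = "Letter %s" % letter_id
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Illustrative example only"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))

    # 'Person': one record each for the author and the recipient.
    list_person = ET.SubElement(source_desc, tei("listPerson"))
    for role, name in (("author", author), ("recipient", recipient)):
        person = ET.SubElement(
            list_person, tei("person"), {XML_ID: "%s-%s" % (letter_id, role)}
        )
        ET.SubElement(person, tei("persName")).text = name

    # 'Place': the location the letter was sent from.
    list_place = ET.SubElement(source_desc, tei("listPlace"))
    place = ET.SubElement(
        list_place, tei("place"), {XML_ID: "%s-origin" % letter_id}
    )
    ET.SubElement(place, tei("placeName")).text = sent_from

    # The act of communication, pointing back at the records above.
    profile_desc = ET.SubElement(header, tei("profileDesc"))
    corresp_desc = ET.SubElement(profile_desc, tei("correspDesc"))
    sent = ET.SubElement(corresp_desc, tei("correspAction"), {"type": "sent"})
    ET.SubElement(sent, tei("persName"), {"ref": "#%s-author" % letter_id})
    ET.SubElement(sent, tei("placeName"), {"ref": "#%s-origin" % letter_id})
    received = ET.SubElement(
        corresp_desc, tei("correspAction"), {"type": "received"}
    )
    ET.SubElement(received, tei("persName"), {"ref": "#%s-recipient" % letter_id})
    return root


if __name__ == "__main__":
    # Placeholder values, not drawn from any real collection.
    letter = build_letter_skeleton("L001", "Author Name", "Recipient Name", "Place Name")
    print(ET.tostring(letter, encoding="unicode"))
```

Keeping the person and place records separate from the correspondence description, and linking them by identifier, is what would allow the same person or place to be referenced from many letters.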

The speech act tagger

The speech act tagger is a natural language processing (NLP) tool designed to recognise speech acts typical of email communication, including direct and indirect requests, commitments, questions, statements, and expressions of feeling. It uses a combination of resources, including a parser, a part-of-speech (POS) tagger, and a vocabulary list, to extract syntactic and lexical information about each utterance; on the basis of this information, a speech act category is assigned to the utterance. The current version of the tagger, trained on annotated emails from the Enron dataset, achieves around 75% accuracy (precision 74.5%, recall 68%); these figures are expected to improve as more annotated data becomes available for training and testing.
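The general idea of assigning a category from surface cues can be illustrated with a toy example. The sketch below is not the tagger described above: it uses no parser, no POS tagger and no training data, only a few hand-written rules. The word lists, the rule order and the function name classify_utterance are assumptions made up for illustration; only the six category labels come from the description above.

```python
import re

# Hypothetical cue lists, invented for this illustration.
DIRECT_REQUEST_OPENERS = {"please", "send", "write", "tell", "give"}
FEELING_WORDS = {"glad", "sorry", "happy", "sad", "thankful", "grieved"}
INDIRECT_REQUEST = re.compile(r"^(could|would|can|will) you\b")
COMMITMENT = re.compile(r"^(i|we) (will|shall)\b")


def classify_utterance(utterance):
    """Assign one of six speech act labels using simple lexical rules."""
    text = utterance.strip()
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)

    # Polite modal + "you" openers are treated as indirect requests,
    # and are checked before the generic question rule.
    if INDIRECT_REQUEST.match(lowered):
        return "indirect request"
    if text.endswith("?"):
        return "question"
    # Utterances opening with "please" or a bare imperative verb.
    if tokens and tokens[0] in DIRECT_REQUEST_OPENERS:
        return "direct request"
    # First-person subject followed by "will" or "shall".
    if COMMITMENT.match(lowered):
        return "commitment"
    if FEELING_WORDS.intersection(tokens):
        return "expression of feeling"
    return "statement"


if __name__ == "__main__":
    # Invented sentences, not taken from any letter collection.
    for sentence in (
        "Could you send me the address?",
        "Please write soon.",
        "I will write again next month.",
    ):
        print(sentence, "->", classify_utterance(sentence))
```

Rules of this kind break down quickly on real text (for example, "I will be glad to hear from you" would be labelled a commitment here), which is one reason a trained tagger drawing on syntactic information is used instead.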

In 2014 the speech act tagger was trialled with some of the emigrant letter collections, and the results were very promising. We plan to publish our findings in 2016/2017; full details will be posted on the home page in due course. For more information about the speech act tagger, please contact Dr Rachele De Felice, English Dept., University College London (email: r.defelice@ucl.ac.uk).

Related references:

Rachele De Felice, Jeannique Darby, Anthony Fisher, and David Peplow (2013). A classification scheme for annotating speech acts in a business email corpus. ICAME Journal 37, pp. 71-105.

Rachele De Felice and Paul Deane (2012). Identifying speech acts in e-mails: Toward automated scoring of the TOEIC® e-mail task (ETS Research Report No. RR-12-16). Princeton, NJ: ETS. Available from: http://www.ets.org/Media/Research/pdf/RR-12-16.pdf