CIT 2020 Conference - Transforming Interpreter Education

Development of an English-ASL Corpus for Interpreter Training

by Rafael Treviño, Julie Hochgesang & Emily Shaw



In this session, we will present the development of an innovative use of technology that gives students, practitioners, and researchers access to a preliminary data set of interpreted ASL. The data set is part of the Gallaudet University Documentation of ASL (GUDA) project (https://shwca.se/GUDA) and, as of fall 2019, consists of 585 videos from the Gallaudet media library. The session is directed at educators; however, we will also demonstrate the utility of the technology for practitioners and researchers.

Among spoken languages, the benefits of using corpora for research purposes have long been recognized for both translation (Baker, 1993) and interpreting (Shlesinger, 1998). In Sign Language Interpreting Studies, some of the major theoretical works in our field have been based on what could be referred to as small, ad hoc corpora or data sets. For example, Cokely (1992) used a small, specialized data set of recorded interpretations, which laid the foundation for his Sociolinguistic Model of interpreting. Roy (2000) also used a recorded interaction to view dialogue interpreting through a discourse analysis lens. Roy’s data was later re-used by Marks (2012) for her work on footing and by Llewellyn-Jones and Lee (2014) for their work on role-space. Although Roy’s (2000) work consisted of a single interaction, this re-use of data by other researchers shows the benefits of creating accessible corpora (see also Joint Declaration of Data Citation Principles, Data Citation Synthesis Group, 2014).

The benefits of using corpora in translator education have also been recognized, although ideas about their use in teaching sign language interpreting have been limited (Frishberg, 2010). In our presentation, we will describe ways in which corpora can be used in sign language interpreter education. Examples include investigating the effects of settings and stylistic preferences on interpretations; creating interpreting activities based on real examples; analyzing the strategies adopted by professionals; and, at the most basic level, looking up terminology (Zanettin, Bernardini, & Stewart, 2003). With regard to looking up terminology, while English-ASL dictionaries exist, even for specialized terms (e.g., ASL Clear, https://clear.aslstem.com), a corpus allows users to see terminology used in context.
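To illustrate what looking up terminology in context can mean in practice, here is a minimal sketch of a concordance-style search over caption segments. It is an illustration only, not part of the GUDA tooling: the segment data, the video identifiers, and the function name `concordance` are all hypothetical.

```python
import re

# Hypothetical caption segments: (video id, start time, caption text).
# None of these come from the GUDA data set.
SEGMENTS = [
    ("video_001", "00:00:04,500", "Today we will discuss how a cell divides."),
    ("video_001", "00:00:08,500", "Mitosis begins when the cell copies its DNA."),
    ("video_002", "00:01:10,000", "Please silence your cell phones."),
]

def concordance(segments, term):
    """Return every segment whose caption contains `term` as a whole word.

    Matching is case-insensitive. Each hit keeps its video id and start time
    so a user can jump to that moment and see how the term was interpreted.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [segment for segment in segments if pattern.search(segment[2])]

hits = concordance(SEGMENTS, "cell")
for video, start, text in hits:
    print(f"{video} @ {start}: {text}")
```

The same English word appears here in a biology context and in an everyday context, which is the kind of contrast a dictionary entry alone would not show.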

We will also present our methodology for creating the GUDA data set of interpreted ASL. The corpus was created by writing scripts to download the videos and their respective SRT (SubRip subtitle) files. Afterward, scripts were written to generate EAFs (annotation files) and import the SRT files to make the data set searchable in ELAN (EUDICO Linguistic Annotator). The data set includes examples of ASL with English captions, ASL interpreted into English, and English interpreted into ASL. Currently, the data set is saved on an external hard drive.
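The step of turning SRT captions into a time-aligned annotation file could be sketched roughly as follows. This is not the GUDA project's actual script. It is a simplified illustration that assumes well-formed SRT input and writes only a bare-bones EAF-style structure (one tier of time-aligned annotations). The function names, the tier name, and the linguistic type ID are hypothetical, and a real EAF file needs further header details (such as the date, media descriptors, and schema declaration) and should be checked by opening it in ELAN.

```python
import re
import xml.etree.ElementTree as ET

# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_ms(hours, minutes, seconds, millis):
    """Convert the four fields of an SRT timestamp to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

def parse_srt(text):
    """Parse SRT text into a list of (start_ms, end_ms, caption) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):  # cues are separated by blank lines
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                caption = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), caption))
                break
    return cues

def srt_to_eaf(srt_text, tier_id="English captions"):
    """Build a simplified EAF-style XML string with one tier of captions."""
    root = ET.Element("ANNOTATION_DOCUMENT", {"AUTHOR": "", "FORMAT": "3.0", "VERSION": "3.0"})
    ET.SubElement(root, "HEADER", {"TIME_UNITS": "milliseconds"})
    time_order = ET.SubElement(root, "TIME_ORDER")
    tier = ET.SubElement(root, "TIER", {"TIER_ID": tier_id, "LINGUISTIC_TYPE_REF": "default-lt"})
    for n, (start, end, caption) in enumerate(parse_srt(srt_text), start=1):
        ts_start, ts_end = f"ts{2 * n - 1}", f"ts{2 * n}"  # two time slots per cue
        ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts_start, "TIME_VALUE": str(start)})
        ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts_end, "TIME_VALUE": str(end)})
        annotation = ET.SubElement(tier, "ANNOTATION")
        aligned = ET.SubElement(annotation, "ALIGNABLE_ANNOTATION", {
            "ANNOTATION_ID": f"a{n}",
            "TIME_SLOT_REF1": ts_start,
            "TIME_SLOT_REF2": ts_end,
        })
        ET.SubElement(aligned, "ANNOTATION_VALUE").text = caption
    ET.SubElement(root, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "default-lt",
        "TIME_ALIGNABLE": "true",
        "GRAPHIC_REFERENCES": "false",
    })
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-cue sample; the second caption spans two lines.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:00:04,500 --> 00:00:08,250
Today's topic is
corpus linguistics.
"""

eaf_xml = srt_to_eaf(SAMPLE)
print(eaf_xml)
```

Keeping the captions as time-aligned annotations is what lets a text search in ELAN lead back to the matching moment in the video.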

Finally, we will report on the challenges faced in creating the corpus and other important considerations—such as obtaining permissions—and the future directions of GUDA, which include seeking funding to make the corpus available to a wider audience. The implications of this work can be summarized as taking a step in the direction of applied corpus-based sign language interpreting studies.


Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233–252). John Benjamins.

Cokely, D. (1992). Interpretation: A sociolinguistic model. Linstok Press.

Data Citation Synthesis Group. (2014). Joint declaration of data citation principles. FORCE11. https://doi.org/10.25490/a97f-egyk

Frishberg, N. (2010, December 3). Repurposing corpus materials for interpreter education [Workshop presentation]. Sign Linguistics Corpora Network, Radboud University. https://www.ru.nl/publish/pages/607111/slcn4_frishberg.pdf 

Llewellyn-Jones, P., & Lee, R. G. (2014). Redefining the role of the community interpreter: The concept of role-space. SLI Press.

Marks, A. R. (2012). Participation framework and footing shifts in an interpreted academic meeting. Journal of Interpretation, 22(1).

Roy, C. B. (2000). Interpreting as a discourse process. Oxford University Press.

Shlesinger, M. (1998). Corpus-based interpreting studies as an offshoot of corpus-based translation studies. Meta, 43(4).

Zanettin, F., Bernardini, S., & Stewart, D. (Eds.). (2003). Corpora in translator education. Routledge.

Participants will be able to:

  • describe the background on the use of corpora in sign language interpreting,
  • explain the application of corpora in teaching sign language interpreters,
  • describe the methodology used to construct our corpus of interpreted ASL,
  • identify the challenges to consider when creating a corpus, and
  • recognize the implications of using corpora in our field.

[Photo: A Latinx man with short black hair with grey highlights wearing a blue shirt looks at the camera in front of a light grey background.]

Rafael O. Treviño is a second-year doctoral student in the Ph.D. in Interpretation program at Gallaudet University. He is conducting an internship under Dr. Julie A. Hochgesang and Dr. Emily P. Shaw on the Gallaudet University Documentation of ASL (GUDA) project, which aims to create a monitor corpus of American Sign Language. Rafael has conducted research on bimodal-multilingual (ASL-Spanish-English) interpreting and has provided training to interpreters in the U.S. and Mexico, in both academic and community settings. He holds NIC Advanced certification (ASL-English) from RID, BEI Trilingual Master (ASL-Spanish-English) certification from Texas, and ATA (American Translators Association) certification (Spanish into English). Twitter: @rafaelotrevino

[Photo: A white woman with straight brown hair down past shoulders wearing a sleeveless shirt with a tattoo on left arm smiles at the camera.]

Julie A. Hochgesang is an associate professor of linguistics at Gallaudet University. Her research specializations include language documentation and corpus linguistics of signed languages, including the textual representation of ASL (e.g., ASL Signbank); phonetics and phonology of signed languages; and making signed language research accessible to the communities. She self-identifies as a Deaf American woman and uses signed ASL and written English as her primary languages. Twitter: @jahochcam, @lingdeptgu

[Photo: Close up of a white woman with brown wavy ear-length hair wearing a purple shirt in front of red, yellow and green leaves.]

Emily P. Shaw is an associate professor in the Department of Interpretation and Translation at Gallaudet University. While working as an interpreter in private practice, she received her Ph.D. in Linguistics from Georgetown University. In her dissertation, Gesture in Multiparty Interaction: A Study of Embodied Discourse in Spoken English and American Sign Language, she employs an interactional sociolinguistic approach to analyze two multiparty discourses. The study advances the notion that language, both spoken and signed, is fundamentally embodied. She has also worked extensively on tracing the origin of signs in ASL to French Sign Language. This work culminated in the text A Historical and Etymological Dictionary of American Sign Language, co-authored with Yves Delaporte, in which they detail the etymologies of over 500 ASL signs.