Sneak Peek 7: Peng Wang

The Influence of Parallel Corpus on Machine Translation Quality across Registers

Peng WANG, Ph.D.

University of Maryland

Abstract: This study investigates how parallel corpora affect Machine Translation (MT) quality when translating source texts in different registers. To this end, we will use English-Chinese parallel corpora such as MultiUN, TED Talks and OpenSubtitles to train a vertical MT engine, and compare its output for source texts across registers before and after the training data are introduced. We measure MT quality by calculating the post-editing distance.
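The abstract does not say how post-editing distance is computed. One common reading is a normalized edit distance between the raw MT output and its post-edited version, and the sketch below illustrates that reading only. The function names, the character-level comparison and the normalization by the longer string are all assumptions made for illustration, not the study's stated method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def post_editing_distance(mt_output: str, post_edited: str) -> float:
    """Hypothetical metric: edit distance scaled to the range 0.0 to 1.0.

    Assumption: normalize by the length of the longer string, so 0.0 means
    the post-editor changed nothing and 1.0 means a complete rewrite.
    """
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return edit_distance(mt_output, post_edited) / longest
```

Comparing characters rather than words avoids a word-segmentation step for Chinese target text. Under this reading, a lower score means less post-editing effort, so the scores before and after training can be compared register by register.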

Our research questions are: (1) will MT quality vary across source-text registers before and after the parallel corpora are incorporated; (2) will the resulting increase or decrease in quality help preserve, endanger or develop linguistic diversity in the target language; (3) will post-editing activities vary across registers before and after the parallel corpora are incorporated; and (4) will those post-editing activities help preserve, endanger or develop linguistic diversity in the target language?

Bio: Peng Wang is a Lecturer and CAT Tool Coordinator in the Graduate Studies in Interpreting and Translation program at the University of Maryland. She began working in corpus-based translation studies in 2003 at the University of Liverpool and worked in the Corpus Linguistic Program at Northern Arizona University in 2009. Her research focuses on the role of technology and database management in translation, interpretation and intercultural communication.