DEC Activities and Works

Programme based

Postgraduate

Research Degrees

Research Projects

Treebank Visualization Tool

A treebank is a database of sentences annotated with syntactic information showing how the elements of a sentence relate to one another. An online treebank visualisation tool was created to enable users without a linguistics or computing background to understand texts graphically and to find patterns in word usage that might not be identifiable through normal search engines.
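As a rough illustration of what "sentences annotated with syntactic information" can look like in practice, the sketch below stores one hand-annotated sentence and searches it for a word-usage pattern. The `Token` fields, the relation labels, and the `find_pairs` helper are assumptions made for this example; they are not taken from the tool described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    index: int      # 1-based position in the sentence
    form: str       # the word as written
    head: int       # index of the governing word (0 = sentence root)
    relation: str   # hypothetical dependency label


# One hand-annotated example sentence: "Students read books"
SENTENCE: List[Token] = [
    Token(1, "Students", 2, "subject"),
    Token(2, "read", 0, "root"),
    Token(3, "books", 2, "object"),
]


def find_pairs(sentence: List[Token], relation: str) -> List[Tuple[str, str]]:
    """Return (head word, dependent word) pairs linked by the given relation."""
    by_index = {tok.index: tok for tok in sentence}
    return [
        (by_index[tok.head].form, tok.form)
        for tok in sentence
        if tok.relation == relation and tok.head in by_index
    ]


print(find_pairs(SENTENCE, "object"))  # [('read', 'books')]
```

A query of this kind (which verbs take which objects) relies on the annotated links between words, which is why a plain keyword search would not surface the same pattern.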

Award: CLASS DEC Competition 2014-15, Postgraduate Champion (Individual)

Student: YEUNG, Chak Yan

Supervisor: Dr LEE, John

Publications

Crowdsourcing translation in contemporary China: Theories and Practices

Abstract: The contribution of grass-roots internet users to the circulation of cultural products and to the translation industry can no longer be ignored. Online news is updated by the second, with translations into many languages appearing on user-generated pages in different countries. Fansubbing groups and manga scanlation enthusiasts upload the newest films and manga with their own translations within 24 hours of the official release, while adroitly circumventing censorship. Compared with the long-established translation industry, where each formally published translation must pass through strict procedures before reaching the market, translations generated by internet users reach their audiences in almost no time. In addition, the role of the translator is no longer limited to a small group of certified professionals, but can be played by any internet user who is able to translate.

In this light, this chapter explores the development of crowdsourced subtitle translation in China over the last decade, during which the internet spread to cities throughout the country, grass-roots users became able to produce and consume texts far more conveniently than before, and official internet media regulation gradually tightened. By probing the characteristics and cultural implications of Chinese crowdsourcing translation communities, their pattern of development, and their clashes and dialogue with the official discourse, the chapter aims at a better understanding of how the crowdsourcing model can be used in the future production of translation in China.

Published in:

The Human Factor in Machine Translation, (2018), (pp. 236), S. CHAN (Ed.), London: Routledge.

External link:

https://books.google.com.hk/books?id=aUlaDwAAQBAJ&pg=PT273&...

Methodological Issues of Subtitle Translation Studies in the Internet Age—Achievements, Reflections, and Prospects

网络时代字幕翻译研究方法:现状、反思与展望

Abstract: Subtitle translation for online videos has become an increasingly popular topic in recent years, and in China a large body of research has been produced. However, a general review of the existing studies reveals some common problems: the methodologies lag behind the times and are not sufficiently targeted at the topic, and researchers fail to apply them properly to their contexts. To tackle these problems, a further review of audiovisual translation (AVT) studies abroad is conducted to establish why and how the various existing AVT methodologies have developed and been applied since the mid-twentieth century. Over the course of their development, AVT studies have made continuous breakthroughs along two lines: one expands the research scope by adding new text types; the other stretches the methodological blueprint by borrowing from other disciplines. The conclusion is that a methodology is appropriate only when it fully considers the socio-cultural context, the text type, and the medium carrying the information. In this way, research can become more valuable, systematic and scientific.

Published in:

Shanghai Journal of Translators, (2017), (5), 27-31.
(上海翻译,(2017), (5), 27-31)

External link:

http://cnki.cn-ki.net/KCMS/detail/detail.aspx?...

Translation Crowdsourcing and the Transmutation of Translator’s Subjectivity

网络众包翻译模式与译者主体性的嬗变

Abstract: Crowdsourcing translation emerged with the internet era, and it distinguishes itself from the established translation industry. The subjectivity of the internet crowdsourcing translators is fundamentally different from that of their off-line, traditional counterparts, because the nature of crowdsourcing enables the crowdsourcers to liberate translation from the hands of the few “elite translators”. Thus, translation becomes an open and shared social resource. Crowdsourcing translation relies largely on the power of the crowd to discover the best way of translating, as well as to reveal the meaning of the original. Meaning, during the process of translation, flows through different individual translating subjects and unites the latter. In this article, through further discussion of the subjectivity of crowdsourcing translators, as well as the organizational pattern of several crowdsourcing translation cases, the following conclusion is reached: crowdsourcing translation, aside from promoting the subjectivity of individual translators, reveals a collective subjectivity on a higher level. Such collective subjectivity, vigorous and powerful, would keep growing stronger rather than being suppressed by the established translation industry.

摘要:网络众包翻译是网络时代翻译的一个重要现象,和传统翻译有着明显区别,前者的译者主体性与后者相比发生了根本变化。众包翻译译者的主体性特点决定了译者及其翻译模式已经从传统的翻译权力体系中解放了出来,翻译不再垄断在社会“精英”手中,而成为社会共享资源。众包翻译是靠主体间的力量来揭示文本意义的最佳途径和方式,而意义本身也最适合在主体间传递,并在传递过程中把众主体连结起来,形成一个意义世界。本文认为众包翻译体现了集体主体性,具有强大的生命力,不但不会轻易地被传统翻译权力系统收编,而且会顽强地生存下去。

Published in:

East Journal of Translation, (2015), Vol.1, p.31-34, 91.
(东方翻译,(2015), 第一期, 31-34頁, 91)

External link:

http://www.ejtrans.com/mulu/33/3301.aspx?id=5

Internet Crowd Participation in Translation: History, Present, and Future

互联网大众翻译模式微探:历史、现时、未来

Abstract: The development of the internet and computer technology has brought about the democratization of information. More and more grass-roots users have started to initiate and participate in translation activities, attracting public attention. The development of China’s internet translation communities reveals a pattern: the communities started as non-profit grass-roots initiatives and gradually became organized, copyright-protected, and even for-profit productions. “User-generated translation” and “crowdsourcing translation” are the theoretical frameworks that can explain these phenomena. This article traces the development of China’s internet translation communities over the past ten years, sets out the cultural and market value of this mode of translation, and points out possible future challenges for non-professional online translators. The non-professional translation communities have the following characteristics: first, translation is carried out by grass-roots internet users; second, some of the communities survived their conflict with the dominant discourse, while others did not; third, crowdsourcing translation is more efficient for multimodal texts than for plain texts. All these characteristics shed light on a new direction for the translation industry in China.

摘要:随着互联网的普及,计算机技术的发展,以及信息流动的民主化,由网民集体发起和参与的翻译现象变得越来越普遍,越来越受到广泛的关注。在中国的网络翻译社区的发展中,我们可以明显地观察到这样的发展脉络, 即, 从网民集体自发的翻译行为到有规律、有组织、有版权保护、甚至有盈利的翻译活动。本文将这种现象纳入“用户生成翻译”(user-generated translation)和“众包翻译”(crowdsourcing translation)的理论框架,通过追溯中国各类互联网翻译社区近十年来的发展,总结出中国众包翻译社区的文化创新意义及未来的市场价值,并指出其可能面对的挑战。本文认为,互联大众翻译社区存在着以下特点:1.翻译过程主要由网络草根用户集体进行;2.在与主流与权威翻译话语体系的对话中,一部分众包翻译社区遭到淘汰,另一部分得以存活并且壮大起来;3.该模式在翻译多模态文本时呈现出比翻译纯文字文本更大的优势,这为中国翻译行业的发展指出了新方向。

Published in:

Chinese Translators Journal, (2015), Vol. 5, p.78-82
(中国翻译,(2015), 第五期, 78-82頁)

External link:

http://www.tac-online.org.cn/ch/tran/2015-09/10/content_8225656.htm

Crowdsourcing Translation in Contemporary China: An Inspiring Perspective of Translation in the Web 2.0 Age

Abstract: With the popularization of the internet, the development of computer science, and the democratization of information, internet crowdsourcing and crowdsourcing translation have become increasingly popular. Crowdsourcing translation is broadly applied in a variety of online communities, including social networks, informative websites, movie and animation fansubbing groups, humanitarian organizations, language-learning websites, and so on. As crowdsourcing translation has attracted attention from professional circles, it has also aroused debates on copyright issues, as well as on quality problems and evaluation standards. To a large extent, crowdsourcing translation shares the main features of internet crowdsourcing: voluntary participation, crowdsourcers’ multiple identities, and, in most cases, translation for free.

This research aims to investigate the history and analyze the features of crowdsourcing translation in present-day China, so as to expound how the internet has democratized the production and consumption of translation, how the subjects involved in crowdsourcing translation are dynamically interrelated, and how the concepts and culture of translation have changed with the decentralization of the translation power structure.

To start with, China is a monolingual country with neither a mature economic system nor a strictly regulated translation industry. Thus, although crowdsourcing translation has existed for about a decade and has grown immensely in scale and popularity, the crowdsourcing spirit has stayed the same: non-profit, with personal goals prevailing over money. Also, in Chinese crowdsourcing communities, the main aim of translation is to introduce foreign cultural products. In most cases the source language is English, the most common second language among Chinese people since the opening-up policy, which is an indication of cultural hegemony. Further, in China, a country that exerts harsh media censorship, the utopia-like crowdsourcing communities were, at their early stage, in fact discourses of protest against the official ideology.

Published in:

Meta: Journal des traducteurs/Meta: Translators’ Journal, (2015), Vol.60, No. 2, p.316, doi: 10.7202/1032867ar

External link:

http://www.erudit.org/revue/meta/2015/v60/n2/...

More than just language teaching: Ideologies in language textbooks

(with Wang, W. H.)

In this book review, we critiqued Xiao Lan Curdt-Christiansen and Csilla Weninger’s edited book “Language, ideology and education: The politics of textbooks in language education” published by Routledge in 2015. As iterated in this book, language education is laden with ideologies and politics and language textbooks currently available in the market are mostly biased. To prepare language learners for adequate cross-cultural communication as global citizens, careful consideration should be given to the ideologies encoded in language teaching materials.

Published in:

Linguistics and Education, (2018) (in press), available online 10 January 2018

External link:

https://doi.org/10.1016/j.linged.2018.01.002

An Exploration of Chinese News Audio-visual-oral Teaching Mode with an Open Vision

Abstract: In teaching Chinese as a second language, news video is rarely used, which may leave students' listening comprehension below the required level. The present study therefore applies a Chinese news audio-visual-oral teaching mode to address this problem. The paper first introduces the foundation of this mode, including the theoretical basis of the audio-visual-oral mode and the current state of teaching materials for Chinese News Listening and Chinese Audio-visual-oral classes. An example is then provided to show how to design a class in this mode, covering instructional objectives, guidelines for choosing news topics, processing of the news material, compilation of exercises, and the teaching arrangement. Finally, a suggestion is put forward: to build an open corpus in which every Chinese teacher can share his/her processed news materials with others.

中文摘要:在對外漢語教學中,新聞視頻素材很少被挖掘,這也導致學生的聽力水準與能力要求的水準之間有一定差距,所以我們開展了“漢語新聞視聽說”教學模式的探索。本文首先介紹了這種教學模式的基礎,包括“視”“聽”“說”一體化的理論基礎、新聞聽力方面和視聽說方面對外漢語教材的現狀;其次給出了該教學模式的教學目標和設計思路,包括選擇新聞話題的標準、新聞素材的加工要求、編寫練習的建議和教學課時安排;最後根據該教學模式的特點提出了教學資源分享的構想,即興建開放性語料庫,同行參與、素材共用的建設模式。

Published in:

TCSOL Studies, (2014), Vol 4, p.57-86
(華文教學與研究, (2014), 第四期, 57-86頁)

External link:

http://ronline.ro.cityu.edu.hk/scholar/download.do?fileUuid=...

Cue Competition between Animacy and Word Order: Acquisition of Chinese Notional Passives by L2 Learners

(with Xu, C)

ABSTRACT: Based on the Competition Model (Bates & MacWhinney, 1978; MacWhinney, 2005; MacWhinney, 2012), the present study investigates L2 cue strategies in the acquisition of Chinese notional passives by English-speaking and Japanese-speaking learners. Two experiments were conducted to examine both the comprehension and production of Chinese notional passives. The main findings included: 1) L2 learners’ acceptability of notional passive increased with improved Chinese proficiency but even advanced learners showed significant difference from Chinese native speakers; 2) L2 learners produced more notional passive sentences than bei-passive sentences and advanced learners showed no difference from Chinese native speakers; 3) Cross-linguistic influence seemed to affect L2 learners’ comprehension and production of Chinese notional passives. The results support the universality of animacy cue proposed by Gass (1987) but also suggest that word order and pragmatic factors may affect L2 learners’ cue strategies. The study also evidences the contribution of the input to the development of L2 cue strategies, which is in line with the predictions of the Competition Model.

Published in:

Open Journal of Modern Linguistics, (2015) Vol. 5 No. 2, p.213-224, doi: 10.4236/ojml.2015.52017

External link:

http://www.scirp.org/journal/PaperInformation.aspx?paperID=56060

Vernacularization in Medieval Chinese: A Quantitative Study on Classifiers, Demonstratives, and Copulae in the Chinese Buddhist Canon

(with LEE John)

Abstract: While studies on diachronic Chinese syntax have identified a number of linguistic changes in Medieval Chinese, they have mostly been underpinned by qualitative analyses. In the largest-scale quantitative analysis to date, this article investigates changes in the use of classifiers, demonstratives, and copulae. Our analysis, based on the Chinese Buddhist Canon, examines over 40 million characters in texts spanning a millennium. Results suggest that from the late Eastern Han period (circa 150 CE) onwards, the vernacular style became increasingly widespread, at the expense of the literary style, as reflected by changes in the use of classifiers and demonstratives, and in the construction of nominal sentences. However, the vernacular style became less frequently used in the Northern Sung period (960–1127 CE). This reversal may shed light on the work of the Stylists, editors appointed by the Sung court to polish Buddhist texts with more literary elements.

Published in:

Digital Scholarship in the Humanities, (June 2018).

External link:

https://doi.org/10.1093/llc/fqy012

Register-sensitive Translation: A Case Study of Mandarin and Cantonese

(with LEE John)

Abstract: This paper describes an approach for translation between Chinese dialects that can produce target sentences at different registers. We focus on Mandarin as the source language, and Cantonese as the target. Mutually unintelligible, these two varieties of Chinese exhibit differences at both the lexical and syntactic levels, and the extent of the difference can vary considerably depending on the register of Cantonese. Since only a modest amount of parallel data is available, we adopt a knowledge-based approach and exploit lexical mappings and syntactic transformations from linguistics research. Our system parses a source sentence, uses register-annotated lexical mappings to translate words, and then performs word reordering through syntactic transformations. Evaluation shows that translation models that match the required register of the target sentences yield better translation quality.
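The pipeline outlined in this abstract (translate words with register-annotated mappings, then reorder) can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the three-entry lexicon, the register names `formal` and `colloquial`, and the single reordering rule are not taken from the paper, and pre-segmented words stand in for the parsing step.

```python
from typing import Dict, List

# Hypothetical register-annotated lexicon: Mandarin word -> {register: Cantonese word}.
LEXICON: Dict[str, Dict[str, str]] = {
    "他": {"formal": "他", "colloquial": "佢"},
    "吃": {"formal": "吃", "colloquial": "食"},
    "先": {"formal": "先", "colloquial": "先"},
}


def translate_words(tokens: List[str], register: str) -> List[str]:
    """Replace each word with its entry for the register; unknown words pass through."""
    return [LEXICON.get(tok, {}).get(register, tok) for tok in tokens]


def move_adverb_after_verb(tokens: List[str], adverb: str = "先") -> List[str]:
    """Toy reordering rule: swap the adverb with the word that follows it."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i] == adverb:
            out[i], out[i + 1] = out[i + 1], out[i]
            break
    return out


def translate(tokens: List[str], register: str) -> List[str]:
    """Lexical mapping first, then reordering (colloquial register only in this sketch)."""
    words = translate_words(tokens, register)
    return move_adverb_after_verb(words) if register == "colloquial" else words


print(translate(["他", "先", "吃"], "colloquial"))  # ['佢', '食', '先']
print(translate(["他", "先", "吃"], "formal"))      # ['他', '先', '吃']
```

Keeping the register as an explicit parameter mirrors the idea that the same source sentence can yield different target sentences; a real system would draw both the mappings and the transformations from linguistic resources rather than a hand-written table.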

Published in:

Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, vol. 1: MT Research Track, ed. Colin Cherry & Graham Neubig, pp.89−96, Boston, United States of America, March 2018.

External link:

https://amtaweb.org/wp-content/uploads/2018/03/AMTA_2018...

Quantitative Comparative Syntax on the Cantonese-Mandarin Parallel Dependency Treebank

(with Kim GERDES, Herman LEUNG, and John LEE)

Abstract: This paper describes a new Cantonese-Mandarin parallel dependency treebank. We discuss the extent to which the treebank allows for comparative measures with the goal of quantifying structural differences between the two languages. After presenting syntactic differences between the two languages, we computed various frequency measures on the treebank. We present the results and discuss whether they reflect differences in text genre, differences in annotation scheme design, or actual structural differences. Finally, we compare the structural differences to previous accounts of the observed construction.

Published in:

Proceedings of the Fourth International Conference on Dependency Linguistics, pp. 266−275, Pisa, Italy, September 2017.

External link:

http://www.ep.liu.se/ecp/article.asp?issue=139&article=030

Chinese Interrogative Particles as Talk Coordinators at the Right Periphery: A Discourse-Pragmatic Perspective

(with Winnie Oi-Wan CHOR and Foong Ha YAP)

Abstract: This paper examines how utterance-final interrogative particles in Chinese contribute to the management of local and global coherence in conversational discourse. Using Schiffrin's (1987) model of discourse coherence, and focusing in particular on the Cantonese particle ho2, we show how an interrogative particle is often also used as an interactional particle. In the case of ho2, we show how this information-seeking particle is frequently recruited as an affirmation-seeking and solidarity-enhancing device. Special attention is given to the extended uses of ho2 in terms of Schiffrin's exchange and action structures, as well as participation frameworks and information states. Our analysis highlights how speakers effectively use utterance particles as exemplified by ho2 to convey their (inter)subjective footing and in the process negotiate meaningful affiliative/disaffiliative interaction among interlocutors, and thereby achieve discourse coherence for effective communication.

Published in:

Journal of Historical Pragmatics, Vol. 17(2), pp. 178−207, December 2016.

External link:

https://benjamins.com/content/home#catalog/journals/jhp.17.2.02cho/details

Developing Universal Dependencies for Mandarin Chinese

(with Herman LEUNG, Rafaël POIRET, Xinying CHEN, Kim GERDES and John LEE)

Abstract: This article proposes a Universal Dependency Annotation Scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify cases of idiosyncrasy of Mandarin Chinese that are difficult to fit into the current schema which has mainly been based on the descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Dependencies and the Chinese Dependency Treebank.

Published in:

Proceedings of the 12th Workshop on Asian Language Resources, pp. 20−29, Osaka, Japan, December 2016.

External link:

http://www.aclweb.org/anthology/W/W16/W16-54.pdf#page=32

Conversational Network in the Chinese Buddhist Canon

(with John LEE)

Abstract: This article describes a method to analyze characters in a literary text by considering their verbal interactions. This method exploits techniques from computational linguistics to extract all direct speech from a treebank, and to build a conversational network that visualizes the speakers, the listeners and their degree of interaction. We apply this method to create and visualize a conversational network for the Chinese Buddhist Canon. We analyze the protagonists and their interlocutors, and report statistics on their number of utterances and types of listeners, how their speech was reported, and subcommunities in the network.

Published in:

Open Linguistics, Vol. 2(1), pp. 427−436, October 2016.

External link:

https://doi.org/10.1515/opli-2016-0022

Corpus-Based Learning of Cantonese for Mandarin Speakers

(with LEE John)

Abstract: This paper reports our experience in using a parallel corpus to teach Cantonese, a variety of Chinese spoken in Hong Kong, as a second language. The parallel corpus consists of pairs of word-aligned sentences in Cantonese and Mandarin Chinese, drawn from television programs in Hong Kong (Lee, 2011). We evaluated our pedagogical approach with Mandarin-speaking students in a university course. For each student, we first diagnosed the set of Cantonese words with which s/he experienced difficulties. Then, on a web-based interface, the student independently searched in the parallel corpus for sentence pairs involving this set of Cantonese words, and analysed the translations and usage examples. Our experiments showed that, in both the short and long term, the corpus-based pedagogical method helped students better retain their knowledge of difficult Cantonese words.

Published in:

Proceedings of the 2014 EUROCALL Conference - CALL Design: Principles and Practice, (2014), p.196-201, doi: 10.14705/rpnet.2014.000217

External link:

http://reference.research-publishing.net/display_article.php?...

On the Development of Sentence Final Particles (and Utterance tags) in Chinese

(with Foong Ha Yap and Ying Yang)

Published in:

Discourse Functions at the Left and Right Periphery: Crosslinguistic Investigations of Language Use and Language Change, (2014), p.179−220

External link:

http://www.brill.com/cn/products/book/discourse-functions-left-...

Automatic Detection of Comma Splices

In English text, independent clauses should be demarcated with full-stops (periods), or linked together with conjunctions. Non-native speakers are often prone to linking them improperly with commas instead of conjunctions, producing comma splices. This paper describes a method to detect comma splices using Conditional Random Fields (CRF), with features derived from parse tree patterns. In experiments, our model achieved an average of 0.91 precision and 0.28 recall in detecting comma splices, significantly outperforming both a baseline model using only local features and a widely used commercial grammar checker.
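To put the two reported figures side by side, the short sketch below combines precision and recall into the standard F1 score (their harmonic mean). The abstract does not report an F1 value; the number printed here is only derived from the 0.91 and 0.28 quoted above, and the function name is an assumption made for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: 0.91 precision, 0.28 recall.
print(round(f1_score(0.91, 0.28), 3))  # 0.428
```

The gap between the two inputs shows a detector that is rarely wrong when it flags a comma splice but that misses many of them.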

Published in:

Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, (2014), p.551-560

External link:

http://www.aclweb.org/anthology/Y14-1063

CityU Corpus of Essay Drafts of English Language Learners: a Corpus of Textual Revision in Second Language Writing

Learner corpora consist of texts produced by non-native speakers. In addition to these texts, some learner corpora also contain error annotations, which can reveal common errors made by language learners, and provide training material for automatic error correction. We present a novel type of error-annotated learner corpus containing sequences of revised essay drafts written by non-native speakers of English. Sentences in these drafts are annotated with comments by language tutors, and are aligned to sentences in subsequent drafts. We describe the compilation process of our corpus, present its encoding in TEI XML, and report agreement levels on the error annotations. Further, we demonstrate the potential of the corpus to facilitate research on textual revision in L2 writing, by conducting a case study on verb tenses using ANNIS, a corpus search and visualization platform.

Published in:

Language Resources and Evaluation, (2015), Vol. 49, No. 3, p.659-683, doi: 10.1007/s10579-015-9301-z

External link:

http://link.springer.com/article/10.1007%2Fs10579-015-9301-z

The Typological Stage of Morphology in Chinese Counterfactuals

Abstract: Counterfactuality is used to express a counter-to-factual opinion; here, the factual, which is the truth held by the speaker, does not always coincide with the objective world. Based on cross-linguistic research, we adduce many examples in favor of the view that Chinese is a language based not on CF (counterfactuality) markers but on HE (hypotheticality-enhancing) markers. HE markers differ from CF markers in that they cannot mark CF exclusively, independently and obligatorily.

Published in:

Journal of Foreign Languages (外國語, Journal of Shanghai International Studies University), (2015), Vol. 38, No. 1, pp. 30-41

External link:

http://big5.oversea.cnki.net/kcms...

A Typological Research Towards Counterfactual Conditionals

Abstract: This paper argues that, although it lacks unique markers for counterfactuals, Chinese does have counterfactual thoughts as well as expressions. Based on the definition of counterfactual conditionals, this paper lists three different types of counterfactual conditionals in Chinese for analysis of the relationship between the past tense and counterfactuality. In addition, samples from other languages are collected to discuss the relationship between temporal markers and counterfactuality. Based on the above analysis, we conclude that Chinese belongs to the past-tense-counterfactuality languages, although the tense is not fake. Past tense cannot be grammaticalized into counterfactual markers because of the isolating characteristics of Chinese.

Published in:

Journal of Foreign Languages (外國語, Journal of Shanghai International Studies University), (2014), Vol. 37, No. 3, pp. 59-70

External link:

http://big5.oversea.cnki.net/kcms...

 

Disclaimer

To make use of any material hosted on this page, you must give appropriate credit, provide a link to the original source, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the copyright owners endorse you or your use.

You may not use the material for commercial purposes.

 

Last updated: 7 August 2018