Workshop on Language Technologies for Historical and Ancient Languages (#LT4HALA)
Co-located with LREC 2020
Tuesday, May 12, 2020 - Marseille, France
New submission deadline: February 21, 2020
The ever-increasing number of writings in historical and ancient languages available in digital form is driving a growing interest in tools for their automatic linguistic processing. However, these languages present a number of characteristics that set them apart from modern languages and have a significant impact on language technologies.
Typically, historical and ancient languages lack large linguistic resources, such as annotated corpora, and the data can be sparse and highly inconsistent. Texts present considerable orthographic variation; they can be transmitted by different witnesses and in different critical editions; and they can be incomplete and scattered across a wide temporal and geographical span. This makes the selection of representative texts, and thus the development of benchmarks, very hard. Moreover, texts in machine-readable format are often the result of manuscript digitization, during which OCR systems can introduce errors that degrade the quality of the documents. Another peculiarity is that most texts written in historical and ancient languages are literary, philosophical or documentary, and therefore of a very different genre from the one on which language technologies are usually trained, i.e. news. This is closely connected to the fact that the end users of language technologies for historical and ancient languages are mostly humanists, such as philologists, who expect results accurate enough to allow a precise analysis of linguistic data.
This one-day workshop seeks to bring together scholars who are developing or using Language Technologies (LTs) for historically attested languages, so as to foster cross-fertilization between the Computational Linguistics community and scholars in the Humanities who work with historical linguistic data, e.g. historians, philologists, linguists, archaeologists and literary scholars. Despite the current availability of large collections of digitized texts written in historical languages, such interdisciplinary collaboration is still hampered by the limited availability of annotated linguistic resources for most historical languages. Creating such resources is both a challenge and an obligation for LTs: it supports historical linguistic research with the most up-to-date technologies, and it preserves the precious linguistic data that have survived from the past.
The workshop will also host EvaLatin, the first evaluation campaign entirely devoted to NLP tools for Latin.
This workshop is organized in the context of the LiLa: Linking Latin ERC project, Grant Agreement No. 769994.