


Workshop on Language Technologies for Historical and Ancient Languages

The ever-increasing number of writings in historical and ancient languages available in digital form is driving growing interest in tools for their automatic linguistic processing. However, these languages have a number of characteristics that set them apart from modern languages and that significantly affect language technologies.

Typically, historical and ancient languages lack large linguistic resources, such as annotated corpora, and data can be sparse and very inconsistent: texts present considerable orthographic variation, they can be transmitted by different witnesses and in different critical editions, and they can be incomplete and scattered across a wide temporal and geographical span. This makes the selection of representative texts, and thus the development of benchmarks, very hard. Moreover, texts in machine-readable format are often the result of manuscript digitization, during which OCR systems can introduce errors that degrade the quality of the documents. Another peculiarity is that most texts written in historical and ancient languages are literary, philosophical or documentary, and therefore of a very different genre from the one on which language technologies (LTs) are usually trained, i.e. news. This is closely connected to the fact that the end users of LTs for historical and ancient languages are mostly humanists, such as philologists, who expect results accurate enough to support a precise analysis of linguistic data.

The workshop is also the venue of EvaLatin, the first evaluation campaign entirely devoted to NLP tools for Latin, and EvaHan, the first evaluation campaign dedicated to NLP tools for Ancient Chinese.

The first two editions of the workshop were organized in the context of the LiLa: Linking Latin ERC project, Grant Agreement No. 769994.

References