LT4HALA 2020

CALL FOR PAPERS

Website: https://circse.github.io/LT4HALA/
Submission page: https://www.softconf.com/lrec2020/LT4HALA
Date: May 12, 2020
Place: co-located with LREC 2020, May 11-16, Marseille, France
NEW SUBMISSION DEADLINE: February 21, 2020

DESCRIPTION

LT4HALA is a one-day workshop that seeks to bring together scholars who are developing and/or using Language Technologies (LTs) for historically attested languages, so as to foster cross-fertilization between the Computational Linguistics community and the areas of the Humanities dealing with historical linguistic data, e.g. historians, philologists, linguists, archaeologists and literary scholars. Despite the current availability of large collections of digitized texts written in historical languages, such interdisciplinary collaboration is still hampered by the limited availability of annotated linguistic resources for most historical languages. Creating such resources is a challenge and an obligation for LTs, both to support historical linguistic research with the most up-to-date technologies and to preserve the precious linguistic data that have survived from past times.

Relevant topics for the workshop include, but are not limited to:

handling spelling variation;
detection and correction of OCR errors;
creation and annotation of digital resources;
deciphering;
morphological/syntactic/semantic analysis of textual data;
adaptation of tools to address diachronic/diatopic/diastratic variation in texts;
teaching ancient languages with NLP tools;
NLP-driven theoretical studies in historical linguistics;
evaluation of NLP tools.

SHARED TASKS

Precisely because of the limited amount of data preserved for historical and ancient languages, evaluation practices play an important role in understanding the level of accuracy of the NLP tools used to build and analyze resources. Given the prominence of Latin, by virtue of its wide diachronic and diatopic span covering two millennia all over Europe, the workshop will host the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The first edition of EvaLatin will focus on two tasks (i.e. Lemmatization and PoS tagging), each featuring three sub-tasks (i.e. Classical, Cross-Genre, Cross-Time). These sub-tasks are designed to measure the impact of genre and diachrony on the performance of NLP tools, a relevant aspect to keep in mind when dealing with the diachronic and diatopic diversity of Latin. Participants will be provided with shared data in the CoNLL-U format and all the necessary evaluation scripts.
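For readers unfamiliar with the CoNLL-U format, the following is a minimal, unofficial sketch of how system output could be compared against gold data for the two tasks. It is not the EvaLatin evaluation script: the function names (`read_conllu`, `accuracy`), the toy sentence and the choice of plain token-level accuracy as the metric are all assumptions made for illustration. The only thing taken from the format itself is the column order of a CoNLL-U token line (ID, FORM, LEMMA, UPOS, followed by six further columns).

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str, str]  # (form, lemma, upos)
LEMMA, UPOS = 1, 2            # positions inside a Token tuple


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield each sentence as a list of (form, lemma, upos) tuples."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():            # a blank line closes a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):        # sentence-level comment line
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:                        # file without a trailing blank line
        yield sentence


def accuracy(gold_text: str, pred_text: str, column: int) -> float:
    """Token-level accuracy on one column (an assumed metric, see above)."""
    gold = [tok for sent in read_conllu(gold_text) for tok in sent]
    pred = [tok for sent in read_conllu(pred_text) for tok in sent]
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted files must align token by token")
    correct = sum(g[column] == p[column] for g, p in zip(gold, pred))
    return correct / len(gold)


def make_line(idx: int, form: str, lemma: str, upos: str) -> str:
    """Build a 10-column CoNLL-U token line, leaving unused columns as '_'."""
    return "\t".join([str(idx), form, lemma, upos] + ["_"] * 6)


# Toy data, invented for illustration only.
GOLD = "# text = Gallia est omnis divisa\n" + "\n".join([
    make_line(1, "Gallia", "Gallia", "PROPN"),
    make_line(2, "est", "sum", "AUX"),
    make_line(3, "omnis", "omnis", "DET"),
    make_line(4, "divisa", "divido", "VERB"),
]) + "\n\n"

PRED = "# text = Gallia est omnis divisa\n" + "\n".join([
    make_line(1, "Gallia", "Gallia", "PROPN"),
    make_line(2, "est", "edo", "AUX"),         # wrong lemma
    make_line(3, "omnis", "omnis", "DET"),
    make_line(4, "divisa", "divido", "ADJ"),   # wrong PoS tag
]) + "\n\n"

print(f"Lemmatization accuracy: {accuracy(GOLD, PRED, LEMMA):.2%}")  # 75.00%
print(f"PoS tagging accuracy:   {accuracy(GOLD, PRED, UPOS):.2%}")   # 75.00%
```

Running the same comparison separately on test sets of different genres or periods would give one score per sub-task; the official scripts distributed to participants may use different metrics or tokenization checks.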

SUBMISSIONS

For the workshop, we invite papers of different types, such as experimental papers, reproduction papers, resource papers, position papers and survey papers. Both long and short papers describing original and unpublished work are welcome.

Long papers should deal with substantial completed research and/or report on the development of new methodologies. They may consist of up to 8 pages of content plus 2 pages of references.

Short papers are instead appropriate for reporting on work in progress or for describing a single tool or project. They may consist of up to 4 pages of content plus 2 pages of references.

We encourage the authors of papers reporting experimental results to make their results reproducible and the entire process of analysis replicable, by making the data and the tools they used available. Papers may be presented orally or as posters; no distinction is made between the two in the proceedings. The submission is NOT anonymous. The LREC official format is required. Each paper will be reviewed by three independent reviewers.

As for EvaLatin, participants will be required to submit a technical report for each task (with all the related sub-tasks) they took part in. Technical reports will be included in the proceedings as short papers: the maximum length is 4 pages (excluding references) and they should follow the LREC official format. Reports will receive a light review (we will check for the correctness of the format, the exactness of results and ranking, and overall exposition). All participants will have the possibility to present their results at the workshop: we will allocate an oral session and a poster session fully devoted to the shared tasks in the afternoon.

IMPORTANT DATES

Workshop

~~17 February 2020: submissions due~~ NEW DEADLINE: 21 February 2020
10 March 2020: notifications to authors
27 March 2020: camera-ready (PDF) due
12 May 2020: workshop

EvaLatin

PLEASE NOTE THAT NO EXTENSION IS PLANNED FOR THE SHARED TASKS

10 December 2019: training data available
Evaluation Window I - Task: Lemmatization
- 17 February 2020: test data available
- 21 February 2020: system results due to organizers
Evaluation Window II - Task: PoS tagging
- 24 February 2020: test data available
- 28 February 2020: system results due to organizers
6 March 2020: assessment returned to participants
27 March 2020: reports due to organizers
10 April 2020: camera-ready version of reports due to organizers
12 May 2020: workshop

Describing your language resources (LRs) in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 on “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.

ISLRN number

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.
