LT4HALA 2026

EvaHan 2026

Previous Tasks

The First Bake-off of ancient Chinese automatic processing was successfully held in Marseille, France, in 2022, with a focus on automatic word segmentation and part-of-speech tagging of ancient Chinese.

The Second Bake-off of ancient Chinese automatic processing was successfully held in Macau, China, in 2023, with a focus on machine translation of ancient Chinese.

The Third Bake-off of ancient Chinese automatic processing was held in Turin, Italy, in 2024, with a focus on automatic sentence segmentation and punctuation of ancient Chinese.

The Fourth Bake-off of ancient Chinese automatic processing was held in New Mexico, USA, in 2025, with a focus on named entity recognition in ancient Chinese.

Important Dates for EvaHan 2026

Participation

To participate in EvaHan 2026, you must complete the following steps:

  1. Registration:
    Fill in the registration form to officially register your team for the task. Registration is open from December 1, 2025, to January 30, 2026. Only registered participants will gain access to the training dataset.

  2. Accessing the Training Data:
    After completing the registration process, participants will receive instructions for downloading the training dataset, which includes image–text pairs from ancient Chinese texts for OCR.

  3. Submitting Results and Reports:
    Participants must run their systems on the provided test data and submit both their system outputs and a technical report according to the shared task schedule.

For inquiries or to request the registration form, please contact us at evahan2026@gmail.com.

Data

The EvaHan 2026 data comprises three datasets of image-text pairs: plain-text page images, mixed image-text pages, and handwritten text images. The data was first annotated automatically, then corrected and refined by experts in classical Chinese language and history to ensure the quality of the training materials and gold-standard texts.

● Dataset A (Printed Texts) consists of data selected from the Siku Quanshu (Complete Library of the Four Treasuries), including classics, history, philosophy, and literature, as well as various other ancient books.

● Dataset B (Mixed Layouts) contains mixed image-text data selected from the Siku Quanshu and other ancient books.

● Dataset C (Handwritten Texts) consists of handwritten ancient books, primarily from the Chinese Buddhist canon, drawn from the Chinese Buddhist canon (TKH) dataset and the Chinese Buddhist canon (MTH) dataset.

Organizers

Student Members

For more information, please refer to our home page: EvaHan 2026