
Appraise Evaluation Framework


Appraise is an open-source framework for crowd-based annotation tasks, notably for the evaluation of machine translation (MT) output. The software is used to run the yearly human evaluation campaigns for shared tasks at the Conference on Machine Translation (WMT) and other events.

Annotation tasks currently supported in Appraise:

  • Segment-level direct assessment (DA)
  • Document-level direct assessment
  • Pairwise direct assessment (similar to EASL and RankME)
  • Multimodal MT assessment

Getting Started

See INSTALL.md for step-by-step instructions on how to install prerequisites and set up Appraise.

Usage

See Examples/ for simple end-to-end examples of setting up the currently supported annotation tasks, and read how to create your own campaign in Appraise.

License

This project is licensed under the BSD-3-Clause License.

Citation

If you use Appraise in your research, please cite the following paper:

@inproceedings{federmann-2018-appraise,
    title = "Appraise Evaluation Framework for Machine Translation",
    author = "Federmann, Christian",
    booktitle = "Proceedings of the 27th International Conference on
        Computational Linguistics: System Demonstrations",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/C18-2019",
    pages = "86--88"
}

WMT21 Collaboration with Toloka

For WMT21, we partnered with Toloka to collect additional annotations for the human evaluation of the news translation shared task. We are grateful for their support and look forward to our continued collaboration in the future!

In Toloka's own words:

The international data labeling platform Toloka collaborated with the WMT team to improve existing machine translation methods. Toloka's crowdsourcing service was integrated with Appraise, an open-source framework for human-based annotation tasks.

To increase the accuracy of machine translation, we need to systematically compare different MT methods to reference data. However, obtaining sufficient reference data can pose a challenge, especially for rare languages. Toloka solved this problem by providing a global crowdsourcing platform with enough annotators to cover all relevant language pairs. At the same time, the integration preserved the labeling processes that were already set up in Appraise without breaking any tasks.

Collaboration between Toloka and Appraise made it possible to get a relevant pool of annotators, provide them with an interface for labeling and getting rewards, and then combine quality control rules from both systems into a mutually reinforcing set for reliable results.

You can learn more about Toloka on their website: https://toloka.ai/