paint-brush
Researchers Create Plug-and-Play System to Test Language AI Across the Globeby@morphology

Researchers Create Plug-and-Play System to Test Language AI Across the Globe

by MorphologyDecember 30th, 2024
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Researchers in Poland have developed an open-source tool that improves the evaluation and comparison of AI used in natural language preprocessing.
featured image - Researchers Create Plug-and-Play System to Test Language AI Across the Globe
Morphology HackerNoon profile picture

Authors:

(1) Martyna Wiącek, Institute of Computer Science, Polish Academy of Sciences;

(2) Piotr Rybak, Institute of Computer Science, Polish Academy of Sciences;

(3) Łukasz Pszenny, Institute of Computer Science, Polish Academy of Sciences;

(4) Alina Wróblewska, Institute of Computer Science, Polish Academy of Sciences.

Editor's note: This is Part 4 of 10 of a study on improving the evaluation and comparison of tools used in natural language preprocessing. Read the rest below.

Abstract and 1. Introduction and related works

  1. NLPre benchmarking

2.1. Research concept

2.2. Online benchmarking system

2.3. Configuration

  1. NLPre-PL benchmark

3.1. Datasets

3.2. Tasks

  1. Evaluation

4.1. Evaluation methodology

4.2. Evaluated systems

4.3. Results

  1. Conclusions
    • Appendices
    • Acknowledgements
    • Bibliographical References
    • Language Resource References

2.3. Configuration

We acknowledge the need to configure similar evaluation environments for other languages to promote linguistic diversity within the worldwide NLP community and to support local NLP communities working on a particular language. To ensure that, we publish a .yaml file that enables easy management of datasets, tagset, and metrics included in the benchmark. The content of all subpages can be modified using a WYSIWYG editor within the application. This setting ensures quite a low entry level for setting up the platform, with minimal changes required.


As a standard feature, we include pre-defined descriptions for the prevalent NLPre tasks. Those can be modified via either configuration files or the administrator panel. Additionally, we supply a default evaluation script, but users are free to provide their own customised code.


To show the capabilities of the benchmarking system, we set up a prototype for Polish (Figure 1). NLPre-PL is described in detail in Section 3. To support our claim that the system is language agnostic, we set up NLPre-GA for Irish and NLPreZH for Chinese. The choice of those languages is not arbitrary; our objective is to demonstrate the capability of the platform in evaluating diverse languages, including those based on non-Latin scripts. In setting up said benchmarking systems we use existing UDv2.9 treebanks: UD_Chinese-GSD (Shen et al., 2019) and UD_Irish-IDT (Lynn et al., 2015) and available up-to-date models, trained on these treebanks. The selection of models mirrors the criteria applied in this work regarding the evaluation of Polish, that is: COMBO, Stanza, SpaCy, UDPipe, and Trankit. If the specific model is not available for UDv2.9, we train it from scratch on the datasets linked above.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.