
Researchers Create Plug-and-Play System to Test Language AI Across the Globe

by Morphology, December 30th, 2024

Too Long; Didn't Read

Researchers in Poland have developed an open-source tool that improves the evaluation and comparison of AI used in natural language preprocessing.

Authors:

(1) Martyna Wiącek, Institute of Computer Science, Polish Academy of Sciences;

(2) Piotr Rybak, Institute of Computer Science, Polish Academy of Sciences;

(3) Łukasz Pszenny, Institute of Computer Science, Polish Academy of Sciences;

(4) Alina Wróblewska, Institute of Computer Science, Polish Academy of Sciences.

Editor's note: This is Part 4 of 10 of a study on improving the evaluation and comparison of tools used in natural language preprocessing. Read the rest below.

Abstract and 1. Introduction and related works

2. NLPre benchmarking

2.1. Research concept

2.2. Online benchmarking system

2.3. Configuration

3. NLPre-PL benchmark

3.1. Datasets

3.2. Tasks

4. Evaluation

4.1. Evaluation methodology

4.2. Evaluated systems

4.3. Results

5. Conclusions
    • Appendices
    • Acknowledgements
    • Bibliographical References
    • Language Resource References

2.3. Configuration

We acknowledge the need to configure similar evaluation environments for other languages, both to promote linguistic diversity within the worldwide NLP community and to support local NLP communities working on a particular language. To this end, we publish a .yaml file that enables easy management of the datasets, tagset, and metrics included in the benchmark. The content of all subpages can be modified using a WYSIWYG editor within the application. This setup keeps the barrier to entry low: the platform can be stood up with minimal changes.
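To make the configuration step concrete, the sketch below shows what such a .yaml file might contain and how it could be loaded in Python. Every field name and value is an illustrative assumption, not the platform's actual schema.

```python
# A minimal sketch of a benchmark configuration and how it might be parsed.
# All field names and values are illustrative assumptions.
import yaml  # requires PyYAML (pip install pyyaml)

example_config = """
benchmark:
  name: NLPre-PL
  language: pl
datasets:
  - id: example-treebank
    train: data/train.conllu
    dev: data/dev.conllu
    test: data/test.conllu
tagset: UD
metrics:
  - UPOS
  - UAS
  - LAS
"""

config = yaml.safe_load(example_config)
print(config["benchmark"]["name"], config["metrics"])
```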


As a standard feature, we include pre-defined descriptions for the prevalent NLPre tasks. These can be modified via either the configuration files or the administrator panel. Additionally, we supply a default evaluation script, but users are free to provide their own customised code.
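As a rough illustration of what such customised code might compute, the following sketch scores token-level UPOS accuracy over aligned gold and predicted CoNLL-U files. The function name and file handling are hypothetical and do not reflect the platform's actual evaluation interface.

```python
# Hypothetical custom metric: UPOS accuracy over aligned CoNLL-U files.
def upos_accuracy(gold_path: str, pred_path: str) -> float:
    def read_upos(path):
        tags = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and sentence-level comments
                cols = line.split("\t")
                if "-" in cols[0] or "." in cols[0]:
                    continue  # skip multiword-token and empty-node rows
                tags.append(cols[3])  # UPOS is the fourth CoNLL-U column
        return tags

    gold, pred = read_upos(gold_path), read_upos(pred_path)
    assert len(gold) == len(pred), "gold and prediction must be token-aligned"
    return sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0
```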


To show the capabilities of the benchmarking system, we set up a prototype for Polish (Figure 1). NLPre-PL is described in detail in Section 3. To support our claim that the system is language-agnostic, we also set up NLPre-GA for Irish and NLPre-ZH for Chinese. The choice of these languages is not arbitrary: our objective is to demonstrate that the platform can evaluate diverse languages, including those written in non-Latin scripts. In setting up these benchmarking systems, we use the existing UDv2.9 treebanks UD_Chinese-GSD (Shen et al., 2019) and UD_Irish-IDT (Lynn et al., 2015), together with available up-to-date models trained on these treebanks. The selection of models mirrors the criteria applied in this work to the evaluation of Polish, i.e. COMBO, Stanza, spaCy, UDPipe, and Trankit. If a specific model is not available for UDv2.9, we train it from scratch on the datasets linked above.
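For readers reproducing such a setup, the snippet below sketches how one of the mentioned UD treebanks could be inspected locally, assuming the third-party `conllu` package and a locally downloaded copy of UD_Irish-IDT; the file path is illustrative.

```python
# Count sentences and tokens in a locally downloaded UD treebank split.
# Assumes `pip install conllu`; the path below is illustrative.
from conllu import parse_incr

n_sents, n_tokens = 0, 0
with open("UD_Irish-IDT/ga_idt-ud-train.conllu", encoding="utf-8") as f:
    for sentence in parse_incr(f):
        n_sents += 1
        n_tokens += len(sentence)

print(f"{n_sents} sentences, {n_tokens} tokens")
```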


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.