Authors:

(1) Martyna Wiącek, Institute of Computer Science, Polish Academy of Sciences;
(2) Piotr Rybak, Institute of Computer Science, Polish Academy of Sciences;
(3) Łukasz Pszenny, Institute of Computer Science, Polish Academy of Sciences;
(4) Alina Wróblewska, Institute of Computer Science, Polish Academy of Sciences.

Editor's note: This is Part 2 of 10 of a study on improving the evaluation and comparison of tools used in natural language preprocessing. Read the rest below.

Table of Links

Abstract and 1. Introduction and related works
2. NLPre benchmarking
2.1. Research concept
2.2. Online benchmarking system
2.3. Configuration
3. NLPre-PL benchmark
3.1. Datasets
3.2. Tasks
4. Evaluation
4.1. Evaluation methodology
4.2. Evaluated systems
4.3. Results
5. Conclusions
Appendices
Acknowledgements
Bibliographical References
Language Resource References

2. NLPre benchmarking

2.1. Research concept

In this study, we introduce a novel adaptation of the benchmarking approach to NLPre. The primary objective is to establish an automated and credible method for evaluating NLPre systems against a provided benchmark and for continuously updating their performance ranking on a publicly accessible scoreboard. More specifically, the predictions that NLPre systems output for the benchmark test sets and submit to the benchmarking system are automatically compared against a publicly undisclosed reference dataset. This method effectively prevents result manipulation and ensures the fairness of the final assessment. The second important methodological assumption is to enable the ongoing evaluation of new or upgraded NLPre systems, guaranteeing an up-to-date and complete ranking. Consequently, the leaderboard can serve as a reliable point of reference for NLPre system developers.

Based on these assumptions, we design and implement a language-centric and tagset-agnostic benchmarking system that enables comprehensive and credible evaluation, constitutes an up-to-date source of information on NLPre progress, and is fully configurable to facilitate building benchmarking systems for multiple languages.
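The paper describes this workflow only at the level of design assumptions, so the following is a minimal Python sketch of how the blind-evaluation step could look, not the authors' implementation: a submitted prediction file is scored against a reference file that never leaves the server, and the public leaderboard is updated with the result. The CoNLL-U file format, the tagging-accuracy metric, the CONFIG dictionary, and all file and system names are illustrative assumptions rather than details taken from the paper.

```python
"""Illustrative sketch of a blind-evaluation loop (not the authors' code).

Assumed setup: submissions and the hidden reference file are CoNLL-U,
and each language instance is driven by a small configuration that
names the tag column to score, keeping the scorer tagset-agnostic.
"""
import json
from pathlib import Path

# Hypothetical per-language configuration; the real system's options
# are described in Section 2.3 of the paper.
CONFIG = {"language": "pl", "tag_column": 3}  # CoNLL-U column 3 = UPOS

def read_tags(path, column):
    """Map (sentence index, token id) -> tag for one CoNLL-U file."""
    tags, sent = {}, 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            sent += 1          # a blank line ends a sentence
            continue
        if line.startswith("#"):
            continue           # skip sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue           # skip multiword-token and empty-node rows
        tags[(sent, cols[0])] = cols[column]
    return tags

def score(pred_path, gold_path, column):
    """Tagging accuracy of a submission against the undisclosed gold file."""
    gold = read_tags(gold_path, column)
    pred = read_tags(pred_path, column)
    return sum(pred.get(key) == tag for key, tag in gold.items()) / len(gold)

def update_leaderboard(board_path, system, value):
    """Insert or overwrite one entry so the public ranking stays current."""
    board_file = Path(board_path)
    board = json.loads(board_file.read_text()) if board_file.exists() else {}
    board[system] = round(value, 4)
    board_file.write_text(json.dumps(board, indent=2, sort_keys=True))

if __name__ == "__main__":
    # Runs server-side: "hidden/gold.conllu" is never published.
    acc = score("submission.conllu", "hidden/gold.conllu", CONFIG["tag_column"])
    update_leaderboard("leaderboard.json", "my-tagger", acc)
```

Keeping the reference annotations server-side is what makes the comparison resistant to result manipulation, and parameterising which CoNLL-U column is scored is one simple way a scorer can remain tagset-agnostic.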
[5] https://nlpre-pl.clarin-pl.eu
[6] https://nlpre-zh.clarin-pl.eu
[7] https://nlpre-ga.clarin-pl.eu

This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.