Why Supervised Methods in NLP Struggle in Real-World Applications

Table of Links

Abstract and Introduction

Domain and Task

2.1. Data sources and complexity

2.2. Task definition

Related Work

3.1. Text mining and NLP research overview

3.2. Text mining and NLP in industry use

3.3. Text mining and NLP for procurement

3.4. Conclusion from literature review

Proposed Methodology

4.1. Domain knowledge

4.2. Content extraction

4.3. Lot zoning

4.4. Lot item detection

4.5. Lot parsing

4.6. XML parsing, data joining, and risk indices development

Experiment and Demonstration

5.1. Component evaluation

5.2. System demonstration

Discussion

6.1. The ‘industry’ focus of the project

6.2. Data heterogeneity, multilingual and multi-task nature

6.3. The dilemma of algorithmic choices

6.4. The cost of training data

Conclusion, Acknowledgements, and References

3.4. Conclusion from literature review

Our literature review shows that, despite significant research in text mining and NLP, the field is dominated by supervised methods built on well-curated data, and these methods do not transfer well to practical scenarios. This is partially reflected in the number of industrial text mining/NLP studies that incorporated rule-based methods and domain lexicons, except in a few areas (e.g., the legal domain) where high-quality curated resources are abundant. Most industrial studies also address single, and sometimes simplified, tasks and do not report a full end-to-end process; in particular, they give little detail on how their methods deal with data heterogeneity and inconsistency. Further, no prior work has focused on the healthcare domain. Our work will address these gaps.

Authors:

(1) Ziqi Zhang*, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP (Ziqi.Zhang@sheffield.ac.uk);

(2) Tomas Jasaitis, Vamstar Ltd., London (Tomas.Jasaitis@vamstar.io);

(3) Richard Freeman, Vamstar Ltd., London (Richard.Freeman@vamstar.io);

(4) Rowida Alfrjani, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP (Rowida.Alfrjani@sheffield.ac.uk);

(5) Adam Funk, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP (Adam.Funk@sheffield.ac.uk).

This paper is available on arXiv under a CC BY 4.0 license.
