Code Book for Annotation of Diverse Cross-Document Coreference: Annotation Guidelines
by @mediabias

Too Long; Didn't Read

This paper presents a scheme for annotating coreference across news articles, extending beyond traditional identity relations.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Jakob Vogel, M.A. Digital Humanities, Institute for Digital Humanities, Faculty of Philosophy, Georg August University of Göttingen.

5. Conclusion and future work

Our proposed annotation scheme covers a multitude of coreferential relations. It explains in detail how to mark coreferential mentions across documents, assign entity types and names to them, connect them with each other, and link them to the Wikidata knowledge graph. The scheme thus represents a significant step toward capturing the complexities of coreference use more accurately. It also provides a valuable resource for researchers working on coreference resolution and on media bias by word-choice and labelling. That said, the scheme leaves room for extensions that could further advance research in those domains. First, the annotation of events could be included. An interesting open question is whether the relation types outlined here apply not only to entities but to events as well. Second, a layer of media bias annotation could be added, enabling a direct comparison of diverse coreference usage and media bias by word-choice and labelling. Both extensions could easily be added on top of our scheme. Even in its present form, the scheme already addresses many of the complexities of diverse cross-document coreference and offers a roadmap for capturing nuanced linguistic relationships, ultimately advancing our understanding of language and discourse in digital print media.