A. Appendix
We conduct another quantitative analysis using the section heading as an additional signal in our reranking module.
A.1 Additional Experiment
We concatenate the section heading with the query context in the reranker and run our two SymTax variants. Table 6 shows that adding the section heading leads to a significant performance drop on all metrics for SciBERT_vector, whereas the overall performance of SPECTER_graph remains nearly the same. Both patterns indicate that the section heading acts as noise rather than as a useful feature, which suggests that the citation contexts are already rich enough on their own. Since our proposed dataset contains this additional feature, it is also suitable for two further tasks: context-specific citation generation (Wang et al., 2022) and section heading prediction for a given citation context.
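The concatenation step in this experiment can be illustrated with a minimal sketch. The function name `build_reranker_query`, the `[SEP]` separator string, and the choice to place the heading before the context are all assumptions made for illustration; the paper does not specify how the two strings are joined.

```python
from typing import Optional


def build_reranker_query(
    context: str,
    section_heading: Optional[str] = None,
    sep: str = " [SEP] ",  # assumed separator, not taken from the paper
) -> str:
    """Return the reranker query text, optionally prefixed by a section heading."""
    # Collapse runs of whitespace so the two parts join cleanly.
    context = " ".join(context.split())
    if not section_heading:
        # Baseline setting: the query is the citation context alone.
        return context
    heading = " ".join(section_heading.split())
    # Experimental setting: heading and context form a single input string.
    return f"{heading}{sep}{context}"


if __name__ == "__main__":
    print(build_reranker_query("we follow [CIT] for training", "Related Work"))
```

Calling the function without a heading reproduces the baseline input, so the two settings compared in this experiment differ only in this one argument.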
A.2 Implementation Details
A.3 Datasets
ACL-200. This dataset contains papers published at ACL venues. It is a processed version of the ACL-ARC dataset created using ParsCit[12], a string parsing package based on conditional random fields. Each citation context consists of the ±200 characters around the citation placeholder.
FullTextPeerRead. This dataset is an expansion of the PeerRead dataset, which contains the peer reviews of papers submitted to top venues in the Artificial Intelligence domain. FullTextPeerRead contains the citation contexts from the papers present in PeerRead.
RefSeer. This dataset is curated by extracting scientific articles from various engineering domains. A citation excerpt is the text within ±200 characters of the citation marker. It is a large dataset containing 3.7 million citation contexts.
arXiv (HAtten). This dataset is created from arXiv papers drawn from the large and diverse corpus of scientific articles in S2ORC[13]. For every paper whose full text is available, a citation excerpt is included only if the cited paper is also present in the arXiv database. Following the convention set by ACL-200 and RefSeer, each excerpt consists of the words in the ±200-character window around the citation marker.
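The ±200-character window shared by ACL-200, RefSeer, and arXiv (HAtten) can be sketched as follows. This is an illustrative approximation, not the preprocessing code used to build any of these datasets: the `[CIT]` marker string and the function name `extract_contexts` are hypothetical, and the sketch assumes the window is measured from the edges of the marker, with no snapping to word boundaries.

```python
from typing import List


def extract_contexts(text: str, marker: str = "[CIT]", window: int = 200) -> List[str]:
    """Return one excerpt per marker occurrence: up to `window` characters
    on each side of the marker, with the marker itself kept in place."""
    contexts = []
    start = text.find(marker)
    while start != -1:
        end = start + len(marker)
        # Clamp the left edge at 0; slicing already clamps the right edge.
        left = max(0, start - window)
        contexts.append(text[left:end + window])
        # Continue searching after the current marker.
        start = text.find(marker, end)
    return contexts


if __name__ == "__main__":
    sample = "Prior work [CIT] studied citation recommendation, and [CIT] extended it."
    for excerpt in extract_contexts(sample, window=20):
        print(repr(excerpt))
```

Excerpts near the start or end of a document are simply shorter than the full window, which is one reasonable way to handle boundaries under these assumptions.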
Authors:
(1) Karan Goyal, IIIT Delhi, India;
(2) Mayank Goel, NSUT Delhi, India;
(3) Vikram Goyal, IIIT Delhi, India;
(4) Mukesh Mohania, IIIT Delhi, India.
[12] https://github.com/knmnyn/ParsCit
[13] https://github.com/allenai/s2orc