Reactome Knowledgebase: Gene and Pathway Membership in Disease-Related Pathways

by Large Models (dot tech), December 12th, 2024

Too Long; Didn't Read

This section explains Reactome's structure, focusing on the "Disease" superpathway, which includes pathways like EGFR signaling in cancer. The relationships between pathways are organized in a hierarchical burst structure, and the dictionary of entity-names and predicates is available on GitHub.
  1. Abstract and Introduction
  2. SylloBio-NLI
  3. Empirical Evaluation
  4. Related Work
  5. Conclusions
  6. Limitations and References


A. Formalization of the SylloBio-NLI Resource Generation Process

B. Formalization of Tasks 1 and 2

C. Dictionary of gene and pathway membership

D. Domain-specific pipeline for creating NL instances and E. Accessing LLMs

F. Experimental Details

G. Evaluation Metrics

H. Prompting LLMs - Zero-shot prompts

I. Prompting LLMs - Few-shot prompts

J. Results: Misaligned Instruction-Response

K. Results: Ambiguous Impact of Distractors on Reasoning

L. Results: Models Prioritize Contextual Knowledge Over Background Knowledge

M. Supplementary Figures and N. Supplementary Tables

C. Dictionary of gene and pathway membership

Reactome[4] (version 88, March 2024) has entries for 11 226 protein-coding genes involved in 15 212 human reactions annotated from 38 549 literature references. These reactions are grouped into 2 698 pathways collected under 29 superpathways (e.g. Immune System) that describe normal cellular functions. Each superpathway is represented as a roughly circular ‘burst’: the central node corresponds to the top level of the Reactome event hierarchy, and concentric rings represent increasingly specific levels of that hierarchy (sub-pathways), e.g. Disease → Diseases of signal transduction by growth factor receptors and second messengers → Signalling by EGFR in Cancer → Signalling by Ligand-Responsive EGFR Variants in Cancer → Constitutive Signalling by Ligand-Responsive EGFR Cancer Variants. The relationships between these pathways are captured through parent-child arcs, which reflect the ontological "is-a" relationships.


Although Reactome contains 29 superpathways, we built the corpus from only one of them: Disease, the largest group of pathways. The central node of the Disease burst corresponds to the uppermost level of the Reactome event hierarchy (Table 2). Concentric rings of nodes around the central node represent successively more specific levels of the event hierarchy, as in the EGFR chain quoted above. The arcs connecting nodes in successive rings within a burst represent parent–child (is-a) relationships in the event hierarchy. When a specific pathway is shared by more than one burst, arcs connect its nodes between bursts. A node’s size is proportional to the number of physical entities (proteins, complexes, chemicals) it contains.
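The ring structure described above can be sketched as a child-to-parent map, where a pathway's ring index is the number of is-a steps separating it from the central node. This is only an illustration: the map holds just the single EGFR chain quoted in the text, the names `PARENT`, `ancestors` and `ring_index` are hypothetical, and the single-parent simplification ignores pathways shared between bursts.

```python
# Hypothetical child -> parent map covering only the chain quoted in the text.
# Assumption: each pathway has one parent (shared pathways are ignored here).
PARENT = {
    "Diseases of signal transduction by growth factor receptors and second messengers":
        "Disease",
    "Signalling by EGFR in Cancer":
        "Diseases of signal transduction by growth factor receptors and second messengers",
    "Signalling by Ligand-Responsive EGFR Variants in Cancer":
        "Signalling by EGFR in Cancer",
    "Constitutive Signalling by Ligand-Responsive EGFR Cancer Variants":
        "Signalling by Ligand-Responsive EGFR Variants in Cancer",
}


def ancestors(pathway, parent=PARENT):
    """Return the ancestors of `pathway`, from its immediate parent up to the root."""
    chain = []
    while pathway in parent:
        pathway = parent[pathway]
        chain.append(pathway)
    return chain


def ring_index(pathway, parent=PARENT):
    """Ring of the burst a pathway sits on; the central node is ring 0."""
    return len(ancestors(pathway, parent))


if __name__ == "__main__":
    leaf = "Constitutive Signalling by Ligand-Responsive EGFR Cancer Variants"
    print(ring_index(leaf))
    print(" -> ".join(reversed(ancestors(leaf) + [])) + " -> " + leaf)
```

Walking the map upward from any pathway recovers its full lineage, which is the information the parent-child arcs encode in the burst diagram.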


The dictionary of entity names and predicates, based on the Reactome knowledgebase, is available on GitHub.
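To make the idea of a gene and pathway membership dictionary concrete, a minimal sketch follows. It does not reproduce the format of the released resource: the pathway names, gene symbols, and the names `MEMBERS`, `PARENT_OF` and `is_member` are placeholders, and the rule that a gene listed for a sub-pathway also counts as a member of every ancestor pathway is an assumption made for this illustration.

```python
# Toy data only: placeholder pathway names and gene symbols, not Reactome entries.
MEMBERS = {
    "Pathway B": {"GENE1", "GENE2"},
    "Pathway C": {"GENE3"},
}

# Hypothetical is-a arcs: Pathway C is a sub-pathway of B, which is a sub-pathway of A.
PARENT_OF = {"Pathway B": "Pathway A", "Pathway C": "Pathway B"}


def is_member(gene, pathway):
    """True if `gene` is listed for `pathway` or for any of its descendants.

    Assumption: membership propagates upward along the is-a arcs.
    """
    for listed_pathway, genes in MEMBERS.items():
        if gene not in genes:
            continue
        node = listed_pathway
        while node is not None:  # climb from the listing pathway towards the root
            if node == pathway:
                return True
            node = PARENT_OF.get(node)
    return False


if __name__ == "__main__":
    print(is_member("GENE3", "Pathway A"))  # inherited through C -> B -> A
    print(is_member("GENE1", "Pathway C"))  # GENE1 is listed only for the parent
```

A lookup of this kind yields the true or false membership facts from which factual premises about genes and pathways can be built.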


Table 2: Summary of a dictionary with the true taxonomic relationships between pathways and genes.


Figure 6: Syllogistic argument schemes used to create a biologically factual argument corpus, with domain-specific examples for the generalized modus ponens base scheme and the disjunctive syllogism complex predicates scheme. For each syllogistic scheme, a formal argument scheme (consisting of premises and a conclusion, shown in bold) is provided.


Authors:

(1) Magdalena Wysocka, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(2) Danilo S. Carvalho, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom and Department of Computer Science, Univ. of Manchester, United Kingdom;

(3) Oskar Wysocki, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(4) Marco Valentino, Idiap Research Institute, Switzerland;

(5) André Freitas, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom, Department of Computer Science, Univ. of Manchester, United Kingdom and Idiap Research Institute, Switzerland.


This paper is available on arXiv under the CC BY-NC-SA 4.0 license.

[4] https://reactome.org