
A Formalization of the SylloBio-NLI Resource Generation Process

by Large Models, December 11th, 2024

Too Long; Didn't Read

This appendix outlines the formal process for generating syllogistic inference patterns in the biomedical domain, from defining formal argument schemes to mapping domain-specific knowledge and constructing a knowledge base for evaluating NLI models.
  1. Abstract and Introduction
  2. SylloBio-NLI
  3. Empirical Evaluation
  4. Related Work
  5. Conclusions
  6. Limitations and References


A. Formalization of the SylloBio-NLI Resource Generation Process

B. Formalization of Tasks 1 and 2

C. Dictionary of gene and pathway membership

D. Domain-specific pipeline for creating NL instances and E. Accessing LLMs

F. Experimental Details

G. Evaluation Metrics

H. Prompting LLMs - Zero-shot prompts

I. Prompting LLMs - Few-shot prompts

J. Results: Misaligned Instruction-Response

K. Results: Ambiguous Impact of Distractors on Reasoning

L. Results: Models Prioritize Contextual Knowledge Over Background Knowledge

M. Supplementary Figures and N. Supplementary Tables

A Formalization of the SylloBio-NLI Resource Generation Process

This appendix formalises the generation process of the syllogistic inference patterns.


We start by defining the main constructs (formal and linguistic artefacts and functions) of the underlying framework:


  1. Syllogistic Scheme (S): A logical inference pattern consisting of premises and a conclusion, S = {P1, P2, . . . , Pn, C}, where Pi is premise i and C is the conclusion.


  2. Formal Argument Scheme (σ): Representation of a syllogistic scheme in first-order logic (FOL), σ(S) = {ϕ1, ϕ2, . . . , ϕn, ψ}, where ϕi corresponds to Pi and ψ corresponds to C.


  3. Natural Language Template (τ): A natural language schema mapping each formula in σ(S) to a sentence template, τ(σ(S)) = {τ1, τ2, . . . , τn, τψ}, where τi is the sentence template for ϕi and τψ is the sentence template for ψ.


  4. Ontology (O): A domain-specific knowledge base containing entities E and predicates Π, O = {E, Π}, where E = {e1, e2, . . . , ek} and Π = {π1, π2, . . . , πl}.


  5. Instantiation Function (I): A function that replaces placeholders in τ with entities and predicates from O, I : τ(σ(S)) × O → NL, where NL is the set of natural language sentences.


  6. Expert Mapping Function (µExpert): A function provided by a domain expert to map placeholders to appropriate ontology terms, µExpert : Placeholders → E ∪ Π.


  7. Knowledge Base (KB): A collection of instantiated syllogistic arguments, KB = {A1, A2, . . . , Am}, where Ai = {P′1, P′2, . . . , P′n, C′} and P′i, C′ are instantiated natural language sentences.
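
As a rough illustration of how these constructs fit together, the following Python sketch represents an ontology O, a set of sentence templates τ(σ(S)), an expert mapping µExpert, and the instantiation function I. The template wording, the scheme (a generalized-modus-ponens-style pattern), and the names `GENE_1`, `Pathway_X`, and `Pathway_Y` are hypothetical placeholders chosen for this example, not terms taken from the resource itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """O = {E, Π}: the entities and predicates of the domain."""
    entities: frozenset
    predicates: frozenset


def instantiate(templates, mapping, ontology):
    """I: fill every placeholder in each template with an ontology term."""
    known_terms = ontology.entities | ontology.predicates
    for term in mapping.values():
        if term not in known_terms:
            raise ValueError(f"{term!r} is not an ontology term")
    return [template.format(**mapping) for template in templates]


# τ(σ(S)) for an illustrative scheme (wording is an assumption).
templates = [
    "Every member of {A} is a member of {B}.",  # template for ϕ1 = ∀x (A(x) → B(x))
    "{a} is a member of {A}.",                  # template for ϕ2 = A(a)
    "{a} is a member of {B}.",                  # template for the conclusion ψ = B(a)
]

# Hypothetical ontology and expert mapping (placeholder names only).
ontology = Ontology(
    entities=frozenset({"GENE_1"}),
    predicates=frozenset({"Pathway_X", "Pathway_Y"}),
)
mu_expert = {"a": "GENE_1", "A": "Pathway_X", "B": "Pathway_Y"}

# One instantiated argument: the premises followed by the conclusion.
argument = instantiate(templates, mu_expert, ontology)
for sentence in argument:
    print(sentence)
```

Here the mapping is a plain dictionary from placeholder names to ontology terms, which mirrors µExpert : Placeholders → E ∪ Π; a mapping to a term outside the ontology is rejected before any sentence is produced.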

A.1 Process Formalisation

The formalisation defines a systematic process for generating domain-specific syllogistic arguments by:


  1. Defining formal representations of syllogistic schemes in first-order logic.


  2. Generating natural language templates from these formal representations.


  3. Mapping placeholders to domain-specific entities and predicates using an ontology and expert knowledge.


  4. Instantiating the templates to produce logically valid and semantically sound arguments.


  5. Constructing a knowledge base for evaluating NLI models.


This ensures that the generated arguments are both logically valid and contextually relevant to the biomedical domain.


Input: A set of syllogistic schemes: S = {S1, S2, . . . , Sm}, an ontology: O = {E, Π}, an expert mapping function: µExpert.


Output: A knowledge base of instantiated arguments: KB.


Step 1: Formal Argument Scheme Selection: For each syllogistic scheme Si ∈ S, define its formal argument scheme in first-order logic:


σ(Si) = {ϕ1, ϕ2, . . . , ϕn, ψ}


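Step 2: Natural Language Template Generation: Transform each formula in σ(Si) into a natural language template:


τ(σ(Si)) = {τ1, τ2, . . . , τn, τψ}, with one sentence template τj per premise formula ϕj and one, τψ, for the conclusion ψ.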


Step 3: Ontology Mapping and Instantiation: Apply the expert mapping function to select appropriate entities and predicates from the ontology:


µExpert(p) ∈ E ∪ Π for each placeholder p in τ(σ(Si))


Instantiate the templates:


Ai = I(τ(σ(Si)), O) = {P′1, P′2, . . . , P′n, C′},


under the following constraints:


• Logical Validity: The instantiated arguments must preserve the logical structure of σ(Si).


• Domain Soundness: The selected entities and predicates must be semantically coherent within the targeted subdomain.


These constraints can be further formalised as:


Logical Validity Constraint: The instantiated argument Ai must be logically valid:


{ϕ′1, ϕ′2, . . . , ϕ′n} ⊨ ψ′,


where ϕ′j corresponds to the logical form of P′j.
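
To make the entailment condition concrete, the sketch below searches small finite interpretations for a counterexample, that is, an interpretation in which every premise is true and the conclusion is false. It is a bounded check under stated assumptions (two unary predicates A and B, one constant a, domains of at most three elements), not a general first-order prover and not the verification procedure used by the authors.

```python
from itertools import chain, combinations, product


def subsets(domain):
    """All subsets of the domain, i.e. all possible extensions of a unary predicate."""
    items = list(domain)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def entails(premises, conclusion, max_size=3):
    """Return True if no interpretation with up to max_size domain elements
    makes every premise true while making the conclusion false."""
    for n in range(1, max_size + 1):
        domain = range(n)
        for ext_a, ext_b, const in product(subsets(domain), subsets(domain), domain):
            model = {"domain": domain, "A": ext_a, "B": ext_b, "a": const}
            if all(p(model) for p in premises) and not conclusion(model):
                return False  # counterexample found
    return True


# ϕ1 = ∀x (A(x) → B(x)),  ϕ2 = A(a),  ψ = B(a)
def all_a_are_b(m):
    return all(x not in m["A"] or x in m["B"] for x in m["domain"])


def a_is_a(m):
    return m["a"] in m["A"]


def a_is_b(m):
    return m["a"] in m["B"]


valid = entails([all_a_are_b, a_is_a], a_is_b)    # modus-ponens-style pattern
invalid = entails([all_a_are_b, a_is_b], a_is_a)  # affirming the consequent

print(valid, invalid)
```

The second call illustrates why the constraint matters: swapping the minor premise and the conclusion yields a pattern that reads naturally in text but admits a counterexample, so it would have to be excluded from the set of valid instantiations.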


Domain Soundness Constraint: The entities and predicates used must be semantically valid within the domain:


∀e ∈ E′, ∀π ∈ Π′, DomainValid(e, π) = True,


where E′ ⊆ E and Π′ ⊆ Π are the entities and predicates used in Ai.
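
One simple way to read DomainValid is as a lookup against curated membership data. The sketch below assumes a dictionary from entities to the predicates they may legitimately take (for example, genes to pathways); the dictionary contents and all names in it are hypothetical placeholders, and the real resource may define domain validity differently.

```python
from itertools import product

# Hypothetical membership data: entity -> predicates it may legitimately take.
MEMBERSHIP = {
    "GENE_1": {"Pathway_X", "Pathway_Y"},
    "GENE_2": {"Pathway_X"},
}


def domain_valid(entity, predicate, membership=MEMBERSHIP):
    """DomainValid(e, π): True if the pairing is licensed by the membership data."""
    return predicate in membership.get(entity, set())


def domain_sound(entities, predicates, membership=MEMBERSHIP):
    """Check the constraint for every pair (e, π) with e ∈ E′ and π ∈ Π′."""
    return all(domain_valid(e, p, membership)
               for e, p in product(entities, predicates))


print(domain_sound({"GENE_1"}, {"Pathway_X", "Pathway_Y"}))
print(domain_sound({"GENE_2"}, {"Pathway_X", "Pathway_Y"}))
```

Because the constraint quantifies over every entity and every predicate used in Ai, a single unlicensed pairing is enough to reject the whole argument.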


Verification of Logical Validity: Ensure that the instantiated premises logically entail the conclusion:


{ϕ′1, ϕ′2, . . . , ϕ′n} ⊨ ψ′,


using logical inference rules.


Verification of Domain Soundness: Confirm that:


• All entities and predicates are correctly used.


• There are no semantic contradictions.


Step 4: Knowledge Base Construction: Aggregate all instantiated arguments into the knowledge base:


KB = {A1, A2, . . . , Am}


This is summarised with the following algorithmic outline:
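
A minimal Python sketch of the four steps, written from the definitions in this appendix rather than from the authors' algorithm listing, might look as follows. The dictionary layout for schemes, the `is_valid` and `is_sound` callbacks, the template wording, and all entity and pathway names are assumptions made for illustration.

```python
def build_knowledge_base(schemes, mappings, ontology_terms, is_valid, is_sound):
    """Return the KB as a list of instantiated arguments."""
    kb = []
    for scheme in schemes:
        # Steps 1-2: each scheme carries its FOL formulas and its NL templates.
        if not is_valid(scheme["formulas"]):
            continue  # logical validity constraint
        for mapping in mappings:
            # Step 3: keep only mappings into the ontology that are domain-sound.
            if not set(mapping.values()) <= ontology_terms:
                continue
            if not is_sound(mapping):
                continue  # domain soundness constraint
            sentences = [t.format(**mapping) for t in scheme["templates"]]
            # Step 4: aggregate the instantiated argument into the KB.
            kb.append({
                "scheme": scheme["name"],
                "premises": sentences[:-1],
                "conclusion": sentences[-1],
            })
    return kb


# Illustrative input: one scheme and two candidate expert mappings.
schemes = [{
    "name": "generalized modus ponens",
    "formulas": ["forall x. A(x) -> B(x)", "A(a)", "B(a)"],
    "templates": [
        "Every member of {A} is a member of {B}.",
        "{a} is a member of {A}.",
        "{a} is a member of {B}.",
    ],
}]
mappings = [
    {"a": "GENE_1", "A": "Pathway_X", "B": "Pathway_Y"},
    {"a": "GENE_9", "A": "Pathway_X", "B": "Pathway_Y"},  # GENE_9 is not in the ontology
]
ontology_terms = {"GENE_1", "Pathway_X", "Pathway_Y"}

kb = build_knowledge_base(
    schemes, mappings, ontology_terms,
    is_valid=lambda formulas: True,  # placeholder: plug in an entailment check
    is_sound=lambda mapping: True,   # placeholder: plug in a domain check
)
print(len(kb), kb[0]["conclusion"])
```

The two constraints are passed in as callbacks so that the loop stays independent of how validity and soundness are actually verified; in this toy run only the first mapping survives, because the second refers to a term outside the ontology.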



Authors:

(1) Magdalena Wysocka, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(2) Danilo S. Carvalho, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom and Department of Computer Science, Univ. of Manchester, United Kingdom;

(3) Oskar Wysocki, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(4) Marco Valentino, Idiap Research Institute, Switzerland;

(5) André Freitas, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom, Department of Computer Science, Univ. of Manchester, United Kingdom and Idiap Research Institute, Switzerland.


This paper is available on arXiv under the CC BY-NC-SA 4.0 license.