
This story draft by @textmining has not been reviewed by an editor, YET.

Creating Supplier-Centric Contract Records with XML Parsing and Data Joining


Table of Links

  1. Abstract and Introduction

  2. Domain and Task

    2.1. Data sources and complexity

    2.2. Task definition

  3. Related Work

    3.1. Text mining and NLP research overview

    3.2. Text mining and NLP in industry use

    3.3. Text mining and NLP for procurement

    3.4. Conclusion from literature review

  4. Proposed Methodology

    4.1. Domain knowledge

    4.2. Content extraction

    4.3. Lot zoning

    4.4. Lot item detection

    4.5. Lot parsing

    4.6. XML parsing, data joining, and risk indices development

  5. Experiment and Demonstration

    5.1. Component evaluation

    5.2. System demonstration

  6. Discussion

    6.1. The ‘industry’ focus of the project

    6.2. Data heterogeneity, multilingual and multi-task nature

    6.3. The dilemma of algorithmic choices

    6.4. The cost of training data

  7. Conclusion, Acknowledgements, and References

4.6. XML parsing, data joining, and risk indices development

So far, we have explained how we extract missing lot and item information from tender attachment documents. As shown in Figure 6, our workflow also parses tender and award XMLs to extract other structured information about contract awards. This information is then joined with the lot and item information to create supplier-centric contract award records. Because this is a relatively straightforward process, we explain it only briefly here. For each award XML, one contract award record is created for each unique supplier-buyer pair. The supplier information is extracted from the award XML, while the buyer in the award XML is matched to the buyer in the tender XML through name matching. This allows us to integrate additional buyer information from the tender XML. Contractual terms such as start and end dates, lot quantities, and lot values are also available from the award XML. However, where the award XML lacks lot and item information, we use the lot references to map to the lot and item information extracted from the tender attachment documents. Through this process, we populate our database, whose schema is partially shown in Figure 4.
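The joining step can be sketched in Python with only the standard library. This is a minimal sketch under loud assumptions: the XML layouts, element and attribute names (`award`, `contract`, `lot_ref`, `item`, and so on), the helper names `normalise`, `match_buyer` and `build_records`, and the 0.9 similarity threshold are all hypothetical, and real tender and award notices use far richer, namespaced schemas. The paper only says that buyers are matched "through name matching"; normalising names and comparing them with `difflib.SequenceMatcher` is our stand-in for that step, not the authors' method.

```python
import difflib
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified notices; real tender/award XMLs differ.
TENDER_XML = """<tender id="T-001">
  <buyer country="UK" type="hospital">NHS Supply Chain</buyer>
</tender>"""

AWARD_XML = """<award tender_ref="T-001">
  <buyer>NHS Supply Chain Ltd.</buyer>
  <contract supplier="Acme Medical" lot_ref="1" start="2020-01-01" end="2021-12-31" value="50000"/>
  <contract supplier="Acme Medical" lot_ref="2" start="2020-01-01" end="2021-12-31" value="12000"/>
  <contract supplier="Beta Pharma" lot_ref="3" start="2020-03-01" end="2020-12-31" value="8000">
    <item>Paracetamol 500mg tablets</item>
  </contract>
</award>"""

# Lot/item information previously extracted from tender attachments,
# keyed by (tender reference, lot reference). Hypothetical values.
EXTRACTED_LOTS = {
    ("T-001", "1"): ["Nitrile examination gloves, size M"],
    ("T-001", "2"): ["Surgical face masks, type IIR"],
}

LEGAL_SUFFIXES = {"ltd", "limited", "plc", "inc"}


def normalise(name):
    """Lower-case, replace punctuation with spaces and drop common legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_buyer(award_buyer, tender_buyers, threshold=0.9):
    """Return the tender <buyer> element most similar to award_buyer, or None."""
    best, best_score = None, 0.0
    for el in tender_buyers:
        score = difflib.SequenceMatcher(
            None, normalise(award_buyer), normalise(el.text or "")).ratio()
        if score > best_score:
            best, best_score = el, score
    return best if best_score >= threshold else None


def build_records(award_xml, tender_xml, extracted_lots):
    """Create one supplier-centric record per unique (supplier, buyer) pair."""
    award, tender = ET.fromstring(award_xml), ET.fromstring(tender_xml)
    tender_ref = award.get("tender_ref")
    buyer = (award.findtext("buyer") or "").strip()
    buyer_el = match_buyer(buyer, tender.findall("buyer"))
    # Extra buyer attributes are only available from the tender XML.
    buyer_info = dict(buyer_el.attrib) if buyer_el is not None else {}

    records = {}
    for c in award.findall("contract"):
        key = (c.get("supplier"), buyer)
        rec = records.setdefault(key, {"supplier": key[0], "buyer": buyer,
                                       "buyer_info": buyer_info, "lots": []})
        items = [(i.text or "").strip() for i in c.findall("item")]
        if not items:  # award XML lacks item details: fall back to attachments
            items = extracted_lots.get((tender_ref, c.get("lot_ref")), [])
        rec["lots"].append({"lot_ref": c.get("lot_ref"), "start": c.get("start"),
                            "end": c.get("end"), "value": float(c.get("value")),
                            "items": items})
    return list(records.values())


records = build_records(AWARD_XML, TENDER_XML, EXTRACTED_LOTS)
```

Grouping on the `(supplier, buyer)` key mirrors the one-record-per-pair rule, and the fallback to `extracted_lots` only applies when a contract carries no `<item>` children of its own.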


Using this database, we can retrieve a particular supplier and its contract history for particular products. We can also use this information to calculate quantitative ‘indices’ that summarise individual suppliers’ risks from a buyer’s point of view. We now introduce a number of metrics developed in the current proof of concept. Once again, these are proprietary, so we focus on explaining the intuitions behind them rather than revealing the mathematical calculations.
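Retrieving a supplier's contract history for a given product amounts to a join between award records and their lot items. The sketch below assumes a hypothetical two-table layout (`award_record`, `lot_item`) in an in-memory SQLite database and a hypothetical `contract_history` helper with made-up sample rows; the actual schema in Figure 4 is only partially shown and is likely richer.

```python
import sqlite3

# Hypothetical, minimal schema: one row per award record, one row per lot item.
SCHEMA = """
CREATE TABLE award_record (
    id INTEGER PRIMARY KEY, supplier TEXT, buyer TEXT,
    start_date TEXT, end_date TEXT, value REAL);
CREATE TABLE lot_item (
    record_id INTEGER REFERENCES award_record(id), product TEXT);
"""


def contract_history(conn, supplier, product_term):
    """Award records of `supplier` with at least one item matching `product_term`."""
    sql = """
        SELECT DISTINCT r.id, r.buyer, r.start_date, r.end_date, r.value
        FROM award_record AS r JOIN lot_item AS i ON i.record_id = r.id
        WHERE r.supplier = ? AND i.product LIKE ?
        ORDER BY r.start_date"""
    return conn.execute(sql, (supplier, f"%{product_term}%")).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO award_record VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Acme Medical", "NHS Supply Chain", "2020-01-01", "2021-12-31", 50000.0),
    (2, "Acme Medical", "St Elsewhere Trust", "2022-02-01", "2023-01-31", 20000.0),
    (3, "Beta Pharma", "NHS Supply Chain", "2020-03-01", "2020-12-31", 8000.0),
])
conn.executemany("INSERT INTO lot_item VALUES (?, ?)", [
    (1, "Nitrile gloves, size M"), (1, "Nitrile gloves, size L"),
    (2, "Surgical masks"), (3, "Paracetamol 500mg"),
])
history = contract_history(conn, "Acme Medical", "gloves")
```

`DISTINCT` keeps a record from appearing twice when several of its items match the product term, and SQLite's `LIKE` is case-insensitive for ASCII text by default.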


In total, we define 21 metrics, which can be organised from two angles. From the first angle, metrics fall into two types: those that measure a supplier’s ability to supply different ranges of products, and those that measure a supplier’s economic risk, taking into account its historical contract capacity and currently ‘active’ contracts. From the second angle, metrics fall into five sub-groups: ‘Product Metric (PM)’, ‘Buyer Metric (BM)’, ‘Location Metric (LocM)’, ‘Lot Metric (LM)’, and ‘Value Metric (VM)’.

PMs measure the ability of a supplier to supply a number of different products, or the ‘divergence’ of its portfolio. They monitor each year’s performance by taking into account the supplier’s product portfolio and contract fulfilment track record. Factors include, for example, the number of different types of products the supplier supplies each year; how this number changes over time (increasing, or decreasing to a smaller number of products); and the quantity of specific types of products delivered to buyers.

BMs measure the extent to which suppliers are generally able to retain their customers (e.g., churn and retention). LocMs measure a supplier’s ability to sell goods across countries and therefore consider the geographical coverage of its historical contracts. LMs measure, at tender lot level, a supplier’s ability to participate in multiple contracts over time; they account for contract durations and help us infer supplier stock levels. VMs measure financial capacity and the different value propositions a supplier can offer and participate in. For example, they may track fluctuations in a supplier’s contract capacity to flag cases where the supplier signs a contract whose value (or quantity) is significantly beyond its ‘norm’ (e.g., based on the average and range of its contract values). The intuition is that signing contracts that deviate significantly from the norm can signal a risk of inability to supply.
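Because the actual metric formulas are proprietary, the following is only an illustration of the intuitions above, not the authors' metrics: one simple proxy per sub-group, computed over a single supplier's contract history. The `Contract` fields, the function names, the z-score-style rule used for the VM proxy, the parameters `k=2.0` and `min_history=3`, and the sample contracts are all our own assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev


@dataclass(frozen=True)
class Contract:
    buyer: str
    country: str
    products: frozenset  # product types covered by the awarded lots
    start: date
    end: date            # inclusive last day of the contract
    value: float


def products_per_year(contracts):
    """PM proxy: number of distinct product types per contract start year."""
    by_year = defaultdict(set)
    for c in contracts:
        by_year[c.start.year] |= c.products
    return {year: len(products) for year, products in sorted(by_year.items())}


def buyer_retention(contracts, year):
    """BM proxy: share of the previous year's buyers that contracted again in `year`."""
    prev = {c.buyer for c in contracts if c.start.year == year - 1}
    curr = {c.buyer for c in contracts if c.start.year == year}
    return len(prev & curr) / len(prev) if prev else None


def country_coverage(contracts):
    """LocM proxy: number of distinct buyer countries served."""
    return len({c.country for c in contracts})


def peak_active_contracts(contracts):
    """LM proxy: largest number of contracts active on the same day."""
    events = [(c.start, 1) for c in contracts]
    events += [(c.end + timedelta(days=1), -1) for c in contracts]
    active = peak = 0
    for _, delta in sorted(events):  # on equal dates, endings (-1) sort before starts
        active += delta
        peak = max(peak, active)
    return peak


def unusual_contracts(contracts, k=2.0, min_history=3):
    """VM proxy: contracts worth more than mean + k * std of the supplier's earlier ones."""
    flagged, history = [], []
    for c in sorted(contracts, key=lambda c: c.start):
        if len(history) >= min_history and c.value > mean(history) + k * pstdev(history):
            flagged.append(c)
        history.append(c.value)
    return flagged


contracts = [
    Contract("NHS Supply Chain", "UK", frozenset({"gloves"}),
             date(2019, 1, 1), date(2019, 12, 31), 10000.0),
    Contract("St Elsewhere Trust", "UK", frozenset({"gloves", "masks"}),
             date(2019, 6, 1), date(2020, 5, 31), 12000.0),
    Contract("NHS Supply Chain", "UK", frozenset({"masks"}),
             date(2020, 1, 1), date(2020, 12, 31), 11000.0),
    Contract("Hôpital Nord", "FR", frozenset({"gloves", "masks", "gowns"}),
             date(2020, 7, 1), date(2021, 6, 30), 90000.0),
]
```

The LM proxy sweeps over start and end events and counts a contract as active through its end date. The VM proxy compares each contract only against earlier ones, so the 'norm' reflects what the supplier had demonstrated at signing time; here the 90,000 contract is flagged because it lies far above the roughly 11,000 average of the three before it.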
In the next section, we demonstrate the end system, which allows users to explore these risk indices and supplier profiles.


Authors:

(1) Ziqi Zhang*, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP ([email protected]);

(2) Tomas Jasaitis, Vamstar Ltd., London ([email protected]);

(3) Richard Freeman, Vamstar Ltd., London ([email protected]);

(4) Rowida Alfrjani, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP ([email protected]);

(5) Adam Funk, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP ([email protected]).


This paper is available on arXiv under CC BY 4.0 license.