Table of Links

Abstract and Introduction


Domain and Task
2.1. Data sources and complexity
2.2. Task definition


Related Work
3.1. Text mining and NLP research overview
3.2. Text mining and NLP in industry use
3.3. Text mining and NLP for procurement
3.4. Conclusion from literature review


Proposed Methodology
4.1. Domain knowledge
4.2. Content extraction
4.3. Lot zoning
4.4. Lot item detection
4.5. Lot parsing
4.6. XML parsing, data joining, and risk indices development


Experiment and Demonstration
5.1. Component evaluation
5.2. System demonstration


Discussion
6.1. The ‘industry’ focus of the project
6.2. Data heterogeneity, multilingual and multi-task nature
6.3. The dilemma of algorithmic choices
6.4. The cost of training data


Conclusion, Acknowledgements, and References

4.6. XML parsing, data joining, and risk indices development

So far, we have explained how we extract missing lot and item information from tender attachment documents. As per Figure 6, our workflow also parses tender and award XMLs to extract other structured information about contract awards. This information is then joined with the lot and item information to create supplier-centric contract award records. This is a relatively straightforward process that we explain only briefly here.

For each award XML, one contract award record is created for each unique supplier-buyer pair. The supplier information is extracted from the award XML, while the buyer in the award XML is matched to that in the tender XML through name matching. This allows us to integrate additional buyer information from the tender XML. Contractual terms such as the start and end dates and the lot quantities and values are also available from the award XML. However, where the award XML has missing lot and item information, we use the lot references to map to the lot and item information extracted from the tender attachment documents. Through this process, we populate our database with the schema partially shown in Figure 4.

Using this database, we can retrieve a particular supplier and its contract history for particular products. We can also use this information to calculate quantitative ‘indices’ that summarise individual suppliers’ risks from a buyer's point of view. We introduce a number of metrics we developed in the current proof-of-concept. Once again, these are proprietary, and therefore we focus on explaining the intuitions behind them rather than revealing the mathematical calculations.

In total, we define 21 metrics that can be organised from two angles. For the first, we divide metrics into two types: one that measures a supplier’s ability to supply different ranges of products, and another that measures a supplier’s economic risk, taking into account their historical contract capacity and currently ‘active’ contracts. For the second, we define five sub-groups: ‘Product Metric (PM)’, ‘Buyer Metric (BM)’, ‘Location Metric (LocM)’, ‘Lot Metric (LM)’, and ‘Value Metric (VM)’.

PMs measure the ability of a supplier to supply a number of different products, or indications of ‘divergence’ in their portfolio. These monitor each year’s performance by taking into account the supplier's product portfolio and contract fulfilment track record. Factors included are, for example, the number of different types of products they supply each year; how this changes over time (increasing, or decreasing to a smaller number of products); and the quantity of specific types of products delivered to buyers.

BMs measure the extent to which suppliers are generally able to retain their customers (e.g., churn and retention).

LocMs measure the supplier’s ability to sell goods across countries and therefore consider the geographical coverage of their historical contracts.

LMs measure, at tender lot level, the suppliers’ ability to participate in multiple contracts over time. This accounts for contract durations and helps us infer supplier stock levels.

VMs measure financial capacity and the different value propositions suppliers can offer and participate in. For example, these may look at the fluctuations in a supplier’s contract capacity to identify risks of inability to supply, such as when a supplier signs a contract with a value (or quantity) that is significantly beyond its ‘norm’ (e.g., based on the average and range of contract values). The intuition is that signing contracts that deviate significantly from the ‘norm’ can represent a risk of inability to supply.

We will demonstrate the end system that allows exploring these risk indices and supplier profiles in the next Section.
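The joining step described in this section (one record per supplier-buyer pair, buyer enrichment through name matching, and a fallback to lot information extracted from attachments) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the tender and award XMLs have already been parsed into plain dictionaries, and every field name (`awards`, `buyers`, `ref`, `items`), the `normalise_name` helper, and the exact-match-after-normalisation strategy are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


def normalise_name(name: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace (an assumed, simple matching rule)."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())


@dataclass
class AwardRecord:
    supplier: str
    buyer: str
    buyer_info: dict                      # extra buyer details taken from the tender
    lots: list = field(default_factory=list)


def build_award_records(award: dict, tender: dict, extracted_lots: dict) -> list:
    """Create one record per unique supplier-buyer pair found in a parsed award.

    `extracted_lots` maps a lot reference to the lot/item information mined
    from tender attachment documents (hypothetical structure).
    """
    buyers_by_name = {normalise_name(b["name"]): b for b in tender["buyers"]}
    records = {}
    for entry in award["awards"]:
        buyer_key = normalise_name(entry["buyer"])
        key = (entry["supplier"], buyer_key)
        if key not in records:
            records[key] = AwardRecord(
                supplier=entry["supplier"],
                buyer=entry["buyer"],
                buyer_info=buyers_by_name.get(buyer_key, {}),
            )
        lot = dict(entry["lot"])
        if not lot.get("items"):
            # Award lacks item details: fall back to the attachment-derived lot,
            # keeping any non-empty values the award does provide.
            fallback = extracted_lots.get(lot["ref"], {})
            present = {k: v for k, v in lot.items() if v not in (None, "", [])}
            lot = {**fallback, **present}
        records[key].lots.append(lot)
    return list(records.values())
```

In practice, buyer names across tender and award notices are likely to need fuzzier matching than the normalisation shown here; the sketch only conveys the shape of the join.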
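The intuition behind flagging a contract that deviates from a supplier's ‘norm’ can also be sketched in code. The actual metric is proprietary and not disclosed in the text, so the example below substitutes a plain z-score over historical contract values as a stand-in; the function name `deviates_from_norm` and the threshold of 2.0 are assumptions, not values from the source.

```python
from statistics import mean, stdev


def deviates_from_norm(history: list, new_value: float, threshold: float = 2.0) -> bool:
    """Return True if `new_value` lies far from the supplier's historical contract values.

    Uses a z-score (distance from the mean in sample standard deviations) as an
    illustrative stand-in for the undisclosed metric.
    """
    if len(history) < 2:
        # Too little history to define a 'norm'; do not flag.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past contracts had the same value; any different value deviates.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


past_values = [100.0, 110.0, 90.0, 105.0, 95.0]
print(deviates_from_norm(past_values, 300.0))  # far above the historical range
print(deviates_from_norm(past_values, 104.0))  # within the usual spread
```

The same check could be applied to contract quantities instead of values, since the text mentions both.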
Authors:
(1) Ziqi Zhang*, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP (Ziqi.Zhang@sheffield.ac.uk);
(2) Tomas Jasaitis, Vamstar Ltd., London (Tomas.Jasaitis@vamstar.io);
(3) Richard Freeman, Vamstar Ltd., London (Richard.Freeman@vamstar.io);
(4) Rowida Alfrjani, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP (Rowida.Alfrjani@sheffield.ac.uk);
(5) Adam Funk, Information School, the University of Sheffield, Regent Court, Sheffield, UK S1 4DP (Adam.Funk@sheffield.ac.uk).

This paper is available on arXiv under the CC BY 4.0 license.

