Authors:
(1) Mark Potanin, Corresponding author (potanin.m.st@gmail.com);
(2) Andrey Chertok (a.v.chertok@gmail.com);
(3) Konstantin Zorin (berzqwer@gmail.com);
(4) Cyril Shtabtsovsky (cyril@aloniq.com).

Table of Links

Abstract and 1. Introduction
2 Related works
3 Dataset Overview, Preprocessing, and Features
3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset
3.3 Features
4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest
4.2 Backtest settings
4.3 Results
4.4 Capital Growth
5 Other approaches
5.1 Investors ranking model
5.2 Founders ranking model and 5.3 Unicorn recommendation model
6 Conclusion
7 Further Research, References and Appendix

3.1 Successful Companies Dataset

In this research, a company is deemed successful if it achieves one of three outcomes: Initial Public Offering (IPO), Acquisition (ACQ), or Unicorn status (UNIC), the latter being defined as a valuation exceeding $1 billion. To assemble a list of successful companies, we first filtered for IPOs with valuations above $500M or funds raised over $100M, yielding 363 companies. For acquisitions, we removed companies whose purchase price was below the maximum amount of funds raised or under $100M, leaving 833 companies. To select unicorns, we searched for companies with a valuation above $1 billion, using both Crunchbase data and an additional table of verified unicorns, which gave a total of 1074 unicorns.

The final dataset contains a timeline of all crucial investment rounds leading to the success event (i.e., achieving unicorn status, IPO, or ACQ), with the index of this event specified in the success_round column. Recording the round history up to the success event means the dataset reflects how each successful company progressed, not only its final outcome.

3.2 Unsuccessful Companies Dataset

To supply the model with examples of 'unsuccessful' companies, we collected a separate dataset and applied the following exclusions. We excluded companies already present in the successful companies dataset by removing those that had IPO, ACQ, or UNIC flags. We also removed a considerable number of actual unicorns identified from the Crunchbase website [16] to avoid overlap. We excluded companies that have not attracted any rounds since 2016, and companies that are subsidiaries or parent companies of other entities. We used the jobs dataset to exclude companies that have hired employees since 2017. Finally, we excluded companies with a valuation above $100 million, as they reside in the "gray area" of companies that cannot be clearly categorized as successful or unsuccessful.

By applying these filters, we constructed a dataset comprising 32,760 companies labeled '0' (unsuccessful) and 1,989 companies labeled '1' (successful).

This paper is available on arXiv under CC 4.0 license.
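
The filtering rules in Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Company` record and all of its field names are hypothetical (they are not the Crunchbase schema), "maximum amount of funds raised" is read as the company's total funds raised, "since 2016" and "since 2017" are treated as inclusive, and the order in which the IPO, ACQ, and UNIC checks are tried is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional

MILLION = 1_000_000


@dataclass
class Company:
    # Hypothetical record; field names are illustrative, not the Crunchbase schema.
    name: str
    exit_type: Optional[str] = None        # "IPO", "ACQ", or None
    valuation: float = 0.0                 # USD
    funds_raised: float = 0.0              # USD, assumed to mean total funds raised
    acquisition_price: float = 0.0         # USD, only meaningful for "ACQ"
    verified_unicorn: bool = False         # present in the extra table of verified unicorns
    last_round_year: Optional[int] = None  # year of the most recent funding round
    last_hire_year: Optional[int] = None   # year of the most recent hire in the jobs data
    has_parent_or_subsidiary: bool = False


def success_flag(c: Company) -> Optional[str]:
    """Return "IPO", "ACQ", or "UNIC" if the Section 3.1 thresholds are met."""
    if c.exit_type == "IPO" and (
        c.valuation > 500 * MILLION or c.funds_raised > 100 * MILLION
    ):
        return "IPO"
    # Acquisitions are dropped when the price is below funds raised or under $100M.
    if c.exit_type == "ACQ" and c.acquisition_price >= max(
        c.funds_raised, 100 * MILLION
    ):
        return "ACQ"
    if c.valuation > 1000 * MILLION or c.verified_unicorn:
        return "UNIC"
    return None


def is_unsuccessful_example(c: Company) -> bool:
    """Apply the Section 3.2 exclusions literally, one rule per check."""
    if c.exit_type is not None or success_flag(c) is not None:
        return False  # carries an IPO, ACQ, or UNIC flag
    if c.last_round_year is None or c.last_round_year < 2016:
        return False  # no rounds attracted since 2016 (inclusive, by assumption)
    if c.has_parent_or_subsidiary:
        return False  # subsidiary or parent of another entity
    if c.last_hire_year is not None and c.last_hire_year >= 2017:
        return False  # has hired since 2017 (inclusive, by assumption)
    if c.valuation > 100 * MILLION:
        return False  # "gray area" valuation
    return True


def label(c: Company) -> Optional[int]:
    """1 for successful, 0 for unsuccessful, None if the company is left out."""
    if success_flag(c) is not None:
        return 1
    if is_unsuccessful_example(c):
        return 0
    return None


# Made-up companies, purely for illustration.
examples = [
    Company("A", exit_type="IPO", valuation=600 * MILLION),
    Company("B", exit_type="ACQ", acquisition_price=80 * MILLION,
            funds_raised=20 * MILLION),
    Company("C", valuation=150 * MILLION, last_round_year=2018),
    Company("D", valuation=10 * MILLION, last_round_year=2017, last_hire_year=2015),
    Company("E", valuation=10 * MILLION, last_round_year=2017, last_hire_year=2019),
    Company("F", valuation=2000 * MILLION, last_round_year=2019),
]

for company in examples:
    print(company.name, success_flag(company), label(company))
```

Returning `None` for companies that pass neither set of rules mirrors the text: such companies (for example, a small acquisition or a company valued above $100 million) belong to neither class and are simply not used for training.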

Unicorns vs Failures: Constructing Comprehensive Datasets for Predictive Modeling
