Startup Success Prediction and VC Portfolio Simulation Using CrunchBase Data

by ExitStrategy, August 7th, 2024

Too Long; Didn't Read

This study introduces a deep learning model that leverages data from Crunchbase to predict startup success, including IPOs and unicorn status. By integrating diverse factors and employing a backtesting algorithm, the model achieved a 14x capital growth and an 86% ROC_AUC, demonstrating its effectiveness in real-world investment contexts.

Authors:

(1) Mark Potanin, corresponding author ([email protected]);

(2) Andrey Chertok ([email protected]);

(3) Konstantin Zorin ([email protected]);

(4) Cyril Shtabtsovsky ([email protected]).

Abstract and 1. Introduction

2 Related works

3 Dataset Overview, Preprocessing, and Features

3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset

3.3 Features

4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest

4.2 Backtest settings

4.3 Results

4.4 Capital Growth

5 Other approaches

5.1 Investors ranking model

5.2 Founders ranking model and 5.3 Unicorn recommendation model

6 Conclusion

7 Further Research, References and Appendix

ABSTRACT

Predicting startup success presents a formidable challenge due to the inherently volatile landscape of the entrepreneurial ecosystem. The advent of extensive databases like Crunchbase, together with available open data, enables the application of machine learning and artificial intelligence for more accurate predictive analytics. This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M&A). We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category. A distinctive feature of our research is the use of a comprehensive backtesting algorithm designed to simulate the venture capital investment process. This simulation allows for a robust evaluation of our model’s performance against historical data, providing actionable insights into its practical utility in real-world investment contexts. Evaluating our model on Crunchbase data, we achieved 14-fold capital growth and successfully identified, at the Series B round, high-potential startups including Revolut, DigitalOcean, Klarna, GitHub, and others. Our empirical findings illuminate the importance of incorporating diverse feature sets in enhancing the model’s predictive accuracy. In summary, our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success and sets the stage for future advancements in this research area.

1 Introduction

The prediction of startup success is a crucial task for various stakeholders, including investors, entrepreneurs, and policymakers, as it has significant implications for resource allocation and decision-making. It is estimated that approximately 90% of startups fail within their first five years, a failure rate that has remained relatively constant over the past few decades, despite considerable advancements in technology and business practices. Consequently, the accurate prediction of startup success can assist investors in more effectively allocating their resources and enable entrepreneurs to make better-informed decisions.


Recently, the proliferation of data from sources such as Crunchbase has intensified interest in the application of machine learning techniques for the prediction of startup success. Machine learning models can harness various types of data, encompassing funding history, market trends, team composition, and social media activity, to identify patterns and generate predictions.


This study presents two distinct methodologies for predicting startup success: a supervised deep learning approach leveraging multiple data sources, and a ranking-based approach focusing on the identification of characteristics common to successful startups and investors. The supervised approach entails collecting and labeling data, constructing a prediction model, and evaluating its performance. In contrast, the ranking-based approach centers on identifying startups and investors that exhibit shared characteristics with successful ones.
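As a rough illustration of the labeling step in the supervised approach, the sketch below marks a company as successful if it reached any of the milestones named in the abstract (IPO, unicorn status, or M&A). The `Company` record, its field names, the $1B valuation threshold for unicorn status, and the choice to count every acquisition as a success are assumptions made for illustration, not the paper's actual schema or labeling rules.

```python
from dataclasses import dataclass
from typing import Optional

# Conventional definition of a unicorn: a private company valued at $1B or more.
UNICORN_VALUATION_USD = 1_000_000_000


@dataclass
class Company:
    """Hypothetical record; field names are illustrative, not Crunchbase's schema."""
    name: str
    had_ipo: bool = False
    was_acquired: bool = False  # assumption: any acquisition counts as a success
    valuation_usd: Optional[float] = None  # None when no valuation is known


def label_success(company: Company) -> int:
    """Return 1 if the company hit any success milestone, else 0."""
    is_unicorn = (
        company.valuation_usd is not None
        and company.valuation_usd >= UNICORN_VALUATION_USD
    )
    return int(company.had_ipo or company.was_acquired or is_unicorn)


# Made-up companies, for illustration only.
companies = [
    Company("alpha", had_ipo=True),
    Company("beta", valuation_usd=2.5e9),
    Company("gamma", valuation_usd=4.0e7),
    Company("delta", was_acquired=True),
]
labels = [label_success(c) for c in companies]
```

Binary labels of this kind would then serve as the target for the prediction model; a real pipeline would likely need a stricter notion of a "successful" acquisition than the boolean flag used here.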


Our training dataset consists of 34,470 companies. The primary novelty of this research lies in the application of deep learning techniques and the integration of heterogeneous input data types. A crucial feature of our research is the simulation of fund operations based on historical data, resulting in a projected 14x capital growth of the fund’s portfolio. In terms of machine learning metrics, our model achieves a ROC AUC of 86%.
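To make the two headline figures concrete, here is a minimal sketch of how a ROC AUC and a capital growth multiple can be computed. The function names, the toy labels and scores, and the simple "total exit value over total invested capital" definition of the multiple are assumptions for illustration; the paper's own backtest may compute capital growth differently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise version is O(P * N), which is
    fine for small samples but slow for large datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def capital_multiple(invested: Sequence[float], exit_values: Sequence[float]) -> float:
    """Total exit value divided by total invested capital (assumed definition)."""
    total_invested = sum(invested)
    if total_invested <= 0:
        raise ValueError("invested capital must be positive")
    return sum(exit_values) / total_invested


# Toy data, made up for illustration only.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.7, 0.1]
auc = roc_auc(y_true, y_score)  # 5 of 6 positive/negative pairs are ranked correctly

multiple = capital_multiple(
    invested=[1.0, 1.0, 1.0, 1.0],
    exit_values=[0.0, 0.0, 2.0, 10.0],
)  # 12.0 returned on 4.0 invested
```

Read this way, a ROC AUC of 86% means that a randomly chosen successful company receives a higher score than a randomly chosen unsuccessful one about 86% of the time, and a 14x multiple means the simulated portfolio is worth fourteen times the capital deployed.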


The remainder of this paper is organized as follows: Section 2 reviews the related works in the area of startup success prediction and machine learning. Section 3 describes dataset collection, preprocessing, and feature selection. Section 4 presents the experimental results of the supervised approach. Section 5 describes some other ideas about company and investor scoring. Finally, Sections 6 and 7 provide the conclusion of the study and discuss prospective research avenues in this domain.


This paper is available on arXiv under a CC 4.0 license.