Authors:
(1) Mark Potanin, Corresponding author ([email protected]);
(2) Andrey Chertok ([email protected]);
(3) Konstantin Zorin ([email protected]);
(4) Cyril Shtabtsovsky ([email protected]).
3 Dataset Overview, Preprocessing, and Features
3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset
4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest
5 Other approaches
5.2 Founders ranking model and 5.3 Unicorn recommendation model
7 Further Research, References and Appendix
The primary output of the algorithm is the backtest table (Table 2), sorted by the date each company was added to the portfolio. The table includes an exit_reason column, which serves as the main metric for evaluating model quality on the backtest. This column can take the following values:
• success: the company had a successful round (unicorn/acquisition/IPO), and we exited
• longtime: a negative case in which we exited the company because it had no successful event and no funding rounds for two years
• STILL_IN: a gray area, mainly consisting of companies that were recently added to the backtest
Hence, an optimal backtest maximizes the number of companies labeled success and minimizes the number labeled longtime. Table 2 (earlybird_last) is the basic configuration, based on business requirements: we enter in the first rounds (B/C) and exit in the last round. However, the model may perform poorly at the beginning of the backtest because little training data is available at that point. In the Table 3 (any_last) configuration, the portfolio contains a large number of known unicorns simply because the model is allowed to enter in later rounds.
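To make the exit_reason labels concrete, the following is a minimal sketch of how one portfolio position could be labeled and how the labels could be tallied for a backtest. It is an illustration under stated assumptions, not the authors' implementation: the function names `exit_reason` and `summarize`, the input fields, the sample dates, and the approximation of "two years" as 730 days are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Optional

# Assumption: "no rounds for two years" is approximated as 730 days.
LONGTIME_WINDOW = timedelta(days=730)


def exit_reason(success_date: Optional[date], last_round_date: date, as_of: date) -> str:
    """Label one portfolio position as of the backtest date `as_of`.

    success_date: date of a unicorn/acquisition/IPO event, or None if none occurred.
    last_round_date: date of the company's most recent funding round.
    """
    if success_date is not None and success_date <= as_of:
        return "success"
    if as_of - last_round_date >= LONGTIME_WINDOW:
        return "longtime"
    # Neither a successful event nor two years of inactivity: the gray area.
    return "STILL_IN"


def summarize(reasons: Iterable[str]) -> dict:
    """Count positions per exit_reason value."""
    counts = Counter(reasons)
    return {key: counts.get(key, 0) for key in ("success", "longtime", "STILL_IN")}


# Hypothetical positions: (success_date, last_round_date)
as_of = date(2023, 1, 1)
positions = [
    (date(2021, 6, 1), date(2021, 6, 1)),  # had a successful event
    (None, date(2020, 3, 15)),             # no round for more than two years
    (None, date(2022, 9, 1)),              # recent round, outcome unknown
]
reasons = [exit_reason(s, r, as_of) for s, r in positions]
summary = summarize(reasons)
print(summary)
```

Under this sketch, a better backtest configuration is one whose summary shows a higher success count and a lower longtime count; STILL_IN positions are left out of that comparison because their outcome is not yet known.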
This paper is available on arXiv under a CC 4.0 license.