Authors:
(1) Aarav Patel, Amity Regional High School – email: [email protected];
(2) Peter Gloor, Center for Collective Intelligence, Massachusetts Institute of Technology (corresponding author) – email: [email protected].
The Random Forest Regression model produced the strongest overall results when tested on a holdout sample of 64 companies. Its predictions had the strongest correlation with current S&P Global ESG scores, with a correlation coefficient of 26.1% and a mean absolute average error (MAAE) of 13.4% (Figures 5 and 6). This correlation corresponds to a p-value of 0.0372 (<0.05), so it is statistically significant, which suggests that the algorithm is well calibrated to existing ESG solutions. The other models had similar MAAE but lower correlation coefficients that were not statistically significant (Figure 6). The Support Vector Regression algorithm had a correlation of 18.3% and an MAAE of 13.7%, with a p-value of 0.148 (Figure 8). The XGBoost model had a correlation of 16.0% and an MAAE of 14.7%, with a p-value of 0.207 (Figure 7). Finally, the K-Nearest Neighbors algorithm had a correlation of 13.2% and an MAAE of 14.0%, with a p-value of 0.298 (Figure 9). The MAAE of all four algorithms fell between 13% and 15%, with the Random Forest model lowest at 13.4% (Figure 10), and every algorithm came in under the MAAE criterion of 20.0%.
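The link between each correlation coefficient and its p-value can be checked with a short calculation. The sketch below assumes a standard two-sided t-test for a Pearson correlation on n = 64 holdout companies (so 62 degrees of freedom); the text does not state which test was used, so this is an assumption, and the function names `t_pdf` and `two_sided_p` are illustrative rather than taken from the paper. It uses only the Python standard library and integrates the Student's t density numerically with Simpson's rule.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for a Pearson correlation r on n paired samples.

    Assumes the usual test statistic t = r * sqrt((n - 2) / (1 - r^2)).
    `steps` must be even because Simpson's rule is used.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the integral of the density over [0, t].
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # probability that |T| <= t
    return 1 - central_mass


# Correlation coefficients reported in the text (holdout sample of 64).
reported = {
    "Random Forest": 0.261,
    "Support Vector Regression": 0.183,
    "XGBoost": 0.160,
    "K-Nearest Neighbors": 0.132,
}

for name, r in reported.items():
    p = two_sided_p(r, 64)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: r = {r:.3f}, p = {p:.3f} ({verdict} at 0.05)")
```

Under this assumption, only the Random Forest correlation falls below the 0.05 threshold, and the computed values should land close to the p-values quoted above. With a sample of 64, a correlation needs to be roughly 0.25 or higher to reach significance, which is why the three weaker correlations do not.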
This paper is available on arXiv under a CC BY-NC-ND 4.0 DEED license.