Creating a Systematic ESG Scoring System: Results
by @carbonization

Too Long; Didn't Read

This project aims to create a data-driven ESG evaluation system that provides better guidance and more systematized scores by incorporating social sentiment.

Authors:

(1) Aarav Patel, Amity Regional High School – email: [email protected];

(2) Peter Gloor, Center for Collective Intelligence, Massachusetts Institute of Technology (corresponding author) – email: [email protected].

5. Results

The Random Forest Regression model produced the strongest overall results when tested on a holdout sample of 64 companies. It had the strongest correlation with current S&P Global ESG scores, with a correlation coefficient of 26.1% and a mean absolute average error (MAAE) of 13.4% (Figures 5 and 6). This correlation has a p-value of 0.0372 (<0.05), which makes it statistically significant and indicates that the model is well-calibrated to existing ESG solutions. The other models had similar MAAE but lower correlation coefficients that were not statistically significant (Figure 6). The Support Vector Regression algorithm had a correlation of 18.3% and an MAAE of 13.7%, with a p-value of 0.148 (Figure 8). The XGBoost model had a correlation of 16.0% and an MAAE of 14.7%, with a p-value of 0.207 (Figure 7). Finally, the K-Nearest Neighbors algorithm had a correlation of 13.2% and an MAAE of 14.0%, with a p-value of 0.298 (Figure 9). The MAAE of every algorithm fell between 13% and 15%, with the Random Forest model having the lowest at 13.4% (Figure 10), so all of the algorithms came in under the MAAE criterion of 20.0%.
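The link between each correlation coefficient and its p-value can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the coefficients are Pearson correlations, that the p-values come from a two-sided t-test, and that all 64 holdout companies were used (so 62 degrees of freedom). The function names `t_pdf` and `p_value_from_r` are hypothetical, and the t-distribution tail is integrated numerically with Simpson's rule so that only the standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_from_r(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for 'no correlation', given Pearson r and sample size n.

    Assumption: the test statistic is t = r * sqrt((n - 2) / (1 - r^2)).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even).
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The area between 0 and t equals (1 - p) / 2 for a two-sided test.
    return 1.0 - 2.0 * area


# Correlation coefficients as reported in the text (holdout sample of 64).
reported_r = {
    "Random Forest": 0.261,
    "Support Vector Regression": 0.183,
    "XGBoost": 0.160,
    "K-Nearest Neighbors": 0.132,
}

for model, r in reported_r.items():
    p = p_value_from_r(r, n=64)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{model}: r = {r:.3f}, p = {p:.4f} ({verdict} at 0.05)")
```

If those assumptions hold, the printed p-values should land close to the ones quoted above, with only the Random Forest model falling below the 0.05 threshold. Small differences are expected because the reported coefficients are rounded to three digits.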


Figure 5: Mean Absolute Average Error of different machine-learning algorithms against S&P Global ESG score


Figure 6: R² correlation of different machine-learning algorithms


Figure 7: XGBoost model predictions vs. actual scores (scale 0-100)


Figure 8: Support Vector Regression predictions vs. actual scores (scale 0-100)


Figure 9: K-Nearest Neighbor model predictions vs. actual scores (scale 0-100)


Figure 10: Random Forest model predictions vs. actual scores (scale 0-100)


This paper is available on arXiv under a CC BY-NC-ND 4.0 DEED license.