Model Design and Comparative Metrics for Bankruptcy Prediction in Imbalanced Datasets

Written by angelinvest | Published 2025/10/21
Tech Story Tags: automated-machine-learning | automl-for-financial-analysis | predictive-modeling | financial-risk-assessment | investment-grade-bonds | fallen-angel-bonds | feature-selection-in-finance | bankruptcy-prediction-models

TL;DR: Oversampling improved every model's performance, with the neural network and AutoML achieving superior recall and F-scores.

Authors:

(1) Harrison Mateika, Northwestern University ([email protected]);

(2) Juannan Jia, Northwestern University ([email protected]);

(3) Linda Lillard, Northwestern University ([email protected]);

(4) Noah Cronbaugh, Northwestern University ([email protected]);

(5) Will Shin, Northwestern University ([email protected]).

  1. Introduction
  2. Literature Review
  3. Data Collection
  4. Data Analysis
  5. Methodology
  6. Results
  7. Analysis and Interpretation
  8. Conclusions and Next Steps, and References

5. Methodology

We split the data set 70/30 into training and testing partitions and explored several classifier algorithms to build an optimal model.
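As a point of reference, the sketch below reproduces this 70/30 split with scikit-learn, using a synthetic, imbalanced stand-in for the bankruptcy data (the variable names and class ratio are illustrative, not taken from the paper).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bankruptcy data: roughly 3% positive (bankrupt) class
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.97, 0.03], random_state=42)

# 70/30 split; stratify keeps the class ratio similar in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
```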

To measure the success of a model, the confusion matrix was used to identify true positives, true negatives, false positives, and false negatives.

We gave heavy weight to true positives and true negatives, since these counts reflect how many bankruptcies and non-bankruptcies were identified correctly.
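Continuing from the split sketch above, the snippet below shows how these confusion-matrix counts and the precision, recall, and f1-score figures quoted in the subsections that follow can be computed with scikit-learn; the logistic regression here is only a placeholder classifier.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Fit a placeholder classifier and evaluate it on the held-out test set
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Unpack true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")

# Per-class precision, recall, and f1-score
print(classification_report(y_test, y_pred, digits=2))
```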

5.1 Logistic Regression (LR)

In the LR model trained on the original data set, 99% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy, while only 16% of the actual bankruptcy cases were correctly labeled as bankruptcy. The high accuracy (99%) on true non-bankruptcy events and the low accuracy (16%) on true bankruptcy events reflect how heavily the data set is skewed toward non-bankruptcy observations. The original data set's precision, recall, and f1-score were 0.35, 0.16, and 0.22, respectively, indicating that the model does a mediocre job of predicting bankruptcies.

In the LR model trained on the oversampled data set, 88% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 92% of the actual bankruptcy cases were correctly labeled as bankruptcy, showing balanced predictions across true positives and true negatives and avoiding the bias produced by the original data set. The precision, recall, and f1-score were 0.89, 0.92, and 0.90, respectively. This is a substantial improvement over the original data set, though some of it may be attributable to overfitting on the oversampled data.

In the LR model trained on the oversampled data set with feature selection, 89% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 90% of the actual bankruptcy cases were correctly labeled as bankruptcy. The precision, recall, and f-score were 0.88, 0.89, and 0.89, respectively. Compared with the previous data set, feature selection did not improve model performance significantly.
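For illustration, the sketch below pairs a logistic regression with an oversampling step. The paper does not state which oversampling technique was used, so random oversampling from the imbalanced-learn library is assumed here, applied to the training partition only; it continues with the split from the earlier sketch.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Oversample the minority (bankruptcy) class in the training partition only
X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

# Fit logistic regression on the balanced training data, evaluate on the untouched test set
lr = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(classification_report(y_test, lr.predict(X_test), digits=2))
```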

5.2 K-Nearest Neighbors (KNN)

In the KNN model trained on the original data set, 100% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy, while only 12% of the actual bankruptcy cases were correctly labeled as bankruptcy. The high accuracy (almost 100%) on true non-bankruptcy events and the low accuracy (12%) on true bankruptcy events again raise the red flag of bias, as with the logistic regression model. The original data set's precision, recall, and f1-score were 0.48, 0.16, and 0.24, respectively. In terms of the aggregate metric (the f1-score), this model did not perform as well as the logistic regression on the original data set, though it did have a slightly higher precision.

In the KNN model trained on the oversampled data set, 88% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 100% of the actual bankruptcy cases were correctly labeled as bankruptcy. The precision, recall, and f1-score for the oversampled data were 0.89, 1.00, and 0.94, respectively. As with the logistic regression, the oversampled data outperformed the original data set; furthermore, this model outperformed the logistic regression on almost every metric.

In the KNN model trained on the oversampled data set with feature selection, 89% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 96.6% of the actual bankruptcy cases were correctly labeled as bankruptcy. The data set's precision, recall, and f1-score were 0.88, 0.94, and 0.90, respectively. As with the previous data set, the model with feature selection generated similar predictions and did not improve performance significantly, mirroring the pattern seen in the previous models.
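A hedged sketch of this KNN variant follows; the paper does not name its feature-selection method, so SelectKBest with ANOVA F-scores is assumed, and k and n_neighbors are placeholder values. The oversampled training data come from the earlier sketch.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Keep the 10 highest-scoring features, then fit a 5-nearest-neighbors classifier
knn_fs = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    KNeighborsClassifier(n_neighbors=5),
)
knn_fs.fit(X_res, y_res)
print(classification_report(y_test, knn_fs.predict(X_test), digits=2))
```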

5.3 Support Vector Model (SVM)

In the SVM model trained on the original data set, 99.9% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy, while only 1.5% of the actual bankruptcy cases were correctly labeled as bankruptcy. The high accuracy (almost 100%) on true non-bankruptcy events and the very low accuracy (1.5%) on true bankruptcy events again expose the bias issue. The original data set's precision, recall, and f-score were 0.33, 0.02, and 0.04, respectively. This model performed far worse than the previous two, mainly due to its recall score.

In the SVM model trained on the oversampled data set, 92% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 100% of the actual bankruptcy cases were correctly labeled as bankruptcy, showing balanced predictions across true positives and true negatives. The precision, recall, and f-score for the oversampled data set were 0.93, 0.98, and 0.95, respectively. This version of the model outperformed the logistic regression and KNN models on all metrics.

This is notable, since the SVM performed worse than the other two models on the original data set; perhaps this model benefits more from the additional information in the oversampled data.

In the SVM model trained on the oversampled data set with feature selection, 88% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 92.8% of the actual bankruptcy cases were correctly labeled as bankruptcy. The data set's precision, recall, and f-score were 0.88, 0.94, and 0.90, respectively. On this data set the model performed worse than on the oversampled data set without feature selection, following a similar trend to the other two models.
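The sketch below fits an SVM on the same oversampled training data; since the paper does not give the kernel or hyperparameters, an RBF-kernel SVC with feature scaling is assumed.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Scale features (SVMs are sensitive to feature scale), then fit an RBF-kernel SVC
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_res, y_res)
print(classification_report(y_test, svm.predict(X_test), digits=2))
```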

5.4 Neural Network (NN) Model

We used Keras to build a multi-layer feed-forward neural network. The first hidden layer has 16 nodes and uses the relu activation function, the second hidden layer has 16 nodes and also uses relu, and the output layer has a single node with a sigmoid activation. Since this is a binary classification problem, binary_crossentropy was used as the loss function. The model was trained for 120 epochs with a batch size of 10.
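The snippet below reconstructs this architecture in Keras, trained on the oversampled data from the earlier sketches; the optimizer is not stated in the text, so Adam is assumed.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two 16-node relu hidden layers and a single sigmoid output node, as described above
model = keras.Sequential([
    keras.Input(shape=(X_res.shape[1],)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy loss; the Adam optimizer is an assumption (not given in the paper)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_res, y_res, epochs=120, batch_size=10, verbose=0)

# Threshold the sigmoid output at 0.5 to obtain class labels
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
```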

In the NN model trained on the original data set, 99.9% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy, while only 1.9% of the actual bankruptcy cases were correctly labeled as bankruptcy. Once again, the high accuracy (almost 100%) on true non-bankruptcy events and the low accuracy (1.9%) on true bankruptcy events reflect the skew toward the non-bankruptcy class. The original data set's precision, recall, and f-score were 0.33, 0.02, and 0.04, respectively. This model performed about as well as the SVM on most metrics.

In the NN model trained on the oversampled data set, 97% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 99% of the actual bankruptcy cases were correctly labeled as bankruptcy. The precision, recall, and f-score for the oversampled data set were 0.97, 0.99, and 0.98, respectively, making this the best-performing of the models discussed so far, with the highest precision, recall, and f-score among them.

In the NN model trained on the oversampled data set with feature selection, 92.9% of the actual non-bankruptcy cases were correctly labeled as non-bankruptcy and 95.9% of the actual bankruptcy cases were correctly labeled as bankruptcy. The precision, recall, and f-score were 0.93, 0.96, and 0.94, respectively. This model performed worse than it did on the oversampled data set without feature selection, but it outperformed the other models on this data set.

5.5 Auto Machine Learning (AutoML)

On the original data set, AutoML produced a model that could be considered either the best or the second best, depending on which metric one emphasizes. In terms of true positives, AutoML ranked second to the logistic regression model (0.14 vs. 0.15). However, in terms of precision on predicted bankruptcies, it performed best by far, at 0.60. On an aggregate metric such as the f-score, it ranked third at 0.22, primarily because its recall was lower than that of the logistic regression and KNN models. Overall, the model is fairly comparable to the others, and its precision should not be overlooked.

On the oversampled results, AutoML performed best on most metrics: it had the highest precision (0.98), was tied with KNN for the highest recall (1.00), and had the highest f-score (0.99). The only metric on which it was outperformed was the true-negative rate, where the neural network reached 0.99 versus AutoML's 0.98. Since AutoML outperformed the other models on most metrics, it may be the best performer overall.

On the oversampled data with feature selection, the AutoML model performed best on every metric compared to the other models, with the highest precision (0.95), recall (1.00), and f-score (0.98). As with the other models, these scores were not quite as high as those achieved with all features included.
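The paper does not name the AutoML framework it used, so the sketch below uses TPOT purely as an illustrative stand-in, again trained on the oversampled data from the earlier sketches.

```python
from tpot import TPOTClassifier
from sklearn.metrics import classification_report

# Let TPOT search over candidate pipelines; generations/population_size are placeholder budgets
automl = TPOTClassifier(generations=5, population_size=20, scoring="f1",
                        random_state=42, verbosity=2)
automl.fit(X_res, y_res)
print(classification_report(y_test, automl.predict(X_test), digits=2))
```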

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.


Written by angelinvest | Empowering visionary entrepreneurs, fueling innovation, and cultivating a brighter future through strategic investments.
Published by HackerNoon on 2025/10/21