Date: 2025-05-20

Authors: (1) Rasoul Samani, School of Electrical and Computer Engineering, Isfahan University of Technology (equal contribution); (2) Mohammad Dehghani, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran (equal contribution; [email protected]); (3) Fahime Shahrokh, School of Electrical and Computer Engineering, Isfahan University of Technology.

Abstract and 1. Introduction

2. Related Works

3. Methodology and 3.1 Data

3.2 Data preprocessing

3.3. Predictive models

4. Evaluation

4.1. Evaluation metrics

4.2. Results and discussion

5. Conclusion and References

In binary classification tasks, data instances are classified as either positive or negative. Here, a positive label signifies readmission, while a negative label indicates no readmission. Each binary prediction falls into one of four categories: a true positive (TP) occurs when a positive instance is correctly predicted as positive, a true negative (TN) occurs when a negative instance is correctly predicted as negative, a false positive (FP) arises when a negative instance is wrongly predicted as positive, and a false negative (FN) occurs when a positive instance is incorrectly predicted as negative [45].
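The four outcomes above can be tallied directly from paired labels. The following is a minimal sketch, assuming labels are encoded as 1 (readmitted) and 0 (not readmitted); the function name `confusion_counts` and the sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = readmitted, 0 = not readmitted)."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # positive instance, predicted positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1  # negative instance, predicted negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # negative instance, wrongly predicted positive
        else:
            counts["FN"] += 1  # positive instance, wrongly predicted negative
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical labels, for illustration only (not data from the study).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
result = confusion_counts(y_true, y_pred)
print(result)
```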





The primary evaluation metrics for binary classification are accuracy, precision, recall, and F1-score. Accuracy represents the percentage of correctly classified instances among all instances (Equation 1). Precision measures the proportion of truly positive instances among all instances predicted as positive (Equation 2). Recall, also known as sensitivity, measures the proportion of truly positive instances that the model correctly identifies (Equation 3). Finally, the F1-score is the harmonic mean of precision and recall, providing a balanced assessment of the model's performance (Equation 4) [46].
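Written out with the TP, TN, FP, and FN counts defined earlier, the standard forms of Equations 1 to 4 are as follows. This is a reconstruction from the textbook definitions rather than the authors' original typesetting.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2} \\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3} \\
\text{F1-score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```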









Two additional metrics are the ROC (Receiver Operating Characteristic) curve and the AUC (area under the curve). The ROC curve is constructed by plotting the true positive rate against the false positive rate as the decision threshold varies. The curve is monotonically non-decreasing within the unit square, running from (0, 0) to (1, 1) [47]. The area under the ROC curve (AUC) summarizes it in a single number. This metric ranges from 0 to 1, with 0.5 corresponding to random guessing and 1 to perfect separation of the classes, and provides insight into the overall performance of the classification model [48].
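To make the construction concrete, the sketch below sweeps a decision threshold over predicted scores, collects the (false positive rate, true positive rate) points, and integrates them with the trapezoidal rule. It is an illustrative implementation, not the one used in the study; the function names and the sample scores are hypothetical, and the code assumes at least one positive and one negative instance.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives  # assumes both classes are present
    # Sort by descending score so that each step admits one more instance as "positive".
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores, for illustration only (not data from the study).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(y_true, scores)
area = auc(points)
print(points, area)
```

The resulting area equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, which is a common way to interpret AUC.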





The dataset initially comprised 51,113 records, of which 49,083 remained after preprocessing. All of these records were used to construct the classification models. Notably, the dataset is highly imbalanced; to maintain realism and promote better generalization, no balancing techniques were applied. The data were then partitioned into three subsets: 70% for training (34,358 records), 15% for validation (7,363 records), and the remaining 15% for testing (7,362 records). Table 1 provides the distribution of each class.
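The reported subset sizes can be checked arithmetically. The sketch below assumes that the training share is rounded down and that the remainder is halved, with the extra record going to the validation set; the text does not state how rounding was handled, nor whether the split was stratified, so the plain random shuffle and the function names here are assumptions.

```python
import random


def split_sizes(n_records, train_pct=70):
    """Return (train, validation, test) sizes for a 70/15/15-style split."""
    n_train = n_records * train_pct // 100  # round the training share down
    remaining = n_records - n_train
    n_val = (remaining + 1) // 2            # odd remainder: extra record goes to validation
    n_test = remaining - n_val
    return n_train, n_val, n_test


def split_indices(n_records, seed=0):
    """Shuffle record indices and cut them into three disjoint subsets (not stratified)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train, n_val, _ = split_sizes(n_records)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


sizes = split_sizes(49_083)
train_idx, val_idx, test_idx = split_indices(49_083)
print(sizes)
```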

















Table 2 presents the results obtained from the classifiers employed in our study. Notably, the Final Method, which combines the BDSS model with an MLP, outperformed the state-of-the-art models in terms of AUC. Furthermore, this model, along with logistic regression, achieved the highest accuracy, recall, and F1-score, underscoring the continued relevance of classical machine learning models. Figure 5 shows the ROC curves of the different models: the Final Method achieved an AUC of 75%, surpassing all other models, while logistic regression reached 73.2%, outperforming the other machine learning techniques.





In the medical domain, metrics like recall and AUC play a crucial role in evaluating AI models. Recall, which measures the ability of a model to correctly identify positive cases, is particularly important in healthcare settings, where missing a true case is costly. Similarly, AUC provides an overall measure of model performance and is widely used for assessing predictive models in medical applications. The Final Method, which leverages the BDSS model, is considered the best model because of its superior recall and AUC. This model is trained on discharge summary data and builds on BDSS, which is pre-trained on a large text corpus and is adept at capturing the semantic nuances of text.
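A small worked example shows why accuracy alone can mislead on imbalanced data and why recall matters here. The counts below describe a hypothetical test set of 1,000 patients with 50 readmissions; they are invented for illustration and do not reflect the study's class distribution or results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model never predicts the positive class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical model A: predicts "not readmitted" for every patient.
always_negative = classification_metrics(tp=0, tn=950, fp=0, fn=50)

# Hypothetical model B: flags 100 patients and catches 40 of the 50 readmissions.
screening = classification_metrics(tp=40, tn=890, fp=60, fn=10)

print(always_negative)
print(screening)
```

In this toy setting, model A has the higher accuracy (0.95 versus 0.93) yet identifies no readmitted patient, whereas model B reaches a recall of 0.8.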





























One advantage of the logistic regression model is its clarity and interpretability. To gain insight into the model's decision-making process, we extracted the features with the greatest impact on its predictions, as shown in Figure 6. Words such as "milliliter," "mg," and "chronic" had the greatest influence on classifying patients as readmitted. This can be attributed to the medications, with specific doses, that practitioners prescribe at discharge: the higher the number of prescribed drugs, the higher the likelihood of patient readmission. Conversely, words like "without," "family," "negative," "normal," and "transferred" in the discharge text had the most substantial impact on classifying patients as not readmitted.
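This kind of inspection can be sketched as a ranking of the model's learned coefficients. The example assumes a bag-of-words style representation in which each vocabulary term has one coefficient, with positive weights pushing a prediction toward readmission. The terms are those named in the discussion; the weights and the function name `top_features` are invented for illustration and are not the values behind Figure 6.

```python
def top_features(vocabulary, coefficients, k=3):
    """Return the k most positive and the k most negative (term, weight) pairs."""
    ranked = sorted(zip(vocabulary, coefficients), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # strongest push toward "not readmitted"
    most_positive = ranked[-k:][::-1]   # strongest push toward "readmitted"
    return most_positive, most_negative


# Terms come from the discussion; the weights are hypothetical placeholders.
vocabulary = ["milliliter", "mg", "chronic", "without",
              "family", "negative", "normal", "transferred"]
coefficients = [0.9, 0.8, 0.7, -0.6, -0.5, -0.75, -0.65, -0.4]

toward_readmission, toward_no_readmission = top_features(vocabulary, coefficients)
print(toward_readmission)
print(toward_no_readmission)
```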

















Several previous studies have investigated models for predicting ICU readmission, with logistic regression consistently demonstrating favorable results, achieving AUC values of 65% [49], 66% [50], and 70% [51]. More recently, Orangi-Fard et al. [41] applied various machine learning techniques to the MIMIC-III dataset to predict patient readmission, and their SVM-RBF model achieved an AUC of 74%. It is worth noting that Orangi-Fard et al. used only a portion of the dataset (4,000 for training and 6,000 for validation), balanced the data, and employed 825 features. In contrast, our study used the entire dataset and kept its natural class imbalance. Additionally, we focused solely on textual features, omitting other factors such as demographics. This deliberate choice allowed us to gain deeper insight into the predictive value of the text itself. Furthermore, while previous studies relied solely on machine learning models, our study also incorporated deep learning methods, which highlights the novelty and potential advantages of deep learning techniques for predicting ICU readmission. Table 3 provides a comparison with existing methods based on the AUC metric.













5. Conclusion

Medical data, particularly EHR data, present a rich source for text mining studies, which hold promise for various healthcare applications. Reducing ICU readmission rates is paramount for hospitals to enhance patient outcomes, conserve ICU resources, and curtail healthcare expenses. In this study, we aimed to leverage patient discharge reports, which offer detailed insights into a patient's medical history, current condition, and treatment recommendations, to develop a predictive model for ICU readmission. Our proposed deep learning-based model outperformed traditional machine learning models, achieving a higher AUC. For future research, exploring alternative deep learning architectures beyond MLP could be beneficial. Large Language Models (LLMs) could also be considered for building predictive models and for comparative analyses with deep learning models. To enhance the effectiveness of such models, we recommend using larger inputs and leveraging advanced models like Longformer. Finally, incorporating summarization techniques during the pre-processing stage could further improve the quality of the input data.

References

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.



