Mal-Where? How We Boosted Malware Detection to XG-ceptional Levels

by Obfuscation
March 16th, 2025
Too Long; Didn't Read

This paper presents a malware detection system where exceptional accuracy is achieved, and class imbalance is effectively addressed using ADASYN.

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Authors:

(1) S M Rakib Hasan, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh (sm.rakib.hasan@g.bracu.ac.bd);

(2) Aakar Dhakal, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh (aakar.dhakal@g.bracu.ac.bd).

Abstract and I. Introduction

II. Literature Review

III. Methodology

IV. Results and Discussion

V. Conclusion and Future Work, and References

IV. RESULTS AND DISCUSSION

From our experiments, we have achieved outstanding results on our malware detection system.


A. Binary Classification


Our trained model achieved 99.99% accuracy on the test set and detected all the malware samples correctly. The result is shown in Fig. 2. The other models also performed very well in detecting potential malware; the results are tabulated in TABLE II.
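Accuracy and "detecting all the malware" are separate measurements, so it can help to compute both. The snippet below is a minimal sketch, not the authors' evaluation code: the function name `binary_report`, the label strings, and the toy labels are all assumptions made for illustration. It shows that when every malware sample is caught, any remaining error must be a benign sample flagged as malware.

```python
from collections import Counter

def binary_report(y_true, y_pred, positive="malware"):
    """Return accuracy, recall on the positive class, and missed positives."""
    counts = Counter(zip(y_true, y_pred))
    # True positives: malware correctly flagged as malware.
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    # False negatives: malware that slipped through as benign.
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return {
        "accuracy": correct / len(y_true),
        "malware_recall": tp / (tp + fn) if tp + fn else 0.0,
        "missed_malware": fn,
    }

# Toy labels invented for illustration (NOT data from the paper).
y_true = ["benign"] * 5 + ["malware"] * 5
y_pred = ["benign"] * 4 + ["malware"] * 6  # one benign sample is flagged as malware

report = binary_report(y_true, y_pred)
print(report)
```

In this toy case the accuracy is below 100% even though no malware is missed, which is the same pattern the text reports at a much larger scale.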

Fig. 2. Binary classification using Random Forest Classifier.



B. Malware Classification


As the dataset is highly imbalanced, we conducted this part in three steps. First, we ran the experiment on the original dataset; then we undersampled the majority class; and finally we oversampled the minority classes.


TABLE II. BINARY CLASSIFICATION PERFORMANCE


1) Classification on Original Data: Here, we ran the untouched data through our chosen algorithms and achieved moderate results. Although the metrics are not as impressive as those of the binary classification, it is worth noting that no malware was classified as safe; rather, the errors came from malware being assigned to the wrong malware class. Our results are tabulated in TABLE III. They show that the XGBoost classifier performed best in the detection and classification of malware.
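The distinction drawn above, between malware that is missed entirely and malware that is detected but put in the wrong class, can be checked directly from the predictions. The following is a small sketch under stated assumptions: the helper `split_errors` is hypothetical, and the class names and toy labels are illustrative placeholders rather than values taken from the paper.

```python
def split_errors(y_true, y_pred, benign="benign"):
    """Split misclassifications into three kinds of error."""
    missed = 0        # malware predicted as benign (the dangerous case)
    false_alarm = 0   # benign predicted as some malware class
    wrong_family = 0  # malware detected, but assigned to the wrong class
    for t, p in zip(y_true, y_pred):
        if t == p:
            continue
        if t != benign and p == benign:
            missed += 1
        elif t == benign:
            false_alarm += 1
        else:
            wrong_family += 1
    return {"missed": missed, "false_alarm": false_alarm, "wrong_family": wrong_family}

# Toy labels invented for illustration (NOT data from the paper).
y_true = ["benign", "spyware", "spyware", "ransomware", "trojan", "trojan"]
y_pred = ["benign", "trojan", "spyware", "ransomware", "spyware", "trojan"]

errors = split_errors(y_true, y_pred)
print(errors)
```

A result with `missed == 0` and `wrong_family > 0` corresponds to the situation described in the text: the multiclass scores drop, but nothing malicious is waved through as safe.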


TABLE III. CLASSIFICATION ON ORIGINAL DATA


2) Undersampling Majority Class: We used four undersampling methods and trained our models on each of them. The performance metrics varied from method to method, and no single method dominated the scores. However, the Random Undersampling and Near Miss approaches performed better than the other two methods. These results are tabulated in TABLE IV. They show that the XGBoost classifier again performed best, while the Random Forest classifier came very close. With this approach too, no malware was labeled safe during detection.
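As a rough illustration of the simplest of these methods, the sketch below implements random undersampling with the standard library only. It is not the authors' code: the function `random_undersample`, the choice to shrink every class to the size of the smallest one, and the toy data are assumptions. Near Miss differs in that it selects which majority samples to keep by their distance to minority samples instead of at random. Libraries such as imbalanced-learn ship ready-made versions of both; the text does not say which implementation was used.

```python
import random
from collections import Counter, defaultdict

def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        # Sample without replacement; the smallest class is kept in full.
        keep.extend(rng.sample(idxs, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy imbalanced data invented for illustration (NOT data from the paper).
X = [[float(i), float(i % 3)] for i in range(13)]
y = ["benign"] * 8 + ["spyware"] * 3 + ["ransomware"] * 2

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

The trade-off is visible even in the toy case: balance is reached by discarding most of the majority class, which may be one reason the scores vary so much between undersampling methods.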


3) Oversampling Minority Classes: Among the popular oversampling methods, we chose ADASYN (Adaptive Synthetic Sampling), a data augmentation technique primarily used in imbalanced classification tasks. After applying ADASYN to each of the minority classes separately, we obtained a balanced dataset and applied our chosen classification algorithms. We got our best results with this approach. The findings are tabulated in TABLE V.


Here too, XGBoost outperformed the other classifiers and provided the best predictions. The detection performance is shown in Fig. 3.
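To make the oversampling step above more concrete, here is a simplified, standard-library sketch of the ADASYN idea for a single minority class. It is an illustration under assumptions, not the authors' pipeline or a library implementation: the function `adasyn_sketch`, its parameters `n_new` and `k`, and the toy data are hypothetical, and interpolation partners are drawn from the nearest minority neighbours only. The key idea is that minority points surrounded by other classes receive more synthetic samples.

```python
import math
import random

def adasyn_sketch(X, y, minority, n_new, k=3, seed=0):
    """Generate roughly n_new synthetic points for one minority class."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    if n_new <= 0 or len(min_idx) < 2:
        return []

    def nearest(i, candidates, count):
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # Step 1: difficulty ratio = share of other-class points among the k nearest neighbours.
    ratios = []
    for i in min_idx:
        neigh = nearest(i, range(len(X)), k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))
    total = sum(ratios)
    if total == 0:
        # No minority point borders another class: fall back to uniform weights.
        weights = [1 / len(min_idx)] * len(min_idx)
    else:
        weights = [r / total for r in ratios]

    # Step 2: allocate samples in proportion to difficulty, then interpolate
    # between the point and a randomly chosen nearby minority point.
    synthetic = []
    for i, w in zip(min_idx, weights):
        mates = nearest(i, min_idx, k)
        for _ in range(round(w * n_new)):
            j = rng.choice(mates)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic

# Toy data invented for illustration (NOT data from the paper).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 1],
     [5, 5], [5, 6], [6, 5], [2.5, 2.5]]
y = ["benign"] * 6 + ["spyware"] * 4

synthetic = adasyn_sketch(X, y, minority="spyware", n_new=2)
print(synthetic)
```

In this toy set only the minority point at (2.5, 2.5) sits among benign points, so both synthetic samples are generated from it. Because per-point counts are rounded, the number of samples returned can differ slightly from `n_new` on other data.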


Therefore, we see that our malware detection models perform well and are robust. In binary classification, memory dump analysis detected every malware sample in our test set. In classifying the malware, among the explored approaches, the application of ADASYN emerged as the most promising solution. By systematically addressing the class imbalance through synthetic data generation, we achieved superior results compared to both the classification on the original data and the undersampling techniques. The outcomes of our experiments underscore the importance of tailored strategies for handling class imbalance and reaffirm the potential of advanced techniques like ADASYN in enhancing multiclass classification accuracy.


TABLE IV. CLASSIFICATION ON UNDERSAMPLED DATA


TABLE V. PERFORMANCE ON OVERSAMPLED DATA


Fig. 3. XGBoost performance in detection.


V. CONCLUSION AND FUTURE WORK

In conclusion, our research addresses the rising threat of obfuscated malware in connected devices and the internet landscape. Through memory dump analysis and diverse machine learning algorithms, we’ve explored effective detection strategies and illuminated their strengths and limitations using the CIC-MalMem-2022 dataset. Emphasizing the synergy between machine learning and traditional security methods, our work underscores the need for a comprehensive defense strategy in the dynamic cybersecurity realm. While acknowledging the ever-evolving malware landscape, our research lays the groundwork for future endeavours, advocating continuous adaptation. Future efforts should focus on refining algorithms, exploring new data sources, and fostering interdisciplinary collaboration. We envision research on hybrid approaches, combining machine learning and signature-based methods, and studying the impact of adversarial attacks and explainable AI to enhance detection system robustness and transparency. In summary, our study provides valuable insights for resilient cybersecurity solutions, addressing the challenges of obfuscated malware and advancing detection capabilities to safeguard digital ecosystems against emerging threats.


This paper is available on arXiv under the CC BY-SA 4.0 DEED license.

