This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Mounika Vanamala, Department of Computer Science, University of Wisconsin-Eau Claire, United States;
(2) Keith Bryant, Department of Computer Science, University of Wisconsin-Eau Claire, United States;
(3) Alex Caravella, Department of Computer Science, University of Wisconsin-Eau Claire, United States.
Conclusions, Acknowledgment, and References
Recognizing the importance of addressing cyber security vulnerabilities during the software requirements phase, we identified the CAPEC attack pattern repository as the most practical resource for this study. Its organization of attack patterns allows a pattern to be identified precisely and then traced back to CAPEC for the recommended defense strategies. We defined topic modeling as well as unsupervised and supervised ML methods, and illustrated their applicability with recent research examples.
As our research continues, we will implement supervised machine learning. The CAPEC repository provides a prelabeled dataset, which is a valuable asset for building a training set. Supervised ML has the further advantage that evaluation metrics can be used to tune the learning process, enabling thorough evaluation and iterative improvement. To apply supervised ML, a training set of SRS documents must either be created or found. Because no comparable research framework has applied supervised ML to this problem, our future work will assess and compare the results of Naïve Bayes and Random Forest (RF) methods. Naïve Bayes performs well statistically on both large and small data sets, which makes it suitable both for the modest data set of SRS documents and for the larger CAPEC data set. RF's resistance to overfitting suits the complex data in CAPEC. The algorithm that returns the most accurate CAPEC attack pattern recommendations for an SRS document will then be used to build an automated tool for processing and visualizing the results.
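As a minimal sketch of the supervised route described above, the following Python code trains a multinomial Naïve Bayes classifier on labeled attack-pattern text and ranks attack patterns for a single requirement sentence. The training snippets, the three labels, and the helper names (`MultinomialNaiveBayes`, `recommend`, `accuracy`) are hypothetical illustrations. They are not CAPEC's actual descriptions or the authors' implementation. A real pipeline would train on the prelabeled CAPEC entries and would also fit a Random Forest for comparison, which is omitted here for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes text classifier with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, doc):
        """Return log P(label) + sum of log P(token | label) for every label."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen tokens
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            denom = self.total_words[label] + self.alpha * v
            score = math.log(count / n_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def recommend(self, doc, k=2):
        """Return the k labels (attack patterns) with the highest posterior scores."""
        scores = self.log_scores(doc)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def predict(self, doc):
        return self.recommend(doc, k=1)[0]


def accuracy(model, docs, labels):
    """Fraction of documents whose top prediction matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy training data standing in for prelabeled CAPEC descriptions.
train_docs = [
    "attacker injects crafted sql into the login query",
    "unsanitized input alters the database query",
    "malicious script is injected into a web page viewed by other users",
    "user supplied html and javascript are echoed back to the browser",
    "attacker repeatedly guesses passwords until login succeeds",
    "automated guessing of credentials against the authentication form",
]
train_labels = ["SQL Injection", "SQL Injection",
                "Cross-Site Scripting", "Cross-Site Scripting",
                "Brute Force", "Brute Force"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
requirement = ("The system shall store user comments in the database "
               "and display them on the web page.")
print(model.recommend(requirement, k=2))
```

Returning a ranked list rather than a single label reflects the goal of recommending several candidate attack patterns per requirement. The `accuracy` helper is one simple metric that could be used to compare Naïve Bayes against an RF model on a held-out labeled set.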
Funding Information
Authors’ Contributions
Keith Bryant and Alex Caravella: Acquisition, analysis, and interpretation of data; writing of content.
Keith Bryant, Alex Caravella, and Mounika Vanamala: Conception and design of the article, generation of intellectual content, and critical review of the article.
Mounika Vanamala: Contribution to the ideation of intellectual content, review of the article, and coordination of publication.
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.
References
Al-Sabahi, K., Zuping, Z., & Kang, Y. (2018). Latent semantic analysis approach for document summarization based on word embeddings. arXiv preprint arXiv:1807.02748. https://doi.org/10.3837/tiis.2019.01.015
Alyami, H., Nadeem, M., Alharbi, A., Alosaimi, W., Ansari, M. T. J., Pandey, D., ... & Khan, R. A. (2021). The evaluation of software security through quantum computing techniques: A durability perspective. Applied Sciences, 11(24), 11784. https://doi.org/10.3390/app112411784
Asim, M. N., Ghani, M. U., Ibrahim, M. A., Mahmood, W., Dengel, A., & Ahmed, S. (2021). Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification. Neural Computing and Applications, 33, 5437-5469. https://doi.org/10.1007/s00521-020-05321-8
Bedi, G. (2018). A guide to Text Classification (NLP) using SVM and Naive Bayes with Python. Medium, Nov.
Bellaouar, S., Bellaouar, M. M., & Ghada, I. E. (2021, February). Topic modeling: Comparison of LSA and LDA on scientific publications. In 2021 4th International Conference on Data Storage and Data Engineering (pp. 59-64). https://doi.org/10.1145/3456146.3456156
CISA. (2021). What is cybersecurity? | CISA. https://www.cisa.gov/uscert/ncas/tips/ST04-001
CVE. (2022). https://cve.mitre.org
Delli, U., & Chang, S. (2018). Automated process monitoring in 3D printing using supervised machine learning. Procedia Manufacturing, 26, 865-870. https://doi.org/10.1016/j.promfg.2018.07.111
Guo, Y., & Li, J. (2021). Distributed Latent Dirichlet Allocation on Streams. ACM Transactions on Knowledge Discovery from Data (TKDD), 16(1), 1-20. https://doi.org/10.1145/3451528
Prasad, S. G., Badrinarayanan, M. K., & Sharmila, V. C. (2022). Efficacy and Security Effectiveness: Key Parameters in Evaluation of Network Security. International Journal of Performability Engineering, 18(4), 282. https://doi.org/10.23940/ijpe.22.04.p6.282288
IBM. (2019). What is machine learning? https://www.ibm.com/topics/machine-learning?lnk=fle
Mallet, J., Pryor, L., Dave, R., Seliya, N., Vanamala, M., & Sowells-Boone, E. (2022, March). Hold on and swipe: A touch-movement based continuous authentication schema based on machine learning. In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML) (pp. 442-447). IEEE. https://doi.org/10.1109/CACML55074.2022.00081
Kanakogi, K., Washizaki, H., Fukazawa, Y., Ogata, S., Okubo, T., Kato, T., ... & Yoshioka, N. (2022). Comparative Evaluation of NLP-Based Approaches for Linking CAPEC Attack Patterns from CVE Vulnerability Information. Applied Sciences, 12(7), 3400. https://doi.org/10.3390/app12073400
Kim, D., & Im, T. (2022). A Systematic Review of Virtual Reality-Based Education Research Using Latent Dirichlet Allocation: Focus on Topic Modeling Technique. Mobile Information Systems, 2022. https://doi.org/10.1155/2022/1201852
Krzeszewska, U., Poniszewska-Marańda, A., & Ochelska-Mierzejewska, J. (2022). Systematic comparison of vectorization methods in classification context. Applied Sciences, 12(10), 5119. https://doi.org/10.3390/app12105119
León-Paredes, G. A., Barbosa-Santillán, L. I., & Sánchez-Escobar, J. J. (2017). A heterogeneous system based on latent semantic analysis using GPU and multi-CPU. Scientific Programming, 2017. https://doi.org/10.1155/2017/8131390
Livingston, F. (2005). Implementation of Breiman’s random forest machine learning algorithm. ECE591Q Machine Learning Journal Paper, 1-13.
Macsai, D. (2012). The most important company you’ve never heard of. Fast Company. https://www.fastcompany.com/3017927/30mitre
McAllister, P., Zheng, H., Bond, R., & Moorhead, A. (2018). Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets. Computers in Biology and Medicine, 95, 217-233. https://doi.org/10.1016/j.compbiomed.2018.02.008
Mounika, V., Yuan, X., & Bandaru, K. (2019, December). Analyzing CVE database using unsupervised topic modelling. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 72-77). IEEE. https://doi.org/10.1109/CSCI49370.2019.00019
MITRE ATT&CK®. (2022). https://attack.mitre.org
Mohamed, A. E. (2017). Comparative study of four supervised machine learning techniques for classification. International Journal of Applied Science and Technology, 7(2), 1-15. https://www.ijastnet.com/journal/index/859
NIST. (2022). About NIST. https://www.nist.gov/about-nist
Prakash, A., Singh, N. K., & Saha, S. K. (2022). Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry. ETRI Journal, 44(3), 413-425. https://doi.org/10.4218/etrij.2019-0396
Rahman, A. S., Shamrat, F. J. M., Tasnim, Z., Roy, J., & Hossain, S. A. (2019). A comparative study on liver disease prediction using supervised machine learning algorithms. International Journal of Scientific & Technology Research, 8(11), 419-422. http://www.ijstr.org/final-print/nov2019/A-Comparative-Study-On-Liver-Disease-Prediction-Using-Supervised-Machine-Learning-Algorithms.pdf
Rustam, F., Reshi, A., Mehmood, S., Ullah, S., On, B., Aslam, W., & Choi, G. (2020). COVID-19 future forecasting using supervised machine learning models. IEEE Access, 101489-101499. https://doi.org/10.1109/ACCESS.2020.2997311
Sanguri, K., Bhuyan, A., & Patra, S. (2020). A semantic similarity adjusted document co-citation analysis: a case of tourism supply chain. Scientometrics, 125(1), 233-269. https://doi.org/10.1007/s11192-020-03608-0
Schrider, D. R., & Kern, A. D. (2018). Supervised machine learning for population genetics: a new paradigm. Trends in Genetics, 34(4), 301-312. https://doi.org/10.1016/j.tig.2017.12.005
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press. https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/
Sharma, C., Sharma, S., & Sakshi. (2022). Latent DIRICHLET allocation (LDA) based information modelling on BLOCKCHAIN technology: A review of trends and research patterns used in integration. Multimedia Tools and Applications, 81(25), 36805-36831. https://doi.org/10.1007/s11042-022-13500-z
Siddiqui, N., Dave, R., Vanamala, M., & Seliya, N. (2022). Machine and deep learning applications to mouse dynamics for continuous user authentication. Machine Learning and Knowledge Extraction, 4(2), 502-518. https://doi.org/10.3390/make4020023
Sweeney, E. M., Vogelstein, J. T., Cuzzocreo, J. L., Calabresi, P. A., Reich, D. S., Crainiceanu, C. M., & Shinohara, R. T. (2014). A comparison of supervised machine learning algorithms and feature vectors for MS lesion segmentation using multimodal structural MRI. PloS One, 9(4), e95753. https://doi.org/10.1371/journal.pone.0095753
Uddin, S., Khan, A., Hossain, M. E., & Moni, M. A. (2019). Comparing different supervised machine learning algorithms for disease prediction. BMC Medical Informatics and Decision Making, 19(1), 1-16. https://doi.org/10.1186/s12911-019-1004-8
Ullah, F., Wang, J., Farhan, M., Jabbar, S., Naseer, M. K., & Asif, M. (2020). LSA based smart assessment methodology for SDN infrastructure in IoT environment. International Journal of Parallel Programming, 48, 162-177. https://doi.org/10.1007/s10766-018-0570-1
Ullah, F., Jabbar, S., & Mostarda, L. (2021). An intelligent decision support system for software plagiarism detection in academia. International Journal of Intelligent Systems, 36(6), 2730-2752. https://doi.org/10.1002/int.22399
Vanamala, M., Gilmore, J., Yuan, X., & Roy, K. (2020a, December). Recommending attack patterns for software requirements document. In 2020 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 1813-1818). IEEE. https://doi.org/10.1109/CSCI51800.2020.00334
Vanamala, M., Yuan, X., & Roy, K. (2020b, August). Topic modeling and classification of Common Vulnerabilities and Exposures database. In 2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (pp. 1-5). IEEE. https://doi.org/10.1109/icABCD49160.2020.9183814
Zhu, L., He, Y., & Zhou, D. (2020). A neural generative model for joint learning topics and topic-specific word embeddings. Transactions of the Association for Computational Linguistics, 8, 471-485. https://doi.org/10.1162/tacl_a_00326