Cryptographic Ransomware Encryption Detection: Survey: Detection of Encryption

Authors: (1) Kenan Begovic, currently a Ph.D. candidate in Computer Science at Qatar University. He received his MS in Information and Computer Security from University of Liverpool; (2) Abdulaziz Al-Ali ,received the Ph.D. degree in machine learning from the University of Miami, FL, USA, in 2016 and he is currently an Assistant Professor in the Computer Science and Engineering Department, and director of the KINDI Center for Computing Research at Qatar University; (3) Qutaibah Malluhi, a Professor at the Department of Computer Science and Engineering at Qatar University (QU). Table of Links Abstract and 1 Introduction 2. On crypto-ransomware behavior and methodology 3. Detection of encryption 4. Conclusion Declaration of Competing Interest, CRediT authorship contribution statement, Data availability, and References 3. Detection of encryption Research in detecting ransomware in general through various phases of attacks has snowballed in the past several years. Focused research on cryptographic ransomware follows a general trend of the tremendous increase in published research; however, most surveys remain focused on all attack phases described previously in section 2.1. Encryption is the defining characteristic of a crypto-ransomware attack. The usage of different encryption algorithms, as shown in Fig. 4, choice of symmetric, asymmetric, or combination (hybrid) of different encryption schemes, as shown in Table 1, shows how cryptographic ransomware closely followed the trend in its evolution and how encryption itself continues to be the one differentiating characteristic that is the most obvious candidate to be the factor in the detection of the attack. When researching the phenomenon of cryptographic ransomware through time, we observe that despite the evolution of this threat from malware to something similar to advanced persistent threat (APT), encryption remained a uniquecharacteristic that separates this ransomware from other information security threats. The capability to encrypt without any control gives ransomware attackers the primary motivation and purpose for executing the attack. With that in mind, we surveyed and classified methodologies used to detect ransomware while operating inside the Encryption phase of the attack. The scholarly literature on ransomware detection largely clusters around the following three major groups: API and system call monitoring-based detection I/O monitoring-based detection file system operations monitoring-based detection that include - scanning for high entropy in files and - monitoring deception tokens in file system-based detection. Machine learning (ML), even though sometimes covered as a separate methodology in encryption detection, is cross-cutting the three previously mentioned methodologies depending on information used as features in the dataset that the ML model is trained on. The usage of machine learning in ransomware detection is prevalent in hybrid proposals that employ data from different groups. The same is true for detecting activities in the Encryption phase of the attack. While some of the proposals are included in the survey for their specific use and discussion of machine learning issues related to the detection of ransomware pre-encryption and encryption activities, often in combination with other methods, categorization to each major group that was described previously was done based on the models’ input features. ML ransomware encryption detection includes machine learning techniques and algorithms used to detect encryption and related activities by ransomware attacks in hybrid and pure implementations. The following section presents the state-of-the-art methodologies used to detect ransomware inside the Encryption phase of the attack. 3.1. API and system calls monitoring Monitoring of function and system calls to detect activity related to file encryption aims at detecting encryption at early stages. At this point, API functions and system calls related to encryption operations appear in the dynamic monitoring of events in an operating system or static analysis of binary files. Depending on the operating system, the dynamic detection mechanism focuses on API functions or system calls, and static analysis employs a more comprehensive observation for a set of function calls and text strings. This method is often part of a hybrid solution for ransomware detection. While proposing machine learning solutions for static and dynamic analysis of files suspected of being ransomware, Sheen and Yadav (2018) identified ransomware’s most commonly used API calls. Table 2 presents those API calls in detecting the Encryption phase of ransomware attacks with a brief explanation of their purpose in the Windows operating system. Their solution’s performance was measured on the ML model’s success in differentiating between benign and malicious utilization of all defined features. In contrast, these API functions were also found in other research. Some authors propose a detection model for ransomware using monitoring API calls and developing pre-encryption detection algorithms (Yadav et al., 2021). Even though their paper outlines methods and goals, it still needs to provide concrete solutions. Others use NLP (Natural Language Processing) using convolutional neural networks to analyze API sequences retracted from both ransomware and benign processes in the secure sandbox (Qin et al., 2020). Observing sequences of API calls in machine learning solutions was also applied by Ahmed et al. (2020), in their enhanced Minimum Redundancy Maximum Relevance methods, and the most relevant features fine-tuned from system calls were not necessarily encryption-related. Similarly, Almousa et al. (2021) consolidated Windows operating system API calls found in 51 collected ransomware families with common API calls in software to train machine learning models to detect ransomware. In the case of Mehnaz et al. (2018), with their RWGuard proposal and CryptoAPI Function Hooking (CFHk) Module, they implement a technique of intercepting CryptoAPI calls by hooking function through memory address space, shifting and mandatory JMP instructions intercepts, and securely stored CryptoAPI activities. Kok et al. (2020a) focus their research on extracting API calls in various ransomware samples prior and the call of APIs containing the word ‘crypto’ while the sample was running in a sandbox. Their pre-encryption detection algorithm used a machine learning model trained on the extracted API calls dataset. The limitation of their algorithm is its reliance on Windows API calls which disable them in order to detect ransomware by using custom encryption functions. Another example of looking at pre-encryption API calls is the work of Al-Rimy et al. (2020), who established a two-component detection system consisting of DynamicPre-encryption Boundary Definition (DPBD) and Features Extraction (FE). The former creates the pre-encryption boundary vector with all cryptography-related APIs used to create the boundary of the pre-encryption activities that define the boundary of pre-encryption. The latter extracted all relevant pre-encryption features for use in detection. Arabo et al. (2020) observation of API calls by DLLs in Windows in their hybrid solution proposal pointed out the correlation between process behavior and ransomware activity in the pre-encryption and encryption stages. A similar proposal, focusing on the Android operating system, came from Scalas et al. (2019), who argued that using System API observation in detection systems performs better than complex solutions that combine more complex ransomware indicators. On the static observation of API functions related to encryption, Xu et al. (2017) proposed CryptoHunt deals with obfuscation in binary files that prevents detecting Windows Cryptographic API or OpenSSL functions. While this proposal was resilient to various obfuscation techniques, some crucial elements of the process, detecting custom cryptographic functions, were not described. API monitoring and system calls are also widely used as input features in machine learning-based detection of activities in the Encryption phase. In their proposal, Al-Rimy et al. (2021) argue that feature extraction in the early Encryption phase of a ransomware attack and phases before creating a situation of too few data and high dimensional features leads to a substantial risk of overfitting. As a remedy, they proposed “a novel redundancy coefficient gradual upweighting approach.” The calculation of redundancy terms of mutual information was introduced to improve the feature selection process and enhance the accuracy of the detection model. The experiment showed better accuracy with the proposed approach by testing multiple classifiers in all cases. Similarly, Hwang et al. (2020) propose two-stage detection of crypto-ransomware, first building the Markov model from Windows API call sequence patterns capturing the characteristics of ransomware behavior and then using a Random forest classifier over remaining data (registry keys operations, file system operations, strings, file extensions, directory operations, and dropped file extensions) to control false-positive and false-negative rates. Their two-stage mixed detection model gives 97.28% overall accuracy, 4.83% false-positive rate, and 1.47% false-negative rate. Also, Kok et al. (2020b), as an extension to their already mentioned proposal of the Pre-encryption Detection Algorithm (PEDA), proposed a set of conventional and unconventional metrics for PEDA’s learning algorithm (LA) component performance. By introducing metrics like Likelihood Ratio (LR), Diagnostic odds ratio (DOR), Youden’s index (J), Number needed to diagnose (NND), Number needed to misdiagnose (NNM), and net benefit (NB), they improved the performance in this unique use case when compared to using only conventional metrics. Al-rimy et al. (2019) focused on pre-encryption activities detection in their proposal for a crypto-ransomware detection model. They proposed two combined approaches, the first incremental bagging (iBagging) technique and enhanced semi-random subspace selection (ESRS), which act as an ensemble model. iBagging creates subsets depending on the observation of ransomware behavior, while ESRS then creates subspaces that were used to train a pool of classifiers. The best classifiers were modeled using a grid search, and a voting system was employed. While accuracy was higher than in competing approaches for the same datasets, there were limitations related to feature selection in different subspaces. Since the features were selected within one subspace independently, the same feature could be selected in more than one subspace. This decreases the accuracy of the model. Although this literature is very developed and ever-growing, it still needs a comprehensive focus on researching the detection of Encryption phase-related activities in the case where encryption algorithms are custom-implemented using third-party libraries. The development of static analysis methods, as well as the discovery of patterns in API and system calls for dynamic analysis when custom encryption implementation is used, would significantly improve the overall success of this method. 3.2. I/O (input/output) monitoring Monitoring I/O is another technique that monitors internal behavior in an operating system. It aims to use information from I/O requests related to memory, file system, and even network for the detection of ransomware encryption (and other phases). Like API and system calls monitoring, this technique is often part of a more comprehensive detection solution involving several different methodologies and techniques. Kharaz et al. (2016) used a combination of techniques and methodologies in their proposal for a detection system named UNVEIL. Their monitoring of I/O established that regular applications could generate I/O access requests generated by ransomware encryption tools. However, due to the common design where these regular applications do not block access to the original files, their sequence patterns of I/O operations differ from ransomware. The paper also describes zero-day ransomware detection by observing entropy between read and write operations. McIntosh et al. (2021), as a part of the broader proposal for an access control framework, utilized I/O monitoring using various models of the framework name RANACCO. Their solution nested its modules between Windows I/O and Storage class driver. While the overall framework proposed has limitations, the I/O monitoring part is said to detect encryption successfully. RWGuard by Mehnaz et al. (2018) implements a hybrid solution with IRPParser that logs I/O requests and passes them further to other modules. Network monitoring is also used to detect pre-encryption and encryption events. Almashhadani et al. (2019) proposed networkbased crypto-ransomware detection using Locky ransomware as the case study. Their proposal was built using multiple independent classifiers over both packet and flow data. Using a total of 18 features that were extracted from TCP, HTTP, DNS, and NBNS traffic, the proposal achieved 97.92% accuracy for packet-based and 97.08% accuracy for flow-based data. By using TCP and UDP features computed from network flows, Fernández Maimó et al. (2019), as a continuation of their previous proposal for an integrated clinical environment named ICE++, proposed a machine learning detection and protection system that was capable of anomaly detection and ransomware classification. It also uses Network Function Virtualization (NFV) and Software-Defined Networking (SDN) paradigms to prevent the spread of crypto-ransomware activity. They trained multiple models using multiple algorithms and achieved a precision/recall of 92.32%/99.97% in anomaly detection and an accuracy of 99.99% in ransomware classification. Roy and Chen (2021) proposed a solution named DeepRan that prevents the spreading of the Encryption phase across network-connected computers. DeepRan utilizes attention-based bi-directional Long Short Term Memory (BiLSTM) with a fully connected layer to model the normalcy of networked hosts. Its behavior anomaly detection processes substantial amounts of logging data collected from bare metal servers. Conditional Random Fields (CRF) model was used to extend BiLSTM for detected anomalies to be classified as potential ransomware attacks. Semantic information extraction from “high dimensional host logging data’’ was done by the Term FrequencyInverse Document Frequency (TF-IDF) method. Early ransomware detection had a 99.87% detection accuracy (F1-score of 99.02%). On the hardware monitoring level, Paik et al. (2016) propose monitoring I/O for encryption detection in addition to their SSD hardware monitoring. Similarly, Dimov and Tsonev (2020) monitor HDD performance and utilize the I/O performance rate for disk read and disk write operations to detect ransomware in the Encryption phase. Finally, as a unique type of I/O monitoring, Park and Park (2020) propose hardware tracing for detecting symmetric key cryptographic routine detection in malicious binaries that employ anti-reverse engineering techniques. Azmoodeh et al. (2018) proposed detecting crypto-ransomware activity in IoT by monitoring power consumption and applying machine learning models to the collected data. Their proposal employs Dynamic Time Warping (DTW) as a distance measure with KNN as a classifier, outperforming conventional classifiers like Neural Networks, KNN, and SVM. Their approach achieved a detection rate of 95.65% and a precision rate of 89.19%. Only a few research papers that include I/O monitoring in relation to detecting activities related to the Encryption phase of a ransomware attack are available. The success of some of the mentioned proposals in detecting encryption as activity and overall events that are a consequence of the Encryption phase indicates that much more can be done in this field and that more innovative hybrid solutions are possible. 3.3. File system monitoring Monitoring file system activity to detect the Encryption phase of ransomware attacks focuses on collecting information about the state of the file system and the files themselves. The early ideas, like Young et al. (2012), to use different sector hashes to detect target files, including encrypted files, paving the way for file system usage in the fight against cryptographic ransomware. By using raw binary files as ML features, Khammas (2020) proposed a Random Forest classifier that uses 1000 n-gram features extracted directly from raw bytes using frequent pattern mining. The selection of features was made using the Gain Ratio to reduce the dimensionality of features. The proposal maintains an optimal number of trees to be 100 with achieved accuracy of 97.74%. While this proposal focused on the analysis of binaries that contain ransomware attack tools and recognition of the same among benign binaries, due to the nature of crypto-ransomware behavior, it is safe to assume that many of the n-gram features were related to the Encryption phase. Furthermore, using APK files containing the source code of an Android app, Sharma et al. (2021) extracted features from the file related to ransomware attacks. Their proposal, named RansomDroid for detecting crypto-ransomware activity in Android devices, uses an unsupervised machine learning model. Unlike K-Means clustering, the proposal used a Gaussian Mixture Model with a flexible and probabilistic approach to modeling the dataset. Feature selection and dimensionality reduction for improvement of the model were also utilized. The model detects Android ransomware with an accuracy of 98.08% in 44 ms. Almomani et al. (2021) also use analysis of Android APK files for feature extraction in their proposal. They rely on an “evolutionary based machine learning approach” to detect cryptographic ransomware in Android devices. They used the binary particle swarm optimization algorithm (BPSO) to tune the classifier’s hyperparameters and feature selection. Synthetic minority oversampling technique (SMOTE) with support vector machine (SVM) algorithm was used for classification. The combination name SMOTE-tBPSO-SVM used g-mean as a metric and achieved a result of 97.5%. Tang et al. (2020) proposed a detection and prevention system named RansomSpector that monitors the file system and network activities. It is a virtual machine-based system that resides in the hypervisor, thus making it difficult to bypass through privilege escalation. The crypto-ransomware was detected with extraordinarily little overhead to performance - less than 5%. Continella et al. (2016), in their proposal for ShieldFS, offer a whole new file system, which, combined with the machine learning portion of their proposal, can detect ransomware behavior as an anomaly, including operations related to the Encryption phase. Similarly, Lee et al. (2022) proposed statistical analysis to differentiate between regular and encrypted blocks in the file system. Their solution, Rcryptect, utilizes extracted heuristic rules using FUSE (File system in Userspace) to avoid kernel modification. Rcryptect, among the other methods, detects high entropy files created by cryptographic operations. Nevertheless, the solution faces common issues of false positives for benign files with high entropy and the issue that prevention mechanisms can cause damage to some files under attack before the ransomware encryption process is killed. Entropy, in combination with fuzzy hashing, as a means to detect files encrypted by ransomware in the file system was proposed by Joshi et al. (2021). They used a mini-filter driver that interacts with file system behavior as kernel mode. While achieving more than 95% of detection success in their experiment, the method is susceptible to explorer.exe process DLL injection that would bypass the security measures proposed. Lee et al. (2019) use entropy estimation to detect files encrypted by ransomware in a cloud environment. When using the cloud as a backup, there exists a risk that encrypted files could be synchronized to the cloud. The authors thus observed the number of ransomware encryption attacks and divulged the baseline used in entropy estimation over files in the cloud. Their experiment reported a 100% success rate in detecting encryption. Jung and Won (2018) used the entropy of files in their comprehensive ransomware detection and protection system. They utilized context-aware analysis that used information from APIs, file system metadata, systems to detect largescale read/write operations, and entropy analysis capable of detecting benign usage of encryption with enhanced classification to improve entropy analysis results. Similarly, Jethva et al. (2020) proposed their system to detect and prevent crypto-ransomware using entropy in multilayer detection. The technique was combined with monitoring registry key operations, file signatures in the Windows operating system, and machine learning. By improving the already mentioned method of analyzing the entropy of files, Hsu et al. (2021) examined 22 different file formats of encrypted files and extracted features to be used with the Support Vector Machine algorithm. They achieved a detection rate of 85.17% using the SVM Linear model, which increased to over 92% when using the SVM kernel trick (with the polynomial kernel) model. Not all of the research favors entropy use in detecting ransomware encryption. McIntosh et al. (2019) propose depreciation of this method in the fight against ransomware, arguing that techniques they identified to mitigate entropy usage in encryption detection are sufficient to invalidate reliance on entropy information. In their experiment, BASE-64 encoding and partial file encryption have shown their effectiveness in “confusing” entropy information; thus, the usage of File Integrity and File Type Identification have been proposed as alternatives to using entropy measures. RWGuard is a proposal by Mehnaz et al. (2018) that combines multiple techniques in the real-time detection of cryptographic ransomware. The solution includes monitoring the file system for malicious activity through File Monitoring and File Classification modules. Also, it has the capability to automatically generate decoy files in the file system using a feature called Decoy Files Generator. Some of the techniques used for detection are inherently probabilistic and prone to false positives. Another decoy-in-thefile-system-based solution is the proposal of R-Locker by GómezHernández et al. (2018), a novel approach to creating honeyfiles as decoys to detect and stop ransomware in action. To achieve this in practical implementation on UNIX-like operating systems, a new named pipe is created in the file system containing a specially crafted small-size honeyfile. An alert is raised to kill the process when ransomware attempts to read the file. Due to the fact that the kernel manages synchronization, these regular applications do not block access to the original files between reading and writing; if a process attempts to read more data than expected, reading is blocked until the writer makes up for expected data. This feature allows time to raise alerts about suspected ransomware behavior. In another approach using deception, honeypots for IoT devices were proposed by Sibi Chakkaravarthy et al. (2020) based on Social Leopard Algorithm (SoLA) to model honey folders. The Intrusion Detection Honeypot (IDH) also introduces an Audit Watch module that monitors the entropy of files in the device, together with a module called Complex Event Processing (CEP) that collects information from multiple external security sources used to confit and stop ransomware activity. SoLA algorithm is critical to this proposal with its capabilities to process extracted features from processes that accessed the honey folder. Usage of entropy to detect encryption is present in various operating systems, including Android. A proposal by Jiao et al. (2021) detects custom encryption with an accuracy rate of 98.24% in Android platforms using only entropy information. In a more ML-focused approach, using the activity logs for features extraction that contain all of the filesystem events, Homayoun et al. (2020) proposed applying Sequential Pattern Mining to find Maximal Frequent Patterns (MFP) in logged activities for known ransomware. This created candidate features to be used in classification by multiple machine learning classification algorithms. In their experiment, authors used J48, Random Forest, Bagging, and Multi-Layer Perceptron (MLP) classifiers and achieved an F-measure of 0.994 with a minimum AUC value of 0.99 in the detection of ransomware samples from benign activities using Windows registry, DLL, and file system to registry log of activities. FMeasure of more than 0.98 with a false-positive rate of less than 0.007 in the detection of a given ransomware family using 13 selected features whose significance was recognized during the research. Their results were not short of impressive, creating datasets of ransomware logs for 1624 ransomware binaries sourced from virustotal.com, as well as separate sampling for overfitting. However, no indication was given that testing was performed on an independent dataset. Even though an impressive amount of research has been found in relation to file system monitoring for the purpose of detecting the Encryption phase activities in a ransomware attack, we feel that research went truly little in the direction of tying the techniques mentioned above into access control systems used to control file systems. Entropy detection is an auspicious tool in fighting cryptographic ransomware; more proposals are needed on this topic. 3.4. Commercial solutions brief survey Most of the commercial offering for protection against cryptoransomware focuses on providing capabilities in the area of Enterprise Backup and Recovery. Companies that focus solely on ransomware protection in their products are almost nonexistent. Many vendors focus on delivering recovery capabilities through air-gapped backup and immutable backup copies, and detection is based chiefly on integrity and anomaly behavior monitoring. Even though vendors offer markets as comprehensive data protection solutions, it is indicative that most of them emphasize that the backup is the last line of defense against ransomware, which in some instances indicates that other ransomware protection controls are expected to be in place for the product to live to its expectations. According to Gartner’s Magic Quadrant for Enterprise Backup and Recovery Software Solutions (Rao et al., 2021), the major capability for the evaluation of a product was Ransomware detection and protection. An example is Acronis (Ransomware Protection with Backup for Business - Acronis) which offers both cloudbased and on-premise solutions that include the capability to actively scan for ransomware activity and verify the authenticity and recoverability of backup copies. Another major player in this field is Arcserve (Ransomware Protection Solution for an Impenetrable Business) which does not have its own capabilities to detect and protect against crypto-ransomware built into its product but rather has excellent cooperation with security giant Sophos that provides the capability for them. Cohesity (Ransomware Recovery | Reduce Downtime with Rapid Recovery) is the leader in enterprise backup and recovery. They offer cloud service with immutable backup using the write-once-read-many (WORM) feature and RBAC access control model. Also, they utilize machine learning for anomaly behavior detection to detect crypto-ransomware activity. Commvault Ransomware Recovery - Commvault offers one of the most comprehensive lists of capabilities against ransomware. Their approach employs a zero-trust model, built architecture using NIST’s Cybersecurity Framework (CSF). Their detection capability mostly relies on anomaly detection in both networks and file systems. Dell Technologies (Dell EMC Cyber Recovery Solution – Cyber and Ransomware Data Recovery), another leader in this vertical, provides detection using their Intelligent CyberSense Analytics. It utilizes machine learning for anomaly detection. Other important vendors and leaders in this area, like Veeam (Ransomware Protection: Learn How Veeam Can Protect Your Data) and Rubrik (Ransomware Recovery), use similar techniques, and there are no serious differentiating factors in detecting crypto-ransomware. Other groups of vendors that focus on ransomware detection are traditional threat detection and response companies. They rely on anomaly detection utilizing various monitoring techniques that provide hybrid solutions of API calls and I/O monitoring, and file system changes. Some are using machine learning models, and there are occasional claims of artificial intelligence (AI) that are difficult to confirm. The most significant ones are Carbon Black (Endpoint Protection Platform | VMware Carbon Black Endpoint), Trend Micro (Enterprise Ransomware Protection & Removal), Darktrace (Darktrace for Ransomware), Extrahop (Ransomware Mitigation & Detection Solution - ExtraHop) , and Vectra AI (Ransomware Detection and Response - Ransomware Solutions | Vectra AI). Concerning the relationship between academic research achievements surrounding the detection of ransomware executing the pre-encryption and encryption activities and commercial solutions, it has been noted and observed that an apparent discrepancy exists between the two (Scala et al., 2019; Nicol et al., 2015). The persistence of these differences is not uncommon in cybersecurity-related topics and has been driven by a series of factors categorically branched into categories. Therefore, we have factors that are technical, procedural, or bureaucratic in origin and nature. Amongst those most substantially addressed by the literature are the factors identified as distinctly technical in origin: - Integration Industry solutions utilize different detection methods and measures in a coordinated schema of safeguards, where a comprehensive set of firewalls, intrusion prevention systems, and endpoint protection defenses are adopted to combat real-world threats. Antithetically, the academic approach necessitates the separation of specific detection techniques and their study in isolation, in effect, obfuscating principles of translation. - Scalability Detection solutions developed in an isolated academic environment tend to need help with monitoring and analysis capabilities for the vast amounts of data generated by modern industry networks (Scala et al., 2019). Forasmuch as the complexity and diversity of commercial networks are not to be understated; scaling assumes an integral role in the application of research outcomes in the real-world market (Nicol et al., 2015). - Complexity The algorithms developed through mechanisms of academic inquiry often involve complex and resource-intensive modus operandi, which fail to be practical for real-world deployment or, within the nature of their construction, cannot evolve into more flexible setups (Nicol et al., 2012). - Adaptability With standardized models built on specific ransomware samples and behavior patterns in training academically developed technologies, limitations arise in adaptability for realworld applications (Scala et al., 2019). Ransomware, in real world attacks, rarely follows the uniformity established in these test models. On the contrary, its behavior constantly evolves in a race to outmaneuver newly forged preventative measures as they emerge. Likewise, while procedural factors present similar practical limitations as those found with technical factors, the nominal core of these limitations lies in qualities inherent to the experimental process and research objectives rather than market translation: - Validation The process of testing and validation notably differ across academic and industry borders. Research validation roots itself in the accuracy of simulations and the control of experimental conditions. Commercial solutions require rigorous real-world validation against various extraneous variables and incorporating existing infrastructure to ensure effectiveness, reliability, and compatibility (Grossman et al., 2001; Nicol et al., 2015). - False positives Concerning research goals, the process of addressing false positives should be addressed to achieve high detection rates (Bold et al., 2022; Kok et al., 2020b). Comparatively, greater emphasis is placed on false positives in the industrial context, as the losses incurred through the resultant disruptions are of greater interest. Hence, commercial solutions seek a compromise between adequate detection accuracy and minimizing false positives. - Time to market Rapid response is necessary to combat the emergence rate of new threats and meet the demands of the commercial market. Concurrently, academic research development, testing, and refinement require heavy time investments for the analytic procedure, translating to a natural lag in the pace of ransomware evolution (Kashef et al., 2023; Nicol et al., 2015). In contrast to technical and procedural factors, which are determined by objectively measurable discrepancies, bureaucratic factors are interpretational, based on institutional inefficiencies of anthropogenic origin. The most effective of which present themselves in restrictions of: - Intellectual property and licensing Hesitancy in adopting newly developed technologies based on academic research is often directly tied to the risks involved with an investment in new IPs where precedents have yet to be set on the extent of its protected status. Similar contingencies arise with licensing restrictions that introduce additional expenditures in the form of permissions and approvals. 3.5. Other applications of detection of encryption Detection of encryption is not always related to cryptographic ransomware defense. There are numerous use cases where it would be necessary to recognize if encryption happened or is happening. An example is Ameeno et al. (2019) proposal using the Naive Bayes algorithm to differentiate between compression and encryption and identify file types. Li and Liu (2020) proposed an encryption detection method using deep convolutional neural networks (CNN). The proposal uses converted raw data into two-dimensional matrices as an input to CNN. The results showed a higher detection rate than competitive storage and network encryption methods. In another proposal that utilizes the power of machine learning and deep learning, Yang et al. (2021) propose the usage of natural language processing (NLP) in combination with the two in detecting encrypted network traffic. The technique usually used for weighting in message retrieval and keyword extraction, Term FrequencyInverse Document Frequency (TF-IDF), was used in modeling detection due to its capability not to need analysis of each field in network traffic. Both ensembles of various machine learning classifiers and CNN were used with high accuracy and their own advantages. The advantage of CNN in deep learning is that it efficiently deals with sparse matrices (compressed or uncompressed) generated by TF-IDF in situations where there is an “abundance” of hardware resources. On the other hand, encrypted traffic detection in limited hardware resources is better suited for ensemble classifiers. Finally, even though somehow similar to crypto-ransomware encryption detection, a proposal by Dong et al. (2021) named MBTree deals with the detection of encrypted traffic between Remote Access Trojan (RAT) and Command & Control server (C2). The proposal relies on building the kind of baseline by integrating flow-level DirPiz sequences as a synthesis of host-level Multi-Level Tree (MLTree). Actual detection was done by measuring path similarity and node similarity of actual traffic with the baseline. The F-1 score reported is 94%. This paper is available on arxiv under CC BY 4.0 DEED license. Authors: (1) Kenan Begovic, currently a Ph.D. candidate in Computer Science at Qatar University. He received his MS in Information and Computer Security from University of Liverpool; (2) Abdulaziz Al-Ali ,received the Ph.D. degree in machine learning from the University of Miami, FL, USA, in 2016 and he is currently an Assistant Professor in the Computer Science and Engineering Department, and director of the KINDI Center for Computing Research at Qatar University; (3) Qutaibah Malluhi, a Professor at the Department of Computer Science and Engineering at Qatar University (QU). Authors: Authors: (1) Kenan Begovic, currently a Ph.D. candidate in Computer Science at Qatar University. He received his MS in Information and Computer Security from University of Liverpool; (2) Abdulaziz Al-Ali ,received the Ph.D. degree in machine learning from the University of Miami, FL, USA, in 2016 and he is currently an Assistant Professor in the Computer Science and Engineering Department, and director of the KINDI Center for Computing Research at Qatar University; (3) Qutaibah Malluhi, a Professor at the Department of Computer Science and Engineering at Qatar University (QU). Table of Links Abstract and 1 Introduction 2. On crypto-ransomware behavior and methodology 3. Detection of encryption 4. Conclusion Declaration of Competing Interest, CRediT authorship contribution statement, Data availability, and References Abstract and 1 Introduction Abstract and 1 Introduction 2. On crypto-ransomware behavior and methodology 2. On crypto-ransomware behavior and methodology 3. Detection of encryption 3. Detection of encryption 4. Conclusion 4. Conclusion Declaration of Competing Interest, CRediT authorship contribution statement, Data availability, and References Declaration of Competing Interest, CRediT authorship contribution statement, Data availability, and References 3. Detection of encryption Research in detecting ransomware in general through various phases of attacks has snowballed in the past several years. Focused research on cryptographic ransomware follows a general trend of the tremendous increase in published research; however, most surveys remain focused on all attack phases described previously in section 2.1. Encryption is the defining characteristic of a crypto-ransomware attack. The usage of different encryption algorithms, as shown in Fig. 4, choice of symmetric, asymmetric, or combination (hybrid) of different encryption schemes, as shown in Table 1, shows how cryptographic ransomware closely followed the trend in its evolution and how encryption itself continues to be the one differentiating characteristic that is the most obvious candidate to be the factor in the detection of the attack. When researching the phenomenon of cryptographic ransomware through time, we observe that despite the evolution of this threat from malware to something similar to advanced persistent threat (APT), encryption remained a uniquecharacteristic that separates this ransomware from other information security threats. The capability to encrypt without any control gives ransomware attackers the primary motivation and purpose for executing the attack. With that in mind, we surveyed and classified methodologies used to detect ransomware while operating inside the Encryption phase of the attack. The scholarly literature on ransomware detection largely clusters around the following three major groups: API and system call monitoring-based detection I/O monitoring-based detection file system operations monitoring-based detection that include API and system call monitoring-based detection API and system call monitoring-based detection I/O monitoring-based detection I/O monitoring-based detection file system operations monitoring-based detection that include file system operations monitoring-based detection that include - scanning for high entropy in files and - monitoring deception tokens in file system-based detection. Machine learning (ML), even though sometimes covered as a separate methodology in encryption detection, is cross-cutting the three previously mentioned methodologies depending on information used as features in the dataset that the ML model is trained on. The usage of machine learning in ransomware detection is prevalent in hybrid proposals that employ data from different groups. The same is true for detecting activities in the Encryption phase of the attack. While some of the proposals are included in the survey for their specific use and discussion of machine learning issues related to the detection of ransomware pre-encryption and encryption activities, often in combination with other methods, categorization to each major group that was described previously was done based on the models’ input features. ML ransomware encryption detection includes machine learning techniques and algorithms used to detect encryption and related activities by ransomware attacks in hybrid and pure implementations. The following section presents the state-of-the-art methodologies used to detect ransomware inside the Encryption phase of the attack. 3.1. API and system calls monitoring 3.1. API and system calls monitoring Monitoring of function and system calls to detect activity related to file encryption aims at detecting encryption at early stages. At this point, API functions and system calls related to encryption operations appear in the dynamic monitoring of events in an operating system or static analysis of binary files. Depending on the operating system, the dynamic detection mechanism focuses on API functions or system calls, and static analysis employs a more comprehensive observation for a set of function calls and text strings. This method is often part of a hybrid solution for ransomware detection. While proposing machine learning solutions for static and dynamic analysis of files suspected of being ransomware, Sheen and Yadav (2018) identified ransomware’s most commonly used API calls. Table 2 presents those API calls in detecting the Encryption phase of ransomware attacks with a brief explanation of their purpose in the Windows operating system. Their solution’s performance was measured on the ML model’s success in differentiating between benign and malicious utilization of all defined features. In contrast, these API functions were also found in other research. Some authors propose a detection model for ransomware using monitoring API calls and developing pre-encryption detection algorithms (Yadav et al., 2021). Even though their paper outlines methods and goals, it still needs to provide concrete solutions. Others use NLP (Natural Language Processing) using convolutional neural networks to analyze API sequences retracted from both ransomware and benign processes in the secure sandbox (Qin et al., 2020). Observing sequences of API calls in machine learning solutions was also applied by Ahmed et al. (2020), in their enhanced Minimum Redundancy Maximum Relevance methods, and the most relevant features fine-tuned from system calls were not necessarily encryption-related. Similarly, Almousa et al. (2021) consolidated Windows operating system API calls found in 51 collected ransomware families with common API calls in software to train machine learning models to detect ransomware. In the case of Mehnaz et al. (2018), with their RWGuard proposal and CryptoAPI Function Hooking (CFHk) Module, they implement a technique of intercepting CryptoAPI calls by hooking function through memory address space, shifting and mandatory JMP instructions intercepts, and securely stored CryptoAPI activities. Kok et al. (2020a) focus their research on extracting API calls in various ransomware samples prior and the call of APIs containing the word ‘crypto’ while the sample was running in a sandbox. Their pre-encryption detection algorithm used a machine learning model trained on the extracted API calls dataset. The limitation of their algorithm is its reliance on Windows API calls which disable them in order to detect ransomware by using custom encryption functions. Another example of looking at pre-encryption API calls is the work of Al-Rimy et al. (2020), who established a two-component detection system consisting of DynamicPre-encryption Boundary Definition (DPBD) and Features Extraction (FE). The former creates the pre-encryption boundary vector with all cryptography-related APIs used to create the boundary of the pre-encryption activities that define the boundary of pre-encryption. The latter extracted all relevant pre-encryption features for use in detection. Arabo et al. (2020) observation of API calls by DLLs in Windows in their hybrid solution proposal pointed out the correlation between process behavior and ransomware activity in the pre-encryption and encryption stages. A similar proposal, focusing on the Android operating system, came from Scalas et al. (2019), who argued that using System API observation in detection systems performs better than complex solutions that combine more complex ransomware indicators. On the static observation of API functions related to encryption, Xu et al. (2017) proposed CryptoHunt deals with obfuscation in binary files that prevents detecting Windows Cryptographic API or OpenSSL functions. While this proposal was resilient to various obfuscation techniques, some crucial elements of the process, detecting custom cryptographic functions, were not described. API monitoring and system calls are also widely used as input features in machine learning-based detection of activities in the Encryption phase. In their proposal, Al-Rimy et al. (2021) argue that feature extraction in the early Encryption phase of a ransomware attack and phases before creating a situation of too few data and high dimensional features leads to a substantial risk of overfitting. As a remedy, they proposed “a novel redundancy coefficient gradual upweighting approach.” The calculation of redundancy terms of mutual information was introduced to improve the feature selection process and enhance the accuracy of the detection model. The experiment showed better accuracy with the proposed approach by testing multiple classifiers in all cases. Similarly, Hwang et al. (2020) propose two-stage detection of crypto-ransomware, first building the Markov model from Windows API call sequence patterns capturing the characteristics of ransomware behavior and then using a Random forest classifier over remaining data (registry keys operations, file system operations, strings, file extensions, directory operations, and dropped file extensions) to control false-positive and false-negative rates. Their two-stage mixed detection model gives 97.28% overall accuracy, 4.83% false-positive rate, and 1.47% false-negative rate. Also, Kok et al. (2020b), as an extension to their already mentioned proposal of the Pre-encryption Detection Algorithm (PEDA), proposed a set of conventional and unconventional metrics for PEDA’s learning algorithm (LA) component performance. By introducing metrics like Likelihood Ratio (LR), Diagnostic odds ratio (DOR), Youden’s index (J), Number needed to diagnose (NND), Number needed to misdiagnose (NNM), and net benefit (NB), they improved the performance in this unique use case when compared to using only conventional metrics. Al-rimy et al. (2019) focused on pre-encryption activities detection in their proposal for a crypto-ransomware detection model. They proposed two combined approaches, the first incremental bagging (iBagging) technique and enhanced semi-random subspace selection (ESRS), which act as an ensemble model. iBagging creates subsets depending on the observation of ransomware behavior, while ESRS then creates subspaces that were used to train a pool of classifiers. The best classifiers were modeled using a grid search, and a voting system was employed. While accuracy was higher than in competing approaches for the same datasets, there were limitations related to feature selection in different subspaces. Since the features were selected within one subspace independently, the same feature could be selected in more than one subspace. This decreases the accuracy of the model. Although this literature is very developed and ever-growing, it still needs a comprehensive focus on researching the detection of Encryption phase-related activities in the case where encryption algorithms are custom-implemented using third-party libraries. The development of static analysis methods, as well as the discovery of patterns in API and system calls for dynamic analysis when custom encryption implementation is used, would significantly improve the overall success of this method. 3.2. I/O (input/output) monitoring 3.2. I/O (input/output) monitoring Monitoring I/O is another technique that monitors internal behavior in an operating system. It aims to use information from I/O requests related to memory, file system, and even network for the detection of ransomware encryption (and other phases). Like API and system calls monitoring, this technique is often part of a more comprehensive detection solution involving several different methodologies and techniques. Kharaz et al. (2016) used a combination of techniques and methodologies in their proposal for a detection system named UNVEIL. Their monitoring of I/O established that regular applications could generate I/O access requests generated by ransomware encryption tools. However, due to the common design where these regular applications do not block access to the original files, their sequence patterns of I/O operations differ from ransomware. The paper also describes zero-day ransomware detection by observing entropy between read and write operations. McIntosh et al. (2021), as a part of the broader proposal for an access control framework, utilized I/O monitoring using various models of the framework name RANACCO. Their solution nested its modules between Windows I/O and Storage class driver. While the overall framework proposed has limitations, the I/O monitoring part is said to detect encryption successfully. RWGuard by Mehnaz et al. (2018) implements a hybrid solution with IRPParser that logs I/O requests and passes them further to other modules. Network monitoring is also used to detect pre-encryption and encryption events. Almashhadani et al. (2019) proposed networkbased crypto-ransomware detection using Locky ransomware as the case study. Their proposal was built using multiple independent classifiers over both packet and flow data. Using a total of 18 features that were extracted from TCP, HTTP, DNS, and NBNS traffic, the proposal achieved 97.92% accuracy for packet-based and 97.08% accuracy for flow-based data. By using TCP and UDP features computed from network flows, Fernández Maimó et al. (2019), as a continuation of their previous proposal for an integrated clinical environment named ICE++, proposed a machine learning detection and protection system that was capable of anomaly detection and ransomware classification. It also uses Network Function Virtualization (NFV) and Software-Defined Networking (SDN) paradigms to prevent the spread of crypto-ransomware activity. They trained multiple models using multiple algorithms and achieved a precision/recall of 92.32%/99.97% in anomaly detection and an accuracy of 99.99% in ransomware classification. Roy and Chen (2021) proposed a solution named DeepRan that prevents the spreading of the Encryption phase across network-connected computers. DeepRan utilizes attention-based bi-directional Long Short Term Memory (BiLSTM) with a fully connected layer to model the normalcy of networked hosts. Its behavior anomaly detection processes substantial amounts of logging data collected from bare metal servers. Conditional Random Fields (CRF) model was used to extend BiLSTM for detected anomalies to be classified as potential ransomware attacks. Semantic information extraction from “high dimensional host logging data’’ was done by the Term FrequencyInverse Document Frequency (TF-IDF) method. Early ransomware detection had a 99.87% detection accuracy (F1-score of 99.02%). On the hardware monitoring level, Paik et al. (2016) propose monitoring I/O for encryption detection in addition to their SSD hardware monitoring. Similarly, Dimov and Tsonev (2020) monitor HDD performance and utilize the I/O performance rate for disk read and disk write operations to detect ransomware in the Encryption phase. Finally, as a unique type of I/O monitoring, Park and Park (2020) propose hardware tracing for detecting symmetric key cryptographic routine detection in malicious binaries that employ anti-reverse engineering techniques. Azmoodeh et al. (2018) proposed detecting crypto-ransomware activity in IoT by monitoring power consumption and applying machine learning models to the collected data. Their proposal employs Dynamic Time Warping (DTW) as a distance measure with KNN as a classifier, outperforming conventional classifiers like Neural Networks, KNN, and SVM. Their approach achieved a detection rate of 95.65% and a precision rate of 89.19%. Only a few research papers that include I/O monitoring in relation to detecting activities related to the Encryption phase of a ransomware attack are available. The success of some of the mentioned proposals in detecting encryption as activity and overall events that are a consequence of the Encryption phase indicates that much more can be done in this field and that more innovative hybrid solutions are possible. 3.3. File system monitoring 3.3. File system monitoring Monitoring file system activity to detect the Encryption phase of ransomware attacks focuses on collecting information about the state of the file system and the files themselves. The early ideas, like Young et al. (2012), to use different sector hashes to detect target files, including encrypted files, paving the way for file system usage in the fight against cryptographic ransomware. By using raw binary files as ML features, Khammas (2020) proposed a Random Forest classifier that uses 1000 n-gram features extracted directly from raw bytes using frequent pattern mining. The selection of features was made using the Gain Ratio to reduce the dimensionality of features. The proposal maintains an optimal number of trees to be 100 with achieved accuracy of 97.74%. While this proposal focused on the analysis of binaries that contain ransomware attack tools and recognition of the same among benign binaries, due to the nature of crypto-ransomware behavior, it is safe to assume that many of the n-gram features were related to the Encryption phase. Furthermore, using APK files containing the source code of an Android app, Sharma et al. (2021) extracted features from the file related to ransomware attacks. Their proposal, named RansomDroid for detecting crypto-ransomware activity in Android devices, uses an unsupervised machine learning model. Unlike K-Means clustering, the proposal used a Gaussian Mixture Model with a flexible and probabilistic approach to modeling the dataset. Feature selection and dimensionality reduction for improvement of the model were also utilized. The model detects Android ransomware with an accuracy of 98.08% in 44 ms. Almomani et al. (2021) also use analysis of Android APK files for feature extraction in their proposal. They rely on an “evolutionary based machine learning approach” to detect cryptographic ransomware in Android devices. They used the binary particle swarm optimization algorithm (BPSO) to tune the classifier’s hyperparameters and feature selection. Synthetic minority oversampling technique (SMOTE) with support vector machine (SVM) algorithm was used for classification. The combination name SMOTE-tBPSO-SVM used g-mean as a metric and achieved a result of 97.5%. Tang et al. (2020) proposed a detection and prevention system named RansomSpector that monitors the file system and network activities. It is a virtual machine-based system that resides in the hypervisor, thus making it difficult to bypass through privilege escalation. The crypto-ransomware was detected with extraordinarily little overhead to performance - less than 5%. Continella et al. (2016), in their proposal for ShieldFS, offer a whole new file system, which, combined with the machine learning portion of their proposal, can detect ransomware behavior as an anomaly, including operations related to the Encryption phase. Similarly, Lee et al. (2022) proposed statistical analysis to differentiate between regular and encrypted blocks in the file system. Their solution, Rcryptect, utilizes extracted heuristic rules using FUSE (File system in Userspace) to avoid kernel modification. Rcryptect, among the other methods, detects high entropy files created by cryptographic operations. Nevertheless, the solution faces common issues of false positives for benign files with high entropy and the issue that prevention mechanisms can cause damage to some files under attack before the ransomware encryption process is killed. Entropy, in combination with fuzzy hashing, as a means to detect files encrypted by ransomware in the file system was proposed by Joshi et al. (2021). They used a mini-filter driver that interacts with file system behavior as kernel mode. While achieving more than 95% of detection success in their experiment, the method is susceptible to explorer.exe process DLL injection that would bypass the security measures proposed. Lee et al. (2019) use entropy estimation to detect files encrypted by ransomware in a cloud environment. When using the cloud as a backup, there exists a risk that encrypted files could be synchronized to the cloud. The authors thus observed the number of ransomware encryption attacks and divulged the baseline used in entropy estimation over files in the cloud. Their experiment reported a 100% success rate in detecting encryption. Jung and Won (2018) used the entropy of files in their comprehensive ransomware detection and protection system. They utilized context-aware analysis that used information from APIs, file system metadata, systems to detect largescale read/write operations, and entropy analysis capable of detecting benign usage of encryption with enhanced classification to improve entropy analysis results. Similarly, Jethva et al. (2020) proposed their system to detect and prevent crypto-ransomware using entropy in multilayer detection. The technique was combined with monitoring registry key operations, file signatures in the Windows operating system, and machine learning. By improving the already mentioned method of analyzing the entropy of files, Hsu et al. (2021) examined 22 different file formats of encrypted files and extracted features to be used with the Support Vector Machine algorithm. They achieved a detection rate of 85.17% using the SVM Linear model, which increased to over 92% when using the SVM kernel trick (with the polynomial kernel) model. Not all of the research favors entropy use in detecting ransomware encryption. McIntosh et al. (2019) propose depreciation of this method in the fight against ransomware, arguing that techniques they identified to mitigate entropy usage in encryption detection are sufficient to invalidate reliance on entropy information. In their experiment, BASE-64 encoding and partial file encryption have shown their effectiveness in “confusing” entropy information; thus, the usage of File Integrity and File Type Identification have been proposed as alternatives to using entropy measures. RWGuard is a proposal by Mehnaz et al. (2018) that combines multiple techniques in the real-time detection of cryptographic ransomware. The solution includes monitoring the file system for malicious activity through File Monitoring and File Classification modules. Also, it has the capability to automatically generate decoy files in the file system using a feature called Decoy Files Generator. Some of the techniques used for detection are inherently probabilistic and prone to false positives. Another decoy-in-thefile-system-based solution is the proposal of R-Locker by GómezHernández et al. (2018), a novel approach to creating honeyfiles as decoys to detect and stop ransomware in action. To achieve this in practical implementation on UNIX-like operating systems, a new named pipe is created in the file system containing a specially crafted small-size honeyfile. An alert is raised to kill the process when ransomware attempts to read the file. Due to the fact that the kernel manages synchronization, these regular applications do not block access to the original files between reading and writing; if a process attempts to read more data than expected, reading is blocked until the writer makes up for expected data. This feature allows time to raise alerts about suspected ransomware behavior. In another approach using deception, honeypots for IoT devices were proposed by Sibi Chakkaravarthy et al. (2020) based on Social Leopard Algorithm (SoLA) to model honey folders. The Intrusion Detection Honeypot (IDH) also introduces an Audit Watch module that monitors the entropy of files in the device, together with a module called Complex Event Processing (CEP) that collects information from multiple external security sources used to confit and stop ransomware activity. SoLA algorithm is critical to this proposal with its capabilities to process extracted features from processes that accessed the honey folder. Usage of entropy to detect encryption is present in various operating systems, including Android. A proposal by Jiao et al. (2021) detects custom encryption with an accuracy rate of 98.24% in Android platforms using only entropy information. In a more ML-focused approach, using the activity logs for features extraction that contain all of the filesystem events, Homayoun et al. (2020) proposed applying Sequential Pattern Mining to find Maximal Frequent Patterns (MFP) in logged activities for known ransomware. This created candidate features to be used in classification by multiple machine learning classification algorithms. In their experiment, authors used J48, Random Forest, Bagging, and Multi-Layer Perceptron (MLP) classifiers and achieved an F-measure of 0.994 with a minimum AUC value of 0.99 in the detection of ransomware samples from benign activities using Windows registry, DLL, and file system to registry log of activities. FMeasure of more than 0.98 with a false-positive rate of less than 0.007 in the detection of a given ransomware family using 13 selected features whose significance was recognized during the research. Their results were not short of impressive, creating datasets of ransomware logs for 1624 ransomware binaries sourced from virustotal.com, as well as separate sampling for overfitting. However, no indication was given that testing was performed on an independent dataset. Even though an impressive amount of research has been found in relation to file system monitoring for the purpose of detecting the Encryption phase activities in a ransomware attack, we feel that research went truly little in the direction of tying the techniques mentioned above into access control systems used to control file systems. Entropy detection is an auspicious tool in fighting cryptographic ransomware; more proposals are needed on this topic. 3.4. Commercial solutions brief survey 3.4. Commercial solutions brief survey Most of the commercial offering for protection against cryptoransomware focuses on providing capabilities in the area of Enterprise Backup and Recovery. Companies that focus solely on ransomware protection in their products are almost nonexistent. Many vendors focus on delivering recovery capabilities through air-gapped backup and immutable backup copies, and detection is based chiefly on integrity and anomaly behavior monitoring. Even though vendors offer markets as comprehensive data protection solutions, it is indicative that most of them emphasize that the backup is the last line of defense against ransomware, which in some instances indicates that other ransomware protection controls are expected to be in place for the product to live to its expectations. According to Gartner’s Magic Quadrant for Enterprise Backup and Recovery Software Solutions (Rao et al., 2021), the major capability for the evaluation of a product was Ransomware detection and protection. An example is Acronis (Ransomware Protection with Backup for Business - Acronis) which offers both cloudbased and on-premise solutions that include the capability to actively scan for ransomware activity and verify the authenticity and recoverability of backup copies. Another major player in this field is Arcserve (Ransomware Protection Solution for an Impenetrable Business) which does not have its own capabilities to detect and protect against crypto-ransomware built into its product but rather has excellent cooperation with security giant Sophos that provides the capability for them. Cohesity (Ransomware Recovery | Reduce Downtime with Rapid Recovery) is the leader in enterprise backup and recovery. They offer cloud service with immutable backup using the write-once-read-many (WORM) feature and RBAC access control model. Also, they utilize machine learning for anomaly behavior detection to detect crypto-ransomware activity. Commvault Ransomware Recovery - Commvault offers one of the most comprehensive lists of capabilities against ransomware. Their approach employs a zero-trust model, built architecture using NIST’s Cybersecurity Framework (CSF). Their detection capability mostly relies on anomaly detection in both networks and file systems. Dell Technologies (Dell EMC Cyber Recovery Solution – Cyber and Ransomware Data Recovery), another leader in this vertical, provides detection using their Intelligent CyberSense Analytics. It utilizes machine learning for anomaly detection. Other important vendors and leaders in this area, like Veeam (Ransomware Protection: Learn How Veeam Can Protect Your Data) and Rubrik (Ransomware Recovery), use similar techniques, and there are no serious differentiating factors in detecting crypto-ransomware. Other groups of vendors that focus on ransomware detection are traditional threat detection and response companies. They rely on anomaly detection utilizing various monitoring techniques that provide hybrid solutions of API calls and I/O monitoring, and file system changes. Some are using machine learning models, and there are occasional claims of artificial intelligence (AI) that are difficult to confirm. The most significant ones are Carbon Black (Endpoint Protection Platform | VMware Carbon Black Endpoint), Trend Micro (Enterprise Ransomware Protection & Removal), Darktrace (Darktrace for Ransomware), Extrahop (Ransomware Mitigation & Detection Solution - ExtraHop) , and Vectra AI (Ransomware Detection and Response - Ransomware Solutions | Vectra AI). Concerning the relationship between academic research achievements surrounding the detection of ransomware executing the pre-encryption and encryption activities and commercial solutions, it has been noted and observed that an apparent discrepancy exists between the two (Scala et al., 2019; Nicol et al., 2015). The persistence of these differences is not uncommon in cybersecurity-related topics and has been driven by a series of factors categorically branched into categories. Therefore, we have factors that are technical, procedural, or bureaucratic in origin and nature. Amongst those most substantially addressed by the literature are the factors identified as distinctly technical in origin: - Integration Industry solutions utilize different detection methods and measures in a coordinated schema of safeguards, where a comprehensive set of firewalls, intrusion prevention systems, and endpoint protection defenses are adopted to combat real-world threats. Antithetically, the academic approach necessitates the separation of specific detection techniques and their study in isolation, in effect, obfuscating principles of translation. - Scalability Detection solutions developed in an isolated academic environment tend to need help with monitoring and analysis capabilities for the vast amounts of data generated by modern industry networks (Scala et al., 2019). Forasmuch as the complexity and diversity of commercial networks are not to be understated; scaling assumes an integral role in the application of research outcomes in the real-world market (Nicol et al., 2015). - Complexity The algorithms developed through mechanisms of academic inquiry often involve complex and resource-intensive modus operandi, which fail to be practical for real-world deployment or, within the nature of their construction, cannot evolve into more flexible setups (Nicol et al., 2012). - Adaptability With standardized models built on specific ransomware samples and behavior patterns in training academically developed technologies, limitations arise in adaptability for realworld applications (Scala et al., 2019). Ransomware, in real world attacks, rarely follows the uniformity established in these test models. On the contrary, its behavior constantly evolves in a race to outmaneuver newly forged preventative measures as they emerge. Likewise, while procedural factors present similar practical limitations as those found with technical factors, the nominal core of these limitations lies in qualities inherent to the experimental process and research objectives rather than market translation: - Validation The process of testing and validation notably differ across academic and industry borders. Research validation roots itself in the accuracy of simulations and the control of experimental conditions. Commercial solutions require rigorous real-world validation against various extraneous variables and incorporating existing infrastructure to ensure effectiveness, reliability, and compatibility (Grossman et al., 2001; Nicol et al., 2015). - False positives Concerning research goals, the process of addressing false positives should be addressed to achieve high detection rates (Bold et al., 2022; Kok et al., 2020b). Comparatively, greater emphasis is placed on false positives in the industrial context, as the losses incurred through the resultant disruptions are of greater interest. Hence, commercial solutions seek a compromise between adequate detection accuracy and minimizing false positives. - Time to market Rapid response is necessary to combat the emergence rate of new threats and meet the demands of the commercial market. Concurrently, academic research development, testing, and refinement require heavy time investments for the analytic procedure, translating to a natural lag in the pace of ransomware evolution (Kashef et al., 2023; Nicol et al., 2015). In contrast to technical and procedural factors, which are determined by objectively measurable discrepancies, bureaucratic factors are interpretational, based on institutional inefficiencies of anthropogenic origin. The most effective of which present themselves in restrictions of: - Intellectual property and licensing Hesitancy in adopting newly developed technologies based on academic research is often directly tied to the risks involved with an investment in new IPs where precedents have yet to be set on the extent of its protected status. Similar contingencies arise with licensing restrictions that introduce additional expenditures in the form of permissions and approvals. 3.5. Other applications of detection of encryption 3.5. Other applications of detection of encryption Detection of encryption is not always related to cryptographic ransomware defense. There are numerous use cases where it would be necessary to recognize if encryption happened or is happening. An example is Ameeno et al. (2019) proposal using the Naive Bayes algorithm to differentiate between compression and encryption and identify file types. Li and Liu (2020) proposed an encryption detection method using deep convolutional neural networks (CNN). The proposal uses converted raw data into two-dimensional matrices as an input to CNN. The results showed a higher detection rate than competitive storage and network encryption methods. In another proposal that utilizes the power of machine learning and deep learning, Yang et al. (2021) propose the usage of natural language processing (NLP) in combination with the two in detecting encrypted network traffic. The technique usually used for weighting in message retrieval and keyword extraction, Term FrequencyInverse Document Frequency (TF-IDF), was used in modeling detection due to its capability not to need analysis of each field in network traffic. Both ensembles of various machine learning classifiers and CNN were used with high accuracy and their own advantages. The advantage of CNN in deep learning is that it efficiently deals with sparse matrices (compressed or uncompressed) generated by TF-IDF in situations where there is an “abundance” of hardware resources. On the other hand, encrypted traffic detection in limited hardware resources is better suited for ensemble classifiers. Finally, even though somehow similar to crypto-ransomware encryption detection, a proposal by Dong et al. (2021) named MBTree deals with the detection of encrypted traffic between Remote Access Trojan (RAT) and Command & Control server (C2). The proposal relies on building the kind of baseline by integrating flow-level DirPiz sequences as a synthesis of host-level Multi-Level Tree (MLTree). Actual detection was done by measuring path similarity and node similarity of actual traffic with the baseline. The F-1 score reported is 94%. This paper is available on arxiv under CC BY 4.0 DEED license. This paper is available on arxiv under CC BY 4.0 DEED license. available on arxiv