Authors:
(1) Simon R. Davies, School of Computing, Edinburgh Napier University, Edinburgh, UK (s.davies@napier.ac.uk);
(2) Richard Macfarlane, School of Computing, Edinburgh Napier University, Edinburgh, UK;
(3) William J. Buchanan, School of Computing, Edinburgh Napier University, Edinburgh, UK.
Over the last 20 years, a significant number of ransomware detection systems have been proposed in the research literature. The approaches used by these detection systems can be loosely divided into two categories. In one approach, a single method or test is developed which is then used to determine if the system is being attacked by ransomware. The alternative approach is to use machine learning to perform the identification. With the machine learning approach, the system designers identify key features from the running process and system under investigation. The machine learning model then attempts to determine patterns within these features on which to base its judgement. A decision, or classification, is then made, based on the measured values of these features, as to whether the system is under attack or not.
Examples of single-method approaches are [2, 40, 23, 54, 63, 70]. In these cases, the effectiveness of the detection technique relies solely on the ability of a single criterion to distinguish between benign and malicious programs [20]. For example, one technique used to identify ransomware execution is to calculate the entropy of the files created by a process. Encrypted files tend to have a high entropy value, whereas the entropy value of plain text files is much lower. Encrypted output files generated during the execution of a ransomware program would therefore tend to have higher entropy values, possibly allowing them to be identified as a product of a ransomware infection. Unfortunately, this technique struggles to distinguish between encrypted files and benign files that also have high entropy, such as compressed files. The use of entropy as a detection metric has also been called into question [56, 44], as there exist techniques that ransomware could use to avoid detection by encoding, or in some other way manipulating, the encrypted output file.
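The entropy test described above can be illustrated with a minimal sketch. It assumes Shannon entropy measured in bits per byte, which is a common choice but is not necessarily the exact calculation used by the cited systems. The word list and sample data are hypothetical, and random bytes from `os.urandom` are used only as a stand-in for encrypted output.

```python
import math
import os
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the relative frequency p of each byte value.
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


# Hypothetical sample data: pseudo-English text built from a small word list.
rng = random.Random(0)
words = ["invoice", "report", "meeting", "budget", "client", "draft",
         "summary", "project", "update", "notes", "review", "plan"]
plain_text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

# A benign compressed copy of the same text.
compressed = zlib.compress(plain_text, 9)

# Assumption: uniformly random bytes approximate what ciphertext looks like.
random_bytes = os.urandom(len(plain_text))

for label, blob in (("plain text", plain_text),
                    ("compressed", compressed),
                    ("random", random_bytes)):
    print(f"{label:<12} {shannon_entropy(blob):.3f} bits/byte")
```

In a run of this sketch the plain text should score well below the other two samples, while the compressed and random samples both sit close to the 8 bits/byte maximum. That closeness is the difficulty noted above: a single entropy threshold cannot cleanly separate benign compressed files from encrypted ones.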
Examples of ransomware detection techniques that leverage machine learning are [77, 2, 3, 40, 65, 67], along with similar approaches based on neural networks [5, 31, 50, 55]. These systems are trained using features extracted from typical ransomware processes or from systems that are being attacked by ransomware. Examples of features used in these systems are: write entropy, file overwrite behaviour, directory traversal, directory listing, cross-file type access, read/write/create/close operations, temporary files, file type coverage, file similarity, file type change and access frequency [20]. In most systems that rely on machine learning to determine whether a system is being attacked, the significance of the individual extracted features, and their subsequent impact on the final classification, is represented internally by the detection system’s model and is not immediately obvious to an observer. Inadequacies of this approach have been investigated in the literature [20], which discusses classifier evasion techniques, known as adversarial machine learning, that can be leveraged by ransomware developers to avoid classification and subsequent detection.
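To make the notion of extracted features concrete, the following sketch turns a trace of file events into a small feature dictionary of the kind a classifier might consume. The `FileEvent` record, the event names and the chosen features are hypothetical simplifications for illustration; they do not reproduce the feature extraction of any cited system.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical record of one file operation performed by a process."""
    op: str    # one of "read", "write", "create", "close"
    path: str  # POSIX-style path of the file that was touched


def extract_features(events: list[FileEvent]) -> dict[str, float]:
    """Summarise a process trace as a few simple numeric features."""
    ops = Counter(event.op for event in events)
    written_paths = {event.path for event in events if event.op == "write"}
    directories = {str(PurePosixPath(event.path).parent) for event in events}
    written_types = {PurePosixPath(path).suffix.lower() for path in written_paths}
    return {
        "reads": float(ops["read"]),
        "writes": float(ops["write"]),
        "creates": float(ops["create"]),
        "closes": float(ops["close"]),
        "directories_touched": float(len(directories)),   # rough traversal measure
        "file_types_written": float(len(written_types)),  # rough type coverage
    }


# Hypothetical trace: one process rewriting files of two types in two folders.
trace = [
    FileEvent("read", "/home/u/docs/a.docx"),
    FileEvent("write", "/home/u/docs/a.docx"),
    FileEvent("close", "/home/u/docs/a.docx"),
    FileEvent("read", "/home/u/pics/b.jpg"),
    FileEvent("write", "/home/u/pics/b.jpg"),
    FileEvent("close", "/home/u/pics/b.jpg"),
]
features = extract_features(trace)
print(features)
```

A trained model would map such a vector to a benign or malicious label. How much each entry contributes to that label is held inside the model, which is why the influence of individual features is hard for an observer to see.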
However, in a few proposed ransomware detection systems, the designers do try to provide insight into the machine learning techniques used and how the tested features affect the overall decision-making process. The developers of the detection system UNVEIL [36] and its successor Redemption [37] introduce the concept of a malice score, which is a combined weighted score derived from the outcome of individual feature tests. The system detects suspicious activity using dynamic analysis and generates a malice score using a heuristic function. Inputs to this function are various behavioural features such as file entropy changes, writes that cover extended portions of a file, file deletion, processes writing to a large number of user files, processes writing to files of different types and back-to-back writes. CryptoLock [68] proposes a similar approach, summing the results of various tests into a cumulative score referred to as a Reputation Score. This score is derived from measurements of file type changes, the similarity between original and written content, and output file entropy values. Another detection system, RWGuard [57], does mention the specific features that are inspected, which include file IO, decoy files, file change monitoring and crypto API monitoring. However, very little detail is provided on how the specific calculations are performed. DNA-Droid [26] was the only detection system found that leveraged a combination of static and dynamic analysis as the inputs to its neural network model. In all cases, if the cumulative score is above a certain threshold, the process is deemed to be malicious; otherwise, the process is considered benign.
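The general shape of such a score, a weighted combination of individual test results compared against a threshold, can be sketched as follows. The feature names follow the behavioural features listed above, but the weights, the threshold and the normalisation of each test result to the range 0 to 1 are all hypothetical values chosen for illustration. They are not taken from UNVEIL, Redemption, CryptoLock or any other cited system.

```python
# Hypothetical weights for each behavioural test (assumed, not from the papers).
WEIGHTS: dict[str, float] = {
    "entropy_increase": 0.30,
    "wide_overwrite": 0.20,
    "file_deletion": 0.15,
    "many_user_files": 0.15,
    "cross_type_writes": 0.10,
    "back_to_back_writes": 0.10,
}

# Hypothetical decision threshold on the normalised score.
THRESHOLD = 0.5


def malice_score(test_results: dict[str, float]) -> float:
    """Combine per-test results (each clamped to 0..1) into a score in 0..1."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        weight * min(max(test_results.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return weighted / total_weight


def is_malicious(test_results: dict[str, float]) -> bool:
    """Flag the process when its score exceeds the threshold."""
    return malice_score(test_results) > THRESHOLD


# Hypothetical observations for two processes.
editor_like = {"many_user_files": 0.1, "back_to_back_writes": 0.3}
ransomware_like = {
    "entropy_increase": 1.0,
    "wide_overwrite": 1.0,
    "file_deletion": 1.0,
    "many_user_files": 0.9,
}

print(is_malicious(editor_like), is_malicious(ransomware_like))
```

In this sketch the weights are fixed by hand, so each feature's contribution to the score is visible and can be adjusted directly. In the systems discussed here, the equivalent values come out of a heuristic function or a trained model instead.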
However, in all these cases, the individual test results and thresholds are still determined heuristically via the machine learning model. The model itself decides the significance and weighting given to each extracted feature and the influence that each feature has on the final classification, effectively reducing the entire decision-making process to a black box function. A consequence of this is that it is difficult for the designers to directly affect the final decision, which prevents them from easily tuning and influencing the decision-making process and the final classification produced by the model. The quality and accuracy of the decisions made by these systems are essentially reliant on the quality of the training data used to develop the models in the first place.
No ransomware detection systems have been identified in the literature that use a malice-scoring type of approach in which the constituent scores contributing to the final malice score are determined using analytical or algorithmic calculation methods, as opposed to the heuristics used in machine learning approaches.
This paper is available on arXiv under a CC BY 4.0 DEED license.