Authors:
(1) Simon R. Davies, School of Computing, Edinburgh Napier University, Edinburgh, UK ([email protected]);
(2) Richard Macfarlane, School of Computing, Edinburgh Napier University, Edinburgh, UK;
(3) William J. Buchanan, School of Computing, Edinburgh Napier University, Edinburgh, UK.
Crypto-ransomware remains a significant threat to governments and companies alike, with high-profile cyber security incidents regularly making headlines. Many different detection systems have been proposed as solutions to the ever-changing dynamic landscape of ransomware detection. In the majority of cases, these described systems propose a method based on the result of a single test performed on either the executable code, the process under investigation, its behaviour, or its output. In a small subset of ransomware detection systems, the concept of a scorecard is employed where multiple tests are performed on various aspects of a process under investigation and their results are then analysed using machine learning.
The purpose of this paper is to propose a new majority voting approach to ransomware detection by developing a method that uses a cumulative score derived from discrete tests based on calculations using algorithmic rather than heuristic techniques. The paper describes 23 candidate tests, as well as 9 Windows API tests which are validated to determine both their accuracy and viability for use within a ransomware detection system.
Using a cumulative score calculation approach to ransomware detection has several benefits, such as the immunity to the occasional inaccuracy of individual tests when making its final classification. The system can also leverage multiple tests that can be both comprehensive and complimentary in an attempt to achieve a broader, deeper, and more robust analysis of the program under investigation. Additionally, the use of multiple collaborative tests also significantly hinders ransomware from masking or modifying its behaviour in an attempt to bypass detection.
The results achieved by this research demonstrate that many of the proposed tests achieved a high degree of accuracy in differentiating between benign and malicious targets and suggestions are offered as to how these tests, and combinations of tests, could be adapted to further improve the detection accuracy.
Crypto-ransomware infections remain a significant threat to governments and companies alike with high-profile cyber security incidents regularly making headlines [53, 72]. The detection of ransomware is often described as an arms race [61] between threat actors and the people responsible for developing effective malware countermeasures and techniques.
There are two main approaches used in malware analysis in general and ransomware analysis in particular. Static Analysis [77], where the evaluation of the program is performed without the actual execution of the code. Essentially the program contents are examined in an attempt to determine the nature of the program and its possible application. This is normally achieved by attempting to isolate and identify known patterns or signatures within the code. Static analysis scales well and can provide better coverage of a ransomware binary code. However, static analysis can produce false execution behaviour as code paths may not be reachable during actual execution [21] and tell-tale signatures may not be known at the time of analysis. Dynamic Analysis, on the other hand, executes the program under investigation in an instrumented or monitored manner and garners more factual information on the behaviour and effect of the program. [email protected] (S.R. Davies) ORCID(s): 0000-0001-9377-4539 (S.R. Davies); 0000-0002-5325-2872 (R. Macfarlane); 0000-0003-0809-3523 (W.J. Buchanan) namic analysis can provide more accurate information on the actual execution behaviour of the investigated binary, though dynamic analysis can be computationally expensive [43] and contains some element of risk.
The problem of automatic malware detection is a difficult one, with no full solution in sight despite decades of research [19]. The traditional approach, based on analysis of static signatures - is increasingly rendered ineffective by polymorphism and the widespread availability of program obfuscation tools [58, 60]. Using such tools, malware creators can quickly generate thousands of binary variants of functionally identical samples, effectively circumventing signature-based approaches. As a result, in recent years, the focus of the research community has increasingly shifted toward dynamic, behaviour-based analysis techniques. Behavioural approaches sidestep the challenges of obfuscated binary analysis. Instead, they focus on the run-time behaviour of malware processes, which is difficult to alter without breaking core functionality and is therefore considered a reliable fingerprint for malware presence [19].
Over the years many different detection systems have been proposed as solutions to the ever-changing dynamic landscape of ransomware detection. These approaches have leveraged many different techniques such as machine learning [77, 2, 3, 40, 65, 67], neural networks [5, 31, 50, 55], file entropy [67, 29, 45, 46, 73], kernel hooking and process behaviour [4, 9, 39, 71]. In the majority of cases, the described systems propose a method based on the result of a single test performed on either the executable code, the process under investigation, its behaviour or its output. Many of the proposed systems claim to archive relatively high accuracy. Unfortunately, the researchers rarely publish enough detail of their research or the datasets used to allow the reported results to be replicated. Berrueta [8] identifies that there are no common metrics of accuracy and performance in ransomware detection. The fragmentation of scientific research on ransomware combined with a lack of coherent investigation methodology is a major challenge in this research [14]. This view is supported by Maigida [52] who state that the lack of readily available data is also hindering the speedy development of detection and prevention solutions.
In a small subset of ransomware detection systems, the concept of a scorecard is employed. In these specific detection systems, multiple tests are performed on various aspects of a process under investigation. The results of each test contribute to an overall score for the process. A decision can then be made, based on this score, as to whether the process under investigation is benign or malicious. The main proponent of this approach was Kharraz [37] in their implementation of the Redemption detection system. In this work, they refer to this cumulative score as a Malice Score, and for the remainder of this paper, we will use their terminology when discussing this combined ranking score. Other detection systems that have also used this concept of a cumulative malice score are [1, 13, 32, 36, 57].
None of the described systems used an analytical or algorithmic approach to calculating values that could then be combined into a cumulative malice score, rather they relied on some form of machine learning to determine the result. This paper describes the work performed by the authors in building on the original research conducted by Kharraz [37], enhancing and updating their approach and proposing many new discretely calculated static and dynamic analysis tests that could be incorporated into the final malice score calculation.
A majority voting approach was chosen for the ransomware detection system proposed in this work. With this type of system, each of the underlying contributing tests generates a binary output. The result of an individual test can be either that it is considered malicious or it can be considered benign. These individual contributing scores are calculated using algorithmic rather than the heuristic techniques previously proposed in earlier research. Once all the tests have been performed, the resulting votes are then collated into two sets, malicious votes and benign votes. The final classification decision of the detection system is then determined from the set that received the majority of votes. An advantage of this approach is that the system requires no training, as the constituent values are calculated using discrete reproducible tests that require no prior knowledge or model training. These proposed new additional tests are validated using a modern and diverse dataset [18] to determine both their accuracy and viability for use within a ransomware detection system. In the initial design, each test has an equal weighting and thus an equal contribution to the final result. However, this design may be adapted in later iterations by the inclusion of weighting and bias to the results of individual tests, allowing their votes to have more effect on the final decision.
There are many benefits associated with using a cumulative score calculation approach to ransomware detection. For example, when using such an approach, the detection system does not rely on a single specific attribute to base its decision on whether the program under investigation is malicious or not. Rather it can leverage multiple tests that can be both comprehensive and complimentary in an attempt to achieve a broader, deeper and more robust analysis of the program under investigation. Also, such a system would be easier to enhance, as adding additional tests based on new research would be straightforward. Bias from one particular test [77, 19] would also be mitigated, and the weighting of each contributing test could be adjusted to improve accuracy. Additionally, the use of multiple collaborative tests also significantly hinders ransomware from masking or modifying its behaviour in its attempt to bypass detection [19, 56].
The remainder of the paper is structured as follows. In Section 2, we discuss some of the main techniques used in ransomware detection and discuss in detail other techniques that use a collaborative voting approach or a combined scoring technique. In Section 3, we provide a description of the candidate tests that could potentially be included in the cumulative malice scoring calculation and outline the methodology used in the experiments. In Section 4, we present the recorded results and discuss the consequences of the findings with regard to the development of anti-ransomware techniques, and we provide some recommendations for crypto-ransomware detection approaches moving forward. Finally, in Section 5, we discuss the main findings and conclusions gained from this research together with possible limitations in using this approach and suggest further research that could be conducted based on the findings from the research presented in this paper.
The main contributions of this paper are:
• Design, development and detailed description of 23 potential ransomware detection tests.
• Investigation into the amount and frequency of Windows API calls within the ransomware executable files and volatile memory of a ransomware process.
• Validation of the effectiveness of the proposed tests in detecting ransomware.
• A ransomware detection system based on algorithmic derived ransomware indicators.
• The use of a modern publicly available dataset during the development and testing of the system. The majority of the similar systems proposed in the literature use datasets that are up to 14 years old.
This paper is available on arxiv under CC BY 4.0 DEED license.