The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning

Too Long; Didn't Read

The MEME algorithm combines malware evasion and model extraction using model-based reinforcement learning. With only 2,048 queries to the target, it achieves evasion rates of 32-73% and trains surrogate models that reach 97-99% label agreement with the target models.


(1) Maria Rigaki, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic and [email protected];

(2) Sebastian Garcia, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic and [email protected].

Abstract & Introduction

Threat Model

Background and Related Work


Experiments Setup



Conclusion, Acknowledgments, and References



Abstract

Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection toolchain. However, machine learning models are susceptible to adversarial attacks, which makes testing the robustness of models and products necessary. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model that reaches high label agreement with the target model it aims to evade. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with evasion rates of 32-73%. It also produces surrogate models with 97-99% prediction-label agreement with their respective target models. The surrogate could be used to fine-tune and improve the evasion rate in the future.

Keywords: adversarial malware · reinforcement learning · model extraction · model stealing

1 Introduction

As machine learning models are more commonly used in malware detection, there is a growing need for detection tools to combat evasive malware. Understanding attackers' motives is vital for defending against malware, which is often created for profit. Malware-as-a-Service operations are used to automate the obfuscation and evasiveness of existing malware, and it is safe to assume that attackers will continue to improve their automation and try to create adversarial malware as efficiently as possible [45]. In this work, we are interested in the problem of automating the generation of evasive Windows malware executables, primarily against machine learning static detection models.

Creating evasive malicious binaries that preserve their functionality has been the subject of several works to date. Most works use a set of pre-defined actions that alter the Windows binary file by, e.g., adding benign sections and strings, modifying section names, and other "non-destructive" alterations. The selection of the most appropriate set of actions is learned through reinforcement learning or similar approaches. To our knowledge, all prior work relies on the assumption that the target model (or system) is available for an unlimited number of checks or queries to verify whether a malicious binary has evaded the target. However, from the attacker's perspective, fewer queries can a) generally mean less time to produce malware, b) lead to lower detection probabilities, and c) leak less information about the adversarial techniques. Therefore, assuming an attacker can make unlimited queries to a target model while modifying their malware may not be realistic.
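To make the idea of "non-destructive" actions concrete, the sketch below shows two hypothetical helpers in the spirit of such action sets (these functions are illustrative stand-ins, not the actual actions used by any of the tools discussed here). Both only append bytes that the Windows loader ignores, so the program's behavior is preserved:

```python
def append_overlay(pe_bytes: bytes, payload: bytes) -> bytes:
    """Append benign bytes after the end of the PE image (the "overlay").
    The loader never maps the overlay, so execution is unchanged, but
    static features such as file size and byte histograms shift."""
    return pe_bytes + payload


def append_benign_strings(pe_bytes: bytes, strings: list[str]) -> bytes:
    """Pad the file with NUL-separated strings harvested from benign
    software, nudging string-based static features toward the benign class."""
    blob = b"\x00".join(s.encode("utf-8") for s in strings)
    return pe_bytes + b"\x00" + blob
```

A reinforcement learning agent would pick among such actions step by step, observing the detector's response after each modification.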

We propose an algorithm that combines malware evasion with model extraction (MEME) based on model-based reinforcement learning. The goal of MEME is to learn a reinforcement learning policy that selects the appropriate modifications to a malicious Windows binary file so that it evades a target detection model while using a limited number of interactions with the target. The core idea is to collect observations and labels during the interaction with the reinforcement learning environment and combine them with an auxiliary dataset to train a surrogate model of the target. The policy is then trained to evade the surrogate and evaluated against the original target.
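The structure of this loop can be sketched as follows. Everything here is a toy stand-in under stated assumptions: the real MEME uses a PPO policy, real PE modifications, and a trained detector, whereas this sketch uses a threshold "detector", a one-parameter surrogate, and random feature zeroing. It only illustrates the interleaving of (1) a limited query budget against the target, (2) surrogate training on the collected labels, and (3) evasion attempts against the surrogate only:

```python
import random

# Hypothetical action set; effects are simulated below by zeroing features.
ACTIONS = ["add_section", "append_overlay", "rename_section", "add_strings"]


def target_model(features):
    # Placeholder black-box detector: flags samples whose features sum high.
    return 1 if sum(features) > 2 else 0  # 1 = malicious, 0 = benign


class Surrogate:
    """Toy surrogate fit on (features, label) pairs from target queries."""

    def __init__(self):
        self.threshold = 0.0

    def fit(self, data):
        # Fit a single decision threshold between the two classes
        # (a stand-in for training a real surrogate classifier).
        pos = [sum(x) for x, y in data if y == 1]
        neg = [sum(x) for x, y in data if y == 0]
        if pos and neg:
            self.threshold = (min(pos) + max(neg)) / 2

    def predict(self, features):
        return 1 if sum(features) > self.threshold else 0


def meme_round(samples, query_budget):
    # Step 1: spend the limited budget querying the real target for labels.
    replay = [(x, target_model(x)) for x in samples[:query_budget]]
    # Step 2: model extraction -- train the surrogate on the replay buffer.
    surrogate = Surrogate()
    surrogate.fit(replay)
    # Step 3: "train" the policy against the surrogate only (here: random
    # search over actions until the surrogate predicts benign).
    evaded = 0
    for features, label in replay:
        if label == 0:
            continue  # only malicious samples need to evade
        mutated = list(features)
        for _ in range(10):
            random.choice(ACTIONS)  # pick an action (effect simulated below)
            mutated[random.randrange(len(mutated))] = 0
            if surrogate.predict(mutated) == 0:
                evaded += 1
                break
    return surrogate, evaded
```

The key property the sketch preserves is that step 3 never touches the target: once the surrogate is trained, all policy improvement happens against it, and only the final policy is evaluated on the real target.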

We test MEME using three publicly released malware detection models and an antivirus installed by default in all Windows operating systems. MEME is compared with two baseline methods (a random policy and a PPO-based policy [35]) as well as with two state-of-the-art methods (MAB [38] and GAMMA [9]). Using only 2,048 queries to the target during the training phase, MEME learns a policy that evades the targets with an evasion rate of 32-73%. MEME outperforms all baselines and state-of-the-art methods on all but one target. The algorithm also learns a surrogate model for each target with 97-99% label agreement, using a much lower query budget than previously reported.

The main contributions of this work are:

– A novel combination of two attacks, malware evasion and model extraction, in one algorithm (MEME).

– An efficient generation of adversarial malware using model-based reinforcement learning while maintaining better evasion rates than state-of-the-art methods in most targets.

– An efficient surrogate creation method that uses the adversarial samples produced during the training and evaluation of the reinforcement learning agent. The surrogates achieve high label agreement with the targets using minimal interaction with the target models.

This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.