N-gram Opcode Analysis for Android Malware Detection

Date

2016-11

ISSN

2057-2182

Type

Article

Peer reviewed

Abstract

Android malware has been on the rise in recent years due to the increasing popularity of Android and the proliferation of third-party application markets. Emerging Android malware families are increasingly adopting sophisticated detection-avoidance techniques, which calls for more effective approaches to Android malware detection. Hence, in this paper we present and evaluate an approach based on n-gram opcode features that uses machine learning to identify and categorize Android malware. This approach enables automated feature discovery without relying on prior expert or domain knowledge for pre-determined features. Furthermore, by using a data segmentation technique for feature selection, our analysis is able to scale up to 10-gram opcodes. Our experiments on a dataset of 2520 samples achieved an f-measure of 98% using the n-gram opcode based approach. We also provide empirical findings that illustrate factors likely to affect the overall performance trends of n-gram opcodes.
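For readers unfamiliar with the term, the following is a minimal sketch of what an n-gram opcode feature is: every contiguous run of n opcodes in a disassembled instruction sequence, counted per sample. It is illustrative only and is not the authors' implementation. The function names `opcode_ngrams` and `ngram_counts` and the sample opcode sequence are assumptions made for this example, and the data segmentation and feature selection steps mentioned in the abstract are not shown.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return every contiguous run of n opcodes as a tuple.

    Hypothetical helper for illustration; returns an empty list
    when the sequence is shorter than n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_counts(opcodes, n):
    """Count how often each opcode n-gram occurs in one sequence."""
    return Counter(opcode_ngrams(opcodes, n))


# Made-up Dalvik opcode sequence for a single method (assumption, not real data).
sample = [
    "const/4",
    "invoke-virtual",
    "move-result",
    "invoke-virtual",
    "move-result",
    "return-void",
]

bigrams = ngram_counts(sample, 2)
for gram, count in bigrams.most_common():
    print(count, " ".join(gram))
```

Counts like these, collected across many samples, could then serve as the input features to a machine learning classifier; the number of distinct n-grams grows quickly with n, which is presumably why the abstract emphasizes feature selection for scaling to 10-grams.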

Description

The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the URI link.

Keywords

android malware, malware detection, n-gram, machine learning, feature selection, opcode, dalvik bytecode

Citation

Kang, B., Yerima, S. Y., Sezer, S., McLaughlin, K. (2016) N-gram opcode analysis for Android malware detection. International Journal on Cyber Situational Awareness, 1(1), pp. 231-255.

Research Institute

Cyber Technology Institute (CTI)