Malicious PDF detection Based on Machine Learning with Enhanced Feature Set

Date

2022-12-05

Publisher

IEEE

Type

Conference

Peer reviewed

Yes

Abstract

PDF is one of the most popular document file formats because of its flexibility, platform independence, and ability to embed different types of content. Over the years, PDF has become a popular attack vector for spreading malware and compromising computer systems. Existing signature-based defense systems have extremely high recall rates, but they quickly become obsolete and are ineffective against zero-day attacks, so malicious PDF files can easily circumvent them. Recently, Machine Learning (ML) has emerged as a viable tool for improving the discovery of previously unseen attacks. In this paper, we therefore present enhanced ML-based models for detecting malicious PDF documents. We develop an approach to ML-based detection that uses static features derived from PDF documents with existing tools, and we propose new, previously unused features to enhance the performance of the ML-based classifiers. Our study is conducted on the recently published Evasive-PDFMal2022 dataset, which consists of 4,468 benign and 5,557 malicious PDF samples, and on which we evaluate seven ML classifiers built with the proposed method. The experimental results show that the enhanced features improved accuracy in five of the seven classifiers evaluated. These results demonstrate the potential of the new features to increase the robustness of feature-based PDF malware detection.
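As a rough illustration of what "static features derived from PDF documents" can look like, the sketch below counts a few structural keywords in the raw bytes of a file without opening or executing it. The keyword list and the function name `static_features` are assumptions chosen for illustration; they are not the feature set or the tooling used in the paper.

```python
import re

# Hypothetical keyword list for illustration only; not the paper's feature set.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def static_features(pdf_bytes: bytes) -> dict:
    """Count selected keywords and structural markers in raw PDF bytes."""
    features = {"size_bytes": len(pdf_bytes)}
    for kw in KEYWORDS:
        # Require that the name is not followed by a letter, so that a short
        # name such as /AA does not also match a longer name beginning with it.
        pattern = re.escape(kw) + rb"(?![A-Za-z])"
        features[kw.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    # Word boundaries keep "obj" from matching inside "endobj".
    features["obj"] = len(re.findall(rb"\bobj\b", pdf_bytes))
    features["endobj"] = len(re.findall(rb"\bendobj\b", pdf_bytes))
    return features
```

A feature dictionary of this kind could be turned into one row of a numeric table, with one row per document and a benign/malicious label, which is the usual input to a feature-based classifier.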

Keywords

Malicious PDF detection, Static analysis, Feature engineering, Machine learning, Evasive PDF malware dataset

Citation

Yerima, S. Y., Bashar, A., and Latif, G. (2022). Malicious PDF detection Based on Machine Learning with Enhanced Feature Set. In: Proceedings of the 14th IEEE International Conference on Computational Intelligence and Communication Networks, CICN 2022, Al-Khobar, Saudi Arabia, 4-6 December 2022.
