Hybridizing machine learning and static malware detection using the PE header
Abstract
Cybercrime cases currently involve demanding payment after infecting a victim organization's computers with ransomware, or impairing operations through a distributed denial-of-service attack, either of which significantly impacts the confidentiality, integrity,
and availability of data. Recent research shows that hybridizing techniques can effectively distinguish malware from benign files. Our research provides an experimental study on hybridizing machine learning and signature-based techniques to detect malware
based on the PE header information. The dataset was randomly split into a training set (80%) and a testing set (20%). We trained and tested three classifiers on the dataset: Random Forest, Gradient Boosting, and AdaBoost. We evaluated our models using evaluation metrics such as accuracy. Overall accuracy was high, ranging from 99.70% to 99.77% on the cleaned dataset and from 93.83% to 96.83% on the uncleaned dataset. The VirusTotal file report API showed a higher average detection rate on the uncleaned dataset, ranging from 0.00% to 12.57%, and a low average detection rate of 0.00% on the cleaned dataset. Random Forest emerged as the best classifier for both the cleaned and uncleaned datasets, with an average detection rate for static analysis of 0.00%.
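
The experimental setup summarized above (a random 80%/20% split, three ensemble classifiers, accuracy as a metric) can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and uses synthetic data from `make_classification` as a stand-in for a table of numeric PE header features. The feature count, sample count, random seeds, and default hyperparameters are all illustrative assumptions, not the study's configuration, so the printed accuracies have no relation to the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for numeric PE header features (assumption):
# each row represents one file; label 1 = malware, 0 = benign.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)

# Random 80% training / 20% testing split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Default hyperparameters are an assumption; the abstract does not state them.
classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each classifier on the training set and score it on the held-out set.
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {accuracies[name]:.4f}")
```

With real data, `X` would be replaced by features extracted from PE headers and `y` by the malware/benign labels; the rest of the loop would stay the same.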