Abstract
Malware infections are a major threat to system security, causing financial and reputational losses to corporate and end users and thus motivating the development of malware detectors. In this scenario, Machine Learning (ML) has proven to be a powerful technique for building classifiers able to distinguish malware from goodware samples. However, many ML research works on malware detection focus only on the final detection accuracy and overlook other important aspects of classifier implementation and evaluation, such as feature extraction and parameter selection. In this project, we shed light on these aspects to highlight the challenges and drawbacks of developing ML-based malware classifiers. We found that (i) dynamic features outperform static features; (ii) discrete-bounded features present smaller accuracy variance; (iii) datasets with distinct characteristics impose generalization challenges on ML models; and (iv) feature analysis can be used as feedback information for malware detection and infection prevention.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2019 Revista dos Trabalhos de Iniciação Científica da UNICAMP
