Experimental Toolkit for Studying Executable Packing - Analysis of the State-of-the-Art Packing Detection Techniques
Files
DHONDT_05251301_2022.pdf
Closed access - Adobe PDF
- 7.22 MB
DHONDT_05251301_2022_PRINT.pdf
Closed access - Adobe PDF
- 7.23 MB
Details
- Abstract
- Whether for malicious or legitimate purposes, packing has been widely employed for decades. It is a transformation that applies operations such as compression or encryption to a binary file (an executable or object), mostly to make reverse engineering harder or to obfuscate code. Particularly in malware analysis, where antivirus evasion is a stumbling block, it has proven effective and still gives a hard time to scientists who want to detect it efficiently. Although extensively covered in the scientific literature, it remains an open issue, especially when the trade-off between detection time and accuracy is taken into account. Many methods and approaches have been proposed, combining static and dynamic techniques and features and increasingly relying on machine learning. However, a deep literature review shows that most studies, while providing high-quality results, restrict their scope to malware analysis and Windows executables and do not provide any open implementation, which makes comparing current state-of-the-art solutions almost impossible. Moreover, some studies rely on possibly inaccurate ground truths, labeling their samples with combinations of custom detectors found in the wild (typically on open source sharing platforms). Considering the many challenges that packing implies, there is room for improvement in the way executable packing is currently addressed, especially with static detection techniques. The performance and underlying logic of currently available detection tools could be assessed in depth. Datasets could be prepared more thoroughly and should be guaranteed unbiased before training models (even though some machine learning algorithms are resilient to outliers).
Even the application of machine learning, despite the excellent libraries currently available, can be cumbersome and hardly repeatable work requiring considerable effort, in particular for feature engineering and hyper-parameter tuning. Moreover, data visualization of packed binary files remains a relatively unexplored field. In addition, binary formats other than the traditional Windows Portable Executable are rarely addressed in the current scientific literature. This master thesis proposes to address all these issues at once by taking advantage of automation and containerization to build an open source platform with an experimental toolkit specially tailored to executable packing, aptly called the Packing Box. It groups functionalities along four axes: (1) the integration of open source and freely available detectors, packers and unpackers (and many other resources, such as utilities to assist analysis), (2) the mechanics for manipulating datasets and creating unbiased ground truths, (3) the necessary toolset for extensive data visualization and (4) the complete automation of the machine learning pipeline, from feature extraction to model training, including entry points for the user to tune any asset that influences the process. Throughout this work, we show how to use the resulting toolset and what it can bring to our analysis. We then analyze the performance of various detection tools and finally experiment with our taxonomy of packer categories to identify relevant features that rely on many static detection techniques.