GT Malware Classifier
Machine learning-based PE file malware classification with feature extraction and real-time API deployment.
Overview
GT Malware Classifier is a machine learning tool designed to analyze and classify Portable Executable (PE) files as either benign (goodware) or malicious (malware).
It uses the LIEF library to extract features from each file and trains Random Forest or Gradient Boosting models on them to detect malware.
The project also includes a RESTful API for real-time analysis and classification.
This project was inspired by and based on the ideas presented in the 2021 Machine Learning Security Evasion Competition.
GitHub Repository: GT Malware Classifier
Technologies Used
- Feature Extraction: Python (LIEF library)
- Machine Learning Models: Random Forest, Gradient Boosting (scikit-learn)
- Backend API: Flask
- Data Handling: JSONL, CSV
- Imbalance Handling: SMOTE, Class Weights
- Utilities: Timeout Handling, Metrics Calculation
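
The class-weight option listed above can be illustrated with a short sketch. It uses the inverse-frequency heuristic that scikit-learn applies for class_weight="balanced", namely n_samples / (n_classes * count). Whether GTModel uses exactly this weighting is an assumption, and the label list below is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} using n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, imbalanced label list: 0 = goodware, 1 = malware.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# The minority class receives the larger weight.
print(weights)
```

A dictionary of this shape can be passed as the class_weight argument of scikit-learn's RandomForestClassifier, which makes errors on the rarer class cost more during training.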
Features
- Detailed PE file analysis (imports, exports, sections, strings)
- Custom machine learning model (GTModel)
- Handles .jsonl and .csv data formats
- RESTful API deployment for real-time file classification
- Comprehensive performance metrics: accuracy, precision, recall, FPR, FNR
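
For reference, the five metrics named above all follow from the four confusion-matrix counts. The sketch below shows the standard definitions, treating malware as the positive class. The function name and the sample labels are hypothetical and are not taken from the project code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, FPR and FNR for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # malware correctly flagged
            else:
                fp += 1  # goodware wrongly flagged
        else:
            if truth == positive:
                fn += 1  # malware missed
            else:
                tn += 1  # goodware correctly passed

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive samples).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "fnr": ratio(fn, fn + tp),  # false negative rate
    }


# Hypothetical labels: 1 = malware, 0 = goodware.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

FPR and FNR are reported separately because the two error types cost differently: a false positive blocks a legitimate file, while a false negative lets malware through.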
Setup Instructions
- Build the Docker image:
    docker build -t gt-malware-classifier .
- Run the Docker container:
    docker run -p 5000:5000 gt-malware-classifier
- Access the API at http://localhost:5000.
- Use the /predict endpoint to classify PE files:
    curl -X POST -F 'file=@path/to/your/file.exe' http://localhost:5000/predict
- The API will return a JSON response with the classification result.
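
The curl call above can also be made from Python. The following is a minimal sketch that uses only the standard library. It assumes the endpoint accepts a multipart form field named file, as the curl command suggests. The helper names build_multipart and classify_file are hypothetical, and the fields of the JSON response are not documented here, so the sketch returns the parsed response as-is.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def classify_file(path, url="http://localhost:5000/predict"):
    """POST a PE file to the /predict endpoint and return the parsed JSON."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(
            "file", os.path.basename(path), handle.read()
        )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, classify_file("path/to/your/file.exe") sends the same request as the curl command and returns the decoded response.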
Workflow
- Feature Extraction: Extract attributes such as imports, exports, sections, and strings from PE files using LIEF.
- Model Training: Train the GTModel using labeled datasets with support for Random Forest and Gradient Boosting classifiers.
- Model Testing: Evaluate the model’s performance using a testing set and output detailed metrics.
- API Deployment: Serve the trained model through a Flask API that accepts file uploads for real-time classification.
- Data Analysis: Analyze feature distributions and detect common malware patterns.
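
To make the feature extraction step more concrete, here is a small sketch of string-based features written as a JSONL record. It deliberately uses a standard-library regular expression rather than LIEF, so it illustrates only the strings part of the pipeline. The minimum run length of 5, the feature names, and the record layout are all assumptions made for illustration, not the project's actual schema.

```python
import json
import re

# Runs of at least 5 printable ASCII bytes (the threshold is an assumption).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{5,}")


def string_features(data):
    """Summarize the printable strings found in raw file bytes."""
    strings = [match.decode("ascii") for match in PRINTABLE_RUN.findall(data)]
    total_length = sum(len(s) for s in strings)
    return {
        "num_strings": len(strings),
        "avg_length": total_length / len(strings) if strings else 0.0,
        "num_urls": sum(1 for s in strings if "http://" in s or "https://" in s),
        "num_paths": sum(1 for s in strings if "C:\\" in s),
    }


def to_jsonl_line(label, data):
    """Serialize one labeled sample as a single JSONL line."""
    record = {"label": label}
    record.update(string_features(data))
    return json.dumps(record)


# Hypothetical byte blob standing in for a PE file's contents.
sample = (
    b"\x00\x01MZ\x90\x00http://example.com/payload\x00\xff"
    b"C:\\Windows\\System32\x00abc\x00"
)
features = string_features(sample)
line = to_jsonl_line(1, sample)
print(line)
```

Each call to to_jsonl_line yields one line of a .jsonl dataset, so a training set can be built by appending one line per sample.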
Contributing
Open to improvements!
Pull requests and suggestions are welcome at GT Malware Classifier GitHub.
License
Distributed under the MIT License.