📦 GT Malware Classifier

Machine learning-based PE file malware classification with feature extraction and real-time API deployment.

📖 Overview

GT Malware Classifier is a machine learning tool that analyzes Portable Executable (PE) files and classifies each one as benign (goodware) or malicious (malware).
It extracts features with the LIEF library and feeds them to scikit-learn classifiers (Random Forest and Gradient Boosting).
The project also includes a RESTful API for real-time analysis and classification.

This project is based on ideas presented in the 2021 Machine Learning Security Evasion Competition.

🔗 GitHub Repository: GT Malware Classifier

🛠️ Technologies Used

Feature Extraction: Python (LIEF library)
Machine Learning Models: Random Forest, Gradient Boosting (scikit-learn)
Backend API: Flask
Data Handling: JSONL, CSV
Imbalance Handling: SMOTE, Class Weights
Utilities: Timeout Handling, Metrics Calculation
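
To illustrate the class-weight option listed above, here is a minimal sketch of the "balanced" heuristic (the same formula scikit-learn applies for `class_weight="balanced"`). The function name and the 90/10 label split are hypothetical, chosen only for illustration; the weights this project actually uses may differ.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label vector: 0 = goodware, 1 = malware (made-up proportions).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class gets the larger weight, so misclassifying it costs more during training. SMOTE takes the other route and oversamples the minority class instead of reweighting it.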

🚀 Features

✨ Detailed PE file analysis (imports, exports, sections, strings)
✨ Custom machine learning model (GTModel)
✨ Handles .jsonl and .csv data formats
✨ RESTful API deployment for real-time file classification
✨ Performance metrics: accuracy, precision, recall, false positive rate (FPR), false negative rate (FNR)
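
As a rough sketch of how the metrics listed above relate to a confusion matrix, assuming malware is the positive class (label 1). The function name and the sample labels are hypothetical and are not taken from the project's code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, FPR and FNR from two label lists."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Avoid division by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fpr": ratio(fp, fp + tn),  # goodware flagged as malware
        "fnr": ratio(fn, fn + tp),  # malware missed
    }


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

FPR and FNR are worth reporting separately from accuracy: on an imbalanced dataset a model can score high accuracy while still missing most malware.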

⚙️ Setup Instructions

Build the Docker image:

docker build -t gt-malware-classifier .

Run the Docker container:

docker run -p 5000:5000 gt-malware-classifier

Access the API at http://localhost:5000.

Use the /predict endpoint to classify PE files:

curl -X POST -F 'file=@path/to/your/file.exe' http://localhost:5000/predict

The API will return a JSON response with the classification result.
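
The same upload can be sketched in Python with only the standard library. This mirrors the curl command above: a multipart form with a single field named `file`, posted to `/predict`. The helper names are hypothetical, and the shape of the JSON response is not documented here, so the sketch just returns whatever the server sends back.

```python
import json
import urllib.request
import uuid


def build_predict_request(file_bytes, filename, url="http://localhost:5000/predict"):
    """Build a multipart/form-data POST with one form field named 'file'."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def classify_file(path):
    """Upload a PE file and return the decoded JSON response (schema not assumed)."""
    with open(path, "rb") as fh:
        request = build_predict_request(fh.read(), path.split("/")[-1])
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage, with the container running:
#     print(classify_file("path/to/your/file.exe"))
```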

🗺️ Workflow

Feature Extraction:
Extract attributes such as imports, exports, sections, and strings from PE files using LIEF.
Model Training:
Train the GTModel using labeled datasets with support for Random Forest and Gradient Boosting classifiers.
Model Testing:
Evaluate the model’s performance using a testing set and output detailed metrics.
API Deployment:
Serve the trained model through a Flask API that accepts file uploads for real-time classification.
Data Analysis:
Analyze feature distributions and detect common malware patterns.
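
To make the feature extraction step more concrete, here is a simplified stand-in for the string features only. It does not use LIEF and does not cover imports, exports, or sections; it scans raw bytes for printable ASCII runs, much like the Unix `strings` tool. The function names, the minimum length of 5, and the chosen summary features are assumptions for illustration, not the project's actual feature set.

```python
import re


def extract_strings(data, min_len=5):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def string_features(data, min_len=5):
    """Summarize extracted strings as a small numeric feature dictionary."""
    strings = extract_strings(data, min_len)
    total_length = sum(len(s) for s in strings)
    return {
        "num_strings": len(strings),
        "avg_length": total_length / len(strings) if strings else 0.0,
        "num_urls": sum("http://" in s or "https://" in s for s in strings),
    }


# Made-up byte blob standing in for the contents of a PE file.
sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xffhttp://example.test/a\x00abc\x00"
features = string_features(sample)
print(extract_strings(sample))
print(features)
```

Numeric summaries like these can be placed in a feature vector next to the import, export, and section attributes before training.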

🤝 Contributing

Open to improvements!
Pull requests and suggestions are welcome on the GT Malware Classifier GitHub repository.

📄 License

Distributed under the MIT License.
