Business Implications

Delivers scalable, low-ops detection of suspicious traffic with consistent, repeatable training and versioned deployment. Real-time scoring reduces time-to-mitigation, monitoring is centralized in one place, and automated retraining enables continuous improvement. This suits SOC workflows, compliance reporting, and proactive defense at cloud scale.

Final Outcome

Automated threat detection ML service

Steps Performed

Provisioned roles/notebook, staged public UNSW-NB15 data in S3, engineered features, trained/evaluated XGBoost, deployed an HTTPS endpoint, then automated retraining/deployment via SageMaker Pipelines and (optional) EventBridge.

1. Preprocess Data & Feature Engineering

Created a SageMaker notebook with an IAM execution role. Pulled UNSW-NB15 logs to S3, explored class balance and schema, engineered traffic ratios/port flags/flow intensity, one-hot-encoded protocol/service/state, and standardized numerical features.
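The feature engineering described above can be sketched in plain Python. This is a minimal illustration, not the project's actual notebook code: the column names (`sbytes`, `dbytes`, `spkts`, `dpkts`, `dur`, `dsport`, `proto`, `service`, `state`) are assumed from the public UNSW-NB15 schema, and the specific ratio, intensity, and port-flag definitions are assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Assumption: "port flag" means the destination port is in the well-known range.
WELL_KNOWN_PORT_MAX = 1023


def engineer_features(rows, categorical=("proto", "service", "state")):
    """Turn raw flow records (dicts) into numeric feature dicts."""
    # Collect the category vocabulary first so every row gets the same columns.
    vocab = {col: sorted({r[col] for r in rows}) for col in categorical}

    features = []
    for r in rows:
        total_bytes = r["sbytes"] + r["dbytes"]
        total_pkts = r["spkts"] + r["dpkts"]
        f = {
            # Traffic ratio: share of bytes sent by the source (0 for empty flows).
            "src_byte_ratio": r["sbytes"] / total_bytes if total_bytes else 0.0,
            # Flow intensity: packets per second, guarding against zero duration.
            "pkts_per_sec": total_pkts / r["dur"] if r["dur"] > 0 else 0.0,
            # Port flag: 1.0 if the destination port is a well-known port.
            "dst_well_known": 1.0 if r["dsport"] <= WELL_KNOWN_PORT_MAX else 0.0,
        }
        # One-hot encode each categorical column.
        for col, values in vocab.items():
            for v in values:
                f[f"{col}={v}"] = 1.0 if r[col] == v else 0.0
        features.append(f)
    return features


def standardize(features, columns):
    """Z-score the given columns in place using population statistics."""
    for col in columns:
        vals = [f[col] for f in features]
        mu, sigma = mean(vals), pstdev(vals)
        for f in features:
            f[col] = (f[col] - mu) / sigma if sigma else 0.0
    return features
```

In a real workflow the mean and standard deviation would normally be computed on the training split only and reused for test and inference data; the sketch standardizes whatever rows it is given.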

2. Train & Test an XGBoost Model

Split processed data into train/test, exported LIBSVM for SageMaker’s built-in XGBoost, configured hyperparameters (binary:logistic, depth, rounds, subsample), launched a managed training job, and computed accuracy/F1 via a local validation script.
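As a rough sketch of the export and validation steps above, the helpers below format rows as LIBSVM lines and compute accuracy and F1 from predicted probabilities. The hyperparameter names match SageMaker's built-in XGBoost algorithm, but the values are illustrative assumptions, as are the zero-based feature indices and the 0.5 decision threshold; none of these are taken from the project itself.

```python
def to_libsvm_line(label, values):
    """Format one row as '<label> <index>:<value> ...', omitting zero entries."""
    # Assumption: zero-based feature indices; adjust if your reader expects one-based.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(values) if v != 0]
    return " ".join([str(int(label))] + pairs)


# Hyperparameter names are those of the built-in XGBoost algorithm;
# the values here are placeholders, not the project's tuned settings.
HYPERPARAMETERS = {
    "objective": "binary:logistic",
    "max_depth": 6,
    "num_round": 200,
    "subsample": 0.8,
}


def accuracy_and_f1(y_true, y_prob, threshold=0.5):
    """Threshold probabilities, then compute accuracy and F1 for the positive class."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

On an imbalanced attack/normal split, F1 is usually the more informative of the two numbers, since accuracy can look high even when most attacks are missed.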

3. Deploy & Serve Real-Time Inference

Registered the trained artifact as a SageMaker Model, created an EndpointConfig, and deployed a single-instance HTTPS Endpoint for near real-time scoring. Verified predictions with CSV payloads and logged results/latency in CloudWatch.
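Verifying predictions with a CSV payload might look like the sketch below. It uses the boto3 `sagemaker-runtime` client's `invoke_endpoint` call; the endpoint name is a placeholder, the helper names are hypothetical, and the response parser accepts either comma- or newline-separated scores because the exact output layout depends on the container version.

```python
def build_csv_payload(rows):
    """Serialize feature rows as headerless CSV, one record per line."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def parse_scores(body_text):
    """Parse scores separated by commas or newlines into floats."""
    tokens = body_text.replace("\n", ",").split(",")
    return [float(t) for t in tokens if t.strip()]


def score_flows(rows, endpoint_name, runtime=None):
    """Send rows to a SageMaker endpoint and return one score per row."""
    if runtime is None:
        # Imported lazily so the helpers above work without AWS credentials.
        import boto3
        runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # placeholder name supplied by the caller
        ContentType="text/csv",
        Body=build_csv_payload(rows),
    )
    return parse_scores(response["Body"].read().decode("utf-8"))
```

Feature order in each row has to match the column order used at training time, since headerless CSV carries no column names.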

4. Automate with SageMaker Pipelines

Defined a Pipeline with steps for data prep (from S3), training, and evaluation, plus a conditional deploy step (threshold-gated). Optionally wired EventBridge + Lambda to retrigger the pipeline when new data lands in S3.
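The gating logic and the optional retrigger can be sketched as follows. This is an illustration under assumptions rather than the project's pipeline definition: the pipeline name, the `InputDataUri` parameter, the evaluation report layout, and the 0.90 F1 threshold are all hypothetical. The Lambda handler assumes S3 "Object Created" events delivered through EventBridge and calls the boto3 SageMaker client's `start_pipeline_execution`.

```python
PIPELINE_NAME = "threat-detection-pipeline"  # hypothetical pipeline name


def should_deploy(evaluation_report, metric="f1", threshold=0.90):
    """Return True when the evaluated metric meets the deployment threshold."""
    # Assumed report layout: {"metrics": {"f1": 0.93, "accuracy": 0.95}}
    return float(evaluation_report["metrics"][metric]) >= threshold


def lambda_handler(event, context, sagemaker_client=None):
    """Start a pipeline execution for the S3 object named in the event."""
    if sagemaker_client is None:
        # Imported lazily so the gate above can be tested without AWS access.
        import boto3
        sagemaker_client = boto3.client("sagemaker")
    # EventBridge S3 events carry the bucket and key under "detail".
    detail = event["detail"]
    s3_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    response = sagemaker_client.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        # "InputDataUri" is a hypothetical pipeline parameter name.
        PipelineParameters=[{"Name": "InputDataUri", "Value": s3_uri}],
    )
    return {"pipelineExecutionArn": response["PipelineExecutionArn"]}
```

Inside a SageMaker Pipeline definition, the same comparison is typically expressed with a condition step that reads the metric from the evaluation step's output file, so the plain function here only mirrors that decision.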

5. Connect All Parts of the Project

Tied the stages together so that data preprocessing, training, evaluation, and deployment run as a single automated workflow in Amazon SageMaker Pipelines.

AWS Services Used

Amazon SageMaker (training, hosting, pipelines)
Amazon S3 (raw/processed data, artifacts)
AWS Lambda (preprocessing automation / triggers)
Amazon CloudWatch (metrics, logs, alarms)
AWS IAM (secure roles/policies)
(Optional) Amazon EventBridge (scheduled/automated retraining)

Technical Tools Used

Python (pandas, scikit-learn, xgboost, boto3)
Jupyter (SageMaker notebooks)
SQL-like analysis (optional)
AWS CLI

Skills Demonstrated

Feature engineering for network telemetry
XGBoost modeling & evaluation at scale
Real-time model serving on SageMaker
Pipeline orchestration & MLOps automation
Cloud monitoring & access control (IAM)

Cybersecurity Threat Detection on SageMaker (AWS)

Detect anomalous network traffic with serverless ML

Built an end-to-end Amazon SageMaker pipeline that ingests network logs, engineers features, trains and evaluates an XGBoost classifier, and deploys a real-time endpoint for threat scoring. The workflow is automated with SageMaker Pipelines, integrates S3/Lambda/CloudWatch, and supports near real-time inference.

Related Projects

CI/CD For Dockerized 2048 Game

Amazon ECS

Multi-Cloud Weather Tracker with DR (AWS+Azure)

Azure+AWS

Amazon Polly Text Narrator

Amazon Polly

Automated Receipt Processing System - Amazon Textract

Amazon Textract

Reinforcement Learning Auto-Scaler for LLM Inference

RL-Based LLM Autoscaler

AWS Serverless Event Announcement System

AWS Lambda