Business Implications
Delivers scalable, low-ops detection of suspicious traffic with consistent, repeatable training and versioned deployment. Reduces time-to-mitigation via real-time scoring, centralizes monitoring, and enables continuous improvement through automated retraining. This suits SOC workflows, compliance reporting, and proactive defense at cloud scale.


Steps Performed
Provisioned IAM roles and a notebook, staged the public UNSW-NB15 data in S3, engineered features, trained and evaluated XGBoost, deployed an HTTPS endpoint, then automated retraining and deployment via SageMaker Pipelines and, optionally, EventBridge.
1. Preprocess Data & Feature Engineering
Created a SageMaker notebook with an IAM execution role. Pulled UNSW-NB15 logs to S3, explored class balance and schema, engineered traffic ratios/port flags/flow intensity, one-hot-encoded protocol/service/state, and standardized numerical features.
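The feature engineering described in this step can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the field names (sbytes, dbytes, spkts, dpkts, dur, dsport) are assumed to follow the UNSW-NB15 schema, the derived feature names and the well-known-port cutoff are hypothetical, and a real notebook would more likely use pandas and scikit-learn.

```python
from statistics import mean, pstdev

# Assumption: ports up to 1023 are flagged as "well-known" service ports.
WELL_KNOWN_PORT_MAX = 1023


def engineer_features(flow):
    """Derive ratio, port-flag, and intensity features from one flow record."""
    sbytes, dbytes = flow["sbytes"], flow["dbytes"]
    spkts, dpkts = flow["spkts"], flow["dpkts"]
    return {
        # +1 in the denominator avoids division by zero on one-way flows.
        "byte_ratio": sbytes / (dbytes + 1),
        "pkt_ratio": spkts / (dpkts + 1),
        "is_well_known_port": 1 if flow["dsport"] <= WELL_KNOWN_PORT_MAX else 0,
        # Flow intensity: total bytes per second; epsilon guards zero duration.
        "bytes_per_sec": (sbytes + dbytes) / (flow["dur"] + 1e-6),
    }


def one_hot(values):
    """One-hot encode a categorical column; returns (rows, sorted categories)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def standardize(column):
    """Scale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

Applying `one_hot` to the protocol, service, and state columns and `standardize` to each numeric column yields the kind of model-ready matrix this step describes.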
2. Train & Test an XGBoost Model
Split processed data into train/test, exported LIBSVM for SageMaker’s built-in XGBoost, configured hyperparameters (binary:logistic, depth, rounds, subsample), launched a managed training job, and computed accuracy/F1 via a local validation script.
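As a rough sketch of this step, the hyperparameters and the local accuracy/F1 check might look like the following. The hyperparameter keys are the ones the built-in XGBoost algorithm accepts, but the values are illustrative assumptions, not the settings used in the project, and the 0.5 decision threshold is likewise assumed.

```python
# Illustrative values only; the write-up names the knobs but not their settings.
HYPERPARAMETERS = {
    "objective": "binary:logistic",  # output is P(attack) in [0, 1]
    "max_depth": 6,
    "num_round": 200,
    "subsample": 0.8,
}


def accuracy_and_f1(y_true, y_prob, threshold=0.5):
    """Compute accuracy and F1 from true labels and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1
```

F1 is reported alongside accuracy because attack and normal traffic are rarely balanced, and accuracy alone can look good on a model that misses most attacks.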
3. Deploy & Serve Real-Time Inference
Registered the trained artifact as a SageMaker Model, created an EndpointConfig, and deployed a single-instance HTTPS Endpoint for near real-time scoring. Verified predictions with CSV payloads and logged results/latency in CloudWatch.
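Verifying the endpoint with a CSV payload could look roughly like this. The sketch assumes a client exposing `invoke_endpoint` (with boto3 that would be `boto3.client("sagemaker-runtime")`), assumes the response body is a comma- or newline-separated list of probabilities, and uses a hypothetical `StubRuntime` class so the example runs without AWS credentials.

```python
import io
import re
import time


def to_csv_payload(rows):
    """Serialize feature rows as headerless CSV, one flow per line."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def parse_scores(body_text):
    """Parse probabilities separated by commas and/or newlines."""
    return [float(tok) for tok in re.split(r"[,\s]+", body_text.strip()) if tok]


def score_flows(runtime_client, endpoint_name, rows):
    """Send rows to the endpoint; return (scores, round-trip latency in ms)."""
    start = time.perf_counter()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    latency_ms = (time.perf_counter() - start) * 1000
    scores = parse_scores(response["Body"].read().decode("utf-8"))
    return scores, latency_ms


class StubRuntime:
    """Offline stand-in for the runtime client; returns 0.5 for every row."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        n_rows = len(Body.splitlines())
        return {"Body": io.BytesIO("\n".join(["0.5"] * n_rows).encode("utf-8"))}
```

Calling `score_flows(StubRuntime(), "hypothetical-endpoint", rows)` exercises the payload and parsing logic locally; swapping in the real runtime client and endpoint name sends the same request over HTTPS.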
4. Automate with SageMaker Pipelines
Defined a Pipeline with steps for data prep (from S3), training, and evaluation, plus a conditional deploy step that runs only when the evaluation metric clears a threshold. Optionally wired EventBridge + Lambda to retrigger the pipeline when new data lands in S3.
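The threshold gate and the optional retrigger can be sketched as below. This is plain Python for illustration, not the SageMaker SDK's condition step: the F1 threshold, the pipeline name, and the `InputDataUri` parameter are hypothetical, and the event shape is assumed to be an S3 "Object Created" event delivered through EventBridge.

```python
def should_deploy(metrics, f1_threshold=0.90):
    """Gate deployment on the evaluation metric (threshold is an assumption)."""
    return metrics.get("f1", 0.0) >= f1_threshold


def build_pipeline_request(event, pipeline_name="threat-detection-pipeline"):
    """Turn an EventBridge S3 event into start_pipeline_execution arguments."""
    detail = event["detail"]
    s3_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    return {
        "PipelineName": pipeline_name,  # hypothetical pipeline name
        "PipelineParameters": [{"Name": "InputDataUri", "Value": s3_uri}],
    }


def lambda_handler(event, context, sagemaker_client=None):
    """Lambda entry point: start a pipeline run for the newly landed object."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch loads without boto3

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.start_pipeline_execution(
        **build_pipeline_request(event)
    )
    return {"executionArn": response["PipelineExecutionArn"]}
```

Keeping the request construction separate from the AWS call makes the trigger logic testable locally with a sample event.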
5. Connect All Parts of the Project
Automated the entire machine learning workflow, including data preprocessing, training, evaluation, and deployment using Amazon SageMaker Pipelines.
AWS Services Used
Amazon SageMaker (training, hosting, pipelines)
Amazon S3 (raw/processed data, artifacts)
AWS Lambda (preprocessing automation / triggers)
Amazon CloudWatch (metrics, logs, alarms)
AWS IAM (secure roles/policies)
(Optional) Amazon EventBridge (scheduled/automated retraining)
Technical Tools Used
Python (pandas, scikit-learn, xgboost, boto3)
Jupyter (SageMaker notebooks)
SQL-like analysis (optional)
AWS CLI
Skills Demonstrated
Feature engineering for network telemetry
XGBoost modeling & evaluation at scale
Real-time model serving on SageMaker
Pipeline orchestration & MLOps automation
Cloud monitoring & access control (IAM)

Cybersecurity Threat Detection on SageMaker (AWS)
Detect anomalous network traffic with managed ML
Built an end-to-end Amazon SageMaker pipeline that ingests network logs, engineers features, trains and evaluates an XGBoost classifier, and deploys a real-time endpoint for threat scoring. The workflow is automated with SageMaker Pipelines, integrates S3/Lambda/CloudWatch, and supports near real-time inference.






