
Business Implications

Understanding token patterns and burst behavior is essential before scaling LLM applications in production. This simulator provides early visibility into load fluctuations, allowing teams to model cost, latency, and GPU requirements more accurately. It reduces guesswork in capacity planning, enabling more reliable, cost-efficient LLM deployment strategies and forming the foundation for intelligent autoscaling research.

Final Outcome

Synthetic LLM workload insights generated.

Steps Performed

Designed a synthetic traffic model for LLM requests, generated time-series token workloads, and visualized bursts, prefills, and decodes. Exported structured CSV traces to feed into baseline and RL autoscaling experiments for cost/latency research.

1.

Designed LLM Workload Framework

Defined a synthetic inference workload structure including arrival rate, burstiness, and log-normal prompt/completion token distributions.
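A minimal sketch of what such a configuration could look like as a Python dataclass; every parameter name and default value here (arrival_rate_rpm, burst_factor, the log-normal parameters) is an illustrative assumption, not the project's actual schema.

from dataclasses import dataclass

@dataclass
class WorkloadConfig:
    # Mean arrival rate (requests per minute) for the Poisson process
    arrival_rate_rpm: float = 60.0
    # Multiplier applied to the arrival rate during burst minutes
    burst_factor: float = 3.0
    # Fraction of simulated minutes that fall inside a burst window
    burst_probability: float = 0.1
    # Log-normal parameters: mean and sigma of the underlying normal
    prompt_mu: float = 5.0       # exp(5.0) ~ 148-token median prompt
    prompt_sigma: float = 0.8
    decode_mu: float = 5.5       # exp(5.5) ~ 245-token median completion
    decode_sigma: float = 0.9
    # Length of the simulated trace
    duration_minutes: int = 240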

2.

Implemented Request Generation Engine

Used Poisson processes to simulate real-world LLM request bursts with configurable parameters for traffic intensity and variance.
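A sketch of how such a generator might work, building on the WorkloadConfig above: inter-arrival times in a Poisson process are exponentially distributed, and burst minutes simply scale up the rate. The function name and burst mechanics are assumptions.

import numpy as np

def generate_arrivals(cfg: WorkloadConfig, rng: np.random.Generator) -> np.ndarray:
    """Return request arrival times, in minutes, over the simulation horizon."""
    # Mark a random subset of minutes as bursty, then raise the rate there.
    bursty = rng.random(cfg.duration_minutes) < cfg.burst_probability
    arrivals = []
    for minute, is_burst in enumerate(bursty):
        rate = cfg.arrival_rate_rpm * (cfg.burst_factor if is_burst else 1.0)
        # Exponential inter-arrival gaps with mean 1/rate minutes.
        t = minute + rng.exponential(1.0 / rate)
        while t < minute + 1:
            arrivals.append(t)
            t += rng.exponential(1.0 / rate)
    return np.asarray(arrivals)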

3.

Modeled Token Behavior

Generated prompt vs decode token lengths using log-normal sampling to reflect realistic LLM input/output patterns.
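The sampling step might look like the sketch below; NumPy's lognormal takes the mean and sigma of the underlying normal distribution, which is what gives token lengths their long right tail.

def sample_tokens(cfg: WorkloadConfig, n_requests: int,
                  rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Draw prompt (prefill) and completion (decode) token counts per request."""
    prompt = rng.lognormal(cfg.prompt_mu, cfg.prompt_sigma, n_requests)
    decode = rng.lognormal(cfg.decode_mu, cfg.decode_sigma, n_requests)
    # Round to whole tokens and enforce a minimum of one token each.
    return (np.maximum(prompt.round(), 1).astype(int),
            np.maximum(decode.round(), 1).astype(int))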

4.

Aggregated Minute-Level Metrics

Converted raw request traces into per-minute summaries for tokens, request counts, and prefill/decode breakdowns.
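A plausible Pandas roll-up from the per-request trace to minute buckets; the column names are illustrative rather than the project's actual CSV schema.

import pandas as pd

def aggregate_per_minute(arrivals, prompt_tokens, decode_tokens) -> pd.DataFrame:
    """Collapse the per-request trace into minute-level workload metrics."""
    df = pd.DataFrame({
        "minute": arrivals.astype(int),   # bucket each request by whole minute
        "prefill_tokens": prompt_tokens,
        "decode_tokens": decode_tokens,
    })
    per_min = df.groupby("minute").agg(
        requests=("minute", "size"),
        prefill_tokens=("prefill_tokens", "sum"),
        decode_tokens=("decode_tokens", "sum"),
    ).reset_index()
    per_min["total_tokens"] = per_min["prefill_tokens"] + per_min["decode_tokens"]
    return per_min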

5.

Produced Analytical Visualizations

Generated plots for request patterns, token dynamics, and token-length distributions—exported automatically into a structured /plots directory.
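And a sketch of the plotting/export step with Matplotlib, saving figures into a plots/ directory as described; the specific file names here are assumptions.

from pathlib import Path
import matplotlib.pyplot as plt

def export_plots(per_min: pd.DataFrame, out_dir: str = "plots") -> None:
    """Save request-rate and token-throughput figures to the output directory."""
    Path(out_dir).mkdir(exist_ok=True)

    fig, ax = plt.subplots()
    ax.plot(per_min["minute"], per_min["requests"])
    ax.set(xlabel="minute", ylabel="requests/min", title="Request arrival pattern")
    fig.savefig(f"{out_dir}/requests_per_minute.png")
    plt.close(fig)

    fig, ax = plt.subplots()
    ax.plot(per_min["minute"], per_min["prefill_tokens"], label="prefill")
    ax.plot(per_min["minute"], per_min["decode_tokens"], label="decode")
    ax.set(xlabel="minute", ylabel="tokens/min", title="Token dynamics")
    ax.legend()
    fig.savefig(f"{out_dir}/token_dynamics.png")
    plt.close(fig)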

AWS Services Used

None

Technical Tools Used

Python
NumPy
Pandas
Matplotlib

Skills Demonstrated

Workload Simulation
Data Visualization
Statistical Modeling
ML Systems Analysis

LLM Load Simulator & Token Pattern Visualizer

Synthetic LLM workload generation to analyze request bursts and token behavior.

A lightweight simulator that produces synthetic LLM inference traffic, including bursts, prompt/decode token patterns, and minute-level workload traces, used to understand how LLM workloads behave under load before designing auto-scaling strategies.

Related Projects

CI/CD For Dockerized 2048 Game

Amazon ECS

Multi-Cloud Weather Tracker with DR (AWS+Azure)

Azure+AWS

Amazon Polly Text Narrator

Amazon Polly

Automated Receipt Processing System - Amazon Textract

Amazon Textract

Reinforcement Learning Auto-Scaler for LLM Inference

RL-Based LLM Autoscaler

AWS Serverless Event Announcement System

AWS Lambda
