Business Implications


Steps Performed
Automated CSV ingestion via S3 events, preprocessed the files with Lambda, transformed them at scale using Glue, stored curated datasets in S3, and built interactive QuickSight dashboards for analysis.
1. Set Up Buckets And Roles
Created three S3 buckets (raw, processed, final). Configured IAM roles/policies for Lambda and Glue with least-privilege access to S3, Glue, and CloudWatch. Enabled QuickSight access to selected buckets for analytics.
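A least-privilege policy for the Lambda role might look like the sketch below. The bucket names and the `lambda_s3_policy` helper are hypothetical, and the exact actions granted in the original project are not stated, so treat this as one plausible shape: read from the raw bucket, write to the processed bucket, and write logs to CloudWatch.

```python
import json


def lambda_s3_policy(raw_bucket: str, processed_bucket: str) -> dict:
    """Build an IAM policy document for the preprocessing Lambda (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only access to objects in the raw bucket.
                "Sid": "ReadRawObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{raw_bucket}/*",
            },
            {
                # Write-only access to objects in the processed bucket.
                "Sid": "WriteProcessedObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{processed_bucket}/*",
            },
            {
                # Permissions Lambda needs to log to CloudWatch Logs.
                # The wildcard resource could be narrowed to a specific log group.
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration.
    print(json.dumps(lambda_s3_policy("example-raw", "example-processed"), indent=2))
```

The Glue role would need a similar document scoped to the processed and final buckets.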
2. Lambda Preprocess On Upload
Wrote a Python Lambda function, triggered by raw-bucket uploads, that reads each CSV, filters and cleans rows, and writes the cleaned output to the processed bucket using Boto3, with execution logged in CloudWatch.
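A minimal sketch of such a handler follows. The cleaning rule (trim whitespace, drop rows with any empty field) and the `PROCESSED_BUCKET` environment variable are assumptions made for illustration; the project's actual filter logic is not described here.

```python
import csv
import io
import os
import urllib.parse


def clean_csv(text: str) -> str:
    """Strip whitespace from every cell and drop rows that contain an empty field.

    This rule is an assumed example, not the project's documented logic.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if cells and all(cells):
            writer.writerow(cells)
    return out.getvalue()


def lambda_handler(event, context):
    """Handle S3 ObjectCreated events: read, clean, and re-write each CSV."""
    import boto3  # imported here so clean_csv can be exercised without AWS access

    s3 = boto3.client("s3")
    processed_bucket = os.environ["PROCESSED_BUCKET"]  # hypothetical variable name

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        cleaned = clean_csv(body)
        s3.put_object(Bucket=processed_bucket, Key=key, Body=cleaned.encode("utf-8"))

        # Output printed inside Lambda is captured in CloudWatch Logs.
        print(f"cleaned s3://{bucket}/{key} -> s3://{processed_bucket}/{key}")

    return {"status": "ok", "files": len(event["Records"])}
```

Keeping the cleaning step in a pure function makes it easy to unit-test locally before deploying.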
3. Catalog And Discover Schema
Set up Glue Data Catalog database, created an on-demand Crawler to infer schema from processed CSVs, and materialized tables for downstream ETL and analytics consumption.
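The same catalog setup can be scripted with Boto3 rather than the console. The sketch below assumes hypothetical names for the crawler, role, database, and S3 path; omitting a schedule leaves the crawler on-demand.

```python
def crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for Glue's create_crawler call.

    No "Schedule" key is set, so the crawler only runs when started explicitly.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(config: dict) -> None:
    """Create the catalog database and crawler, then start one crawl."""
    import boto3  # imported here so crawler_config can be checked without AWS access

    glue = boto3.client("glue")
    # create_database raises an error if the database already exists;
    # a real script would likely catch that case.
    glue.create_database(DatabaseInput={"Name": config["DatabaseName"]})
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    cfg = crawler_config(
        name="processed-csv-crawler",
        role_arn="arn:aws:iam::123456789012:role/example-glue-role",
        database="example_pipeline_db",
        s3_path="s3://example-processed/",
    )
    print(cfg)
```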
4. Transform With Glue ETL
Built a Glue Studio (Visual ETL) job to apply schema changes and business rules, write compressed CSV outputs to the final bucket, and parameterize paths for repeatable production runs.
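One way to exercise the parameterized paths is to pass them as job arguments when starting a run. The argument names `--source_path` and `--target_path`, the job name, and the bucket paths are hypothetical; the names the actual job expects are not given in this write-up.

```python
def job_arguments(source_path: str, target_path: str) -> dict:
    """Build the Arguments mapping for a Glue job run.

    Glue passes job parameters as "--name": "value" pairs; inside the job script
    they can typically be read with awsglue.utils.getResolvedOptions.
    """
    return {"--source_path": source_path, "--target_path": target_path}


def start_job(job_name: str, source_path: str, target_path: str) -> str:
    """Start one run of the Glue job and return its run ID."""
    import boto3  # imported here so job_arguments can be checked without AWS access

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=job_arguments(source_path, target_path),
    )
    return response["JobRunId"]


if __name__ == "__main__":
    # Placeholder paths for illustration.
    print(job_arguments("s3://example-processed/", "s3://example-final/curated/"))
```

Passing paths at run time keeps the job definition itself unchanged between environments.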
5. Visualize In QuickSight
Connected QuickSight to final S3 data via a manifest, created bar/line visuals, and published a dashboard. Validated dynamic refresh by dropping new CSVs through the pipeline.
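A QuickSight S3 manifest is a small JSON file listing where the data lives and how to parse it. The sketch below generates one under the assumption that the final bucket holds comma-delimited CSVs with a header row; the bucket name and prefix are placeholders.

```python
import json


def build_manifest(uri_prefixes, contains_header: bool = True) -> dict:
    """Build a QuickSight S3 manifest that points at one or more S3 prefixes."""
    return {
        "fileLocations": [{"URIPrefixes": list(uri_prefixes)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            # The manifest format uses string values for this flag.
            "containsHeader": "true" if contains_header else "false",
        },
    }


if __name__ == "__main__":
    # Hypothetical location of the curated output.
    manifest = build_manifest(["s3://example-final/curated/"])
    print(json.dumps(manifest, indent=2))
```

Pointing the manifest at a prefix rather than individual files means new CSVs landing under that prefix are picked up on the next dataset refresh.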
AWS Services Used
Amazon S3
AWS Lambda
AWS Glue
Amazon QuickSight
AWS IAM
Amazon CloudWatch
Technical Tools Used
Python
Boto3
AWS Glue Studio
Amazon QuickSight (authoring)
Skills Demonstrated
Serverless Data Engineering
Event-Driven ETL Orchestration
Data Modeling & Governance
Dashboarding & Analytics

Serverless CSV Data Pipeline - ETL
Event-Driven ETL With S3, Lambda, Glue, QuickSight
Built a serverless pipeline that ingests CSVs into S3, triggers a Lambda function to preprocess them, runs scalable AWS Glue ETL, lands curated data back in S3, and visualizes insights in Amazon QuickSight. The workflow is secure, pay-as-you-go, and fully automated end-to-end.






