Business Implications

The pipeline turns raw CSV drops into governed, analytics-ready data with zero servers to manage. Teams get faster time-to-insight, reliable data quality, and low operational cost. QuickSight dashboards update as new files arrive, enabling near-real-time decisions for business stakeholders.

Final Outcome

Automated Curated Data & Dashboard

Steps Performed

Automated CSV ingestion via S3 events, preprocessed files with Lambda, transformed data at scale using Glue, stored curated datasets in S3, and built interactive QuickSight dashboards for analysis.

1. Set Up Buckets And Roles

Created three S3 buckets (raw, processed, final). Configured IAM roles/policies for Lambda and Glue with least-privilege access to S3, Glue, and CloudWatch. Enabled QuickSight access to selected buckets for analytics.
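
As an illustration of the least-privilege idea, the sketch below builds an IAM policy document that lets a role read from one bucket, write to another, and emit CloudWatch logs. The bucket names and the function name `s3_read_write_policy` are hypothetical; the actual policies used in the project are not shown in the source.

```python
import json


def s3_read_write_policy(read_bucket: str, write_bucket: str) -> dict:
    """Build an IAM policy document: read one bucket, write another, write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSource",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{read_bucket}/*",
            },
            {
                "Sid": "WriteTarget",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{write_bucket}/*",
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                # Assumption: in practice this would be narrowed to the
                # function's own log group.
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = s3_read_write_policy("example-raw-bucket", "example-processed-bucket")
print(json.dumps(policy, indent=2))
```

A policy like this could be attached to the Lambda role; a Glue role would follow the same pattern with the processed and final buckets.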

2. Lambda Preprocess On Upload

Wrote a Python Lambda function, triggered by raw-bucket uploads, that reads each CSV, filters and cleans its rows, and writes the cleaned output to the processed bucket using Boto3, logging each execution to CloudWatch.
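
A minimal sketch of what such a handler might look like follows. The cleaning rule (strip whitespace, drop rows with any empty field) and the bucket name `example-processed-bucket` are assumptions for illustration; the project's actual filter logic is not given in the source.

```python
import csv
import io
import urllib.parse

# Hypothetical destination bucket; the real name is not given in the source.
PROCESSED_BUCKET = "example-processed-bucket"


def clean_csv(text: str) -> str:
    """Strip whitespace from every value and drop rows with any empty field."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if cells and all(cells):  # skip blank lines and incomplete rows
            writer.writerow(cells)
    return out.getvalue()


def lambda_handler(event, context):
    """Entry point for S3 ObjectCreated events on the raw bucket."""
    # Imported here so clean_csv can be exercised without the AWS SDK installed.
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        cleaned = clean_csv(body)
        s3.put_object(Bucket=PROCESSED_BUCKET, Key=key, Body=cleaned.encode("utf-8"))
        # print output from a Lambda function is captured in CloudWatch Logs.
        print(f"Processed s3://{bucket}/{key}")
    return {"status": "ok", "files": len(event["Records"])}
```

Keeping the cleaning logic in a pure function separate from the S3 calls makes it easy to test locally before deploying.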

3. Catalog And Discover Schema

Set up Glue Data Catalog database, created an on-demand Crawler to infer schema from processed CSVs, and materialized tables for downstream ETL and analytics consumption.
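
The crawler setup could also be scripted with Boto3, roughly as sketched below. The crawler name, role ARN, database name, and S3 path are placeholders, and the sketch assumes the Glue database already exists. Omitting a schedule leaves the crawler on-demand.

```python
def crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for glue.create_crawler (no Schedule = on-demand)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(config: dict) -> None:
    """Create the crawler and start one run. Requires AWS credentials."""
    import boto3  # imported lazily so the config builder has no AWS dependency

    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])


# Hypothetical values, for illustration only.
config = crawler_config(
    name="example-processed-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    database="example_pipeline_db",
    s3_path="s3://example-processed-bucket/",
)
```

Calling `create_and_run_crawler(config)` would then populate the catalog tables that the ETL job reads.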

4. Transform With Glue ETL

Built a Glue Studio (Visual ETL) job to apply schema changes and business rules, write compressed CSV outputs to the final bucket, and parameterize paths for repeatable production runs.
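
One way to pass parameterized paths into a Glue job is through job arguments at run time, sketched below. The job name and the parameter names `source_path` and `target_path` are assumptions; the source does not say which parameters the job defines.

```python
def job_arguments(source_path: str, target_path: str) -> dict:
    """Build the Arguments map for a job run; Glue expects keys prefixed with '--'."""
    return {
        "--source_path": source_path,
        "--target_path": target_path,
    }


def start_job(job_name: str, arguments: dict) -> str:
    """Start a Glue job run and return its run ID. Requires AWS credentials."""
    import boto3  # imported lazily so the argument builder has no AWS dependency

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


# Hypothetical paths, for illustration only.
args = job_arguments(
    source_path="s3://example-processed-bucket/",
    target_path="s3://example-final-bucket/",
)
```

Inside a Glue script, such arguments are typically read with `getResolvedOptions` from `awsglue.utils`, so the same job definition can be reused across environments by changing only the paths.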

5. Visualize In QuickSight

Connected QuickSight to final S3 data via a manifest, created bar/line visuals, and published a dashboard. Validated dynamic refresh by dropping new CSVs through the pipeline.
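
For reference, a QuickSight S3 manifest is a small JSON file listing where the data lives and how to parse it. The sketch below generates one; the bucket prefix is a placeholder and the helper name `quicksight_manifest` is hypothetical.

```python
import json


def quicksight_manifest(uri_prefixes, contains_header: bool = True) -> dict:
    """Build a QuickSight S3 manifest for comma-delimited CSV files."""
    return {
        "fileLocations": [{"URIPrefixes": list(uri_prefixes)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            # The manifest format expects this flag as a string.
            "containsHeader": "true" if contains_header else "false",
        },
    }


# Hypothetical prefix pointing at the final bucket.
manifest = quicksight_manifest(["s3://example-final-bucket/"])
print(json.dumps(manifest, indent=2))
```

Using a URI prefix rather than individual file URIs means newly landed files under that prefix are picked up the next time the dataset is refreshed.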

AWS Services Used

Amazon S3
AWS Lambda
AWS Glue
Amazon QuickSight
AWS IAM
Amazon CloudWatch

Technical Tools Used

Python
Boto3
AWS Glue Studio
Amazon QuickSight (authoring)

Skills Demonstrated

Serverless Data Engineering
Event-Driven ETL Orchestration
Data Modeling & Governance
Dashboarding & Analytics

Serverless CSV Data Pipeline - ETL

Event-Driven ETL With S3, Lambda, Glue, QuickSight

Built a serverless pipeline that ingests CSVs into S3, triggers Lambda to preprocess them, runs scalable AWS Glue ETL, lands curated data back in S3, and visualizes insights in Amazon QuickSight. The workflow is secure, pay-as-you-go, and fully automated end to end.

Related Projects

CI/CD For Dockerized 2048 Game

Amazon ECS

Multi-Cloud Weather Tracker with DR (AWS+Azure)

Azure+AWS

Amazon Polly Text Narrator

Amazon Polly

Automated Receipt Processing System - Amazon Textract

Amazon Textract

AWS Serverless Event Announcement System

AWS Lambda

Two-Tier To-Do App on AWS

Amazon EC2
