Flagship Project
Formula One
Analytics Platform
A full analytics platform designed around the complete data lifecycle: raw ingestion, distributed transformation,
warehouse-ready modeling, and insight delivery. Rather than focusing only on analysis, this project was built to
demonstrate platform thinking — how historical sports data can be transformed into a scalable analytics system
with clear separation between ingestion, processing, storage, and BI consumption.
What it does
Processes decades of Formula One race, driver, constructor, lap, and results data into analysis-ready datasets
for historical comparison, performance trend evaluation, and dashboard-based storytelling.
End-to-end architecture
Source datasets → Python ingestion layer → Spark/Databricks transformation layer →
cleaned analytical tables → Snowflake-style warehouse modeling → BI dashboards / reporting layer
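The staged flow above can be sketched as a single-machine stand-in. The stage boundaries mirror the pipeline, but the field names, cleaning rules, and the wins-per-driver rollup are illustrative assumptions, not the project's actual schema:

```python
# Minimal sketch of the ingest -> transform -> model flow.
# Fields and rules here are illustrative, not the real pipeline's schema.

def ingest(raw_rows):
    """Ingestion layer: accept raw records as-is."""
    return list(raw_rows)

def transform(rows):
    """Transformation layer: normalize types, drop incomplete records."""
    cleaned = []
    for row in rows:
        if row.get("driver") and row.get("position") is not None:
            cleaned.append({
                "driver": row["driver"].strip().title(),
                "season": int(row["season"]),
                "position": int(row["position"]),
            })
    return cleaned

def model(rows):
    """Modeling layer: aggregate into an analysis-ready table (wins per driver)."""
    wins = {}
    for row in rows:
        if row["position"] == 1:
            wins[row["driver"]] = wins.get(row["driver"], 0) + 1
    return wins

raw = [
    {"driver": " lewis hamilton ", "season": "2020", "position": "1"},
    {"driver": "max verstappen", "season": "2021", "position": 1},
    {"driver": "", "season": "2021", "position": 2},  # dropped: missing driver
]
print(model(transform(ingest(raw))))
# {'Lewis Hamilton': 1, 'Max Verstappen': 1}
```

In the real system the transform stage runs as distributed Spark jobs and the model stage lands in warehouse tables; the point here is only the clean hand-off between layers.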
Engineering focus
Built around modular ETL logic, schema-aware transformations, historical normalization, distributed processing,
and analytics-friendly data modeling. The project reflects how a real analytics engineering workflow moves from
messy source data toward reliable reporting outputs.
Why it matters
Shows capability across data engineering, transformation design, pipeline structure, warehouse thinking, and
stakeholder-ready analytics — not just notebook-level exploration.
Python
Apache Spark
Databricks
Snowflake
R
Looker Studio
ETL
Data Modeling
View on GitHub

Analytics Engineering
GA4 Analytics
Dashboard Pipeline
An end-to-end analytics pipeline built to transform raw GA4 event data into structured, decision-ready dashboards.
The project focuses on event collection, metric logic, transformation layers, and reporting outputs that make
product and marketing performance easier to interpret.
What it does
Pulls analytics data, organizes key events and user behavior patterns, applies transformation logic, and exposes
performance metrics through dashboard layers for easier monitoring and reporting.
End-to-end architecture
GA4 event source → Python extraction / API handling → cleaning + metric transformation →
SQL-ready tables / reporting structures → Looker Studio dashboards
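The cleaning and metric-transformation step in the middle of that flow can be sketched as below. The event names and the "purchase marks a converted session" definition are assumptions for illustration, not the project's actual metric logic:

```python
from collections import defaultdict

# Illustrative sketch of the metric-transformation layer: roll raw
# GA4-style event rows up to per-source session conversion rates.

def conversion_rate_by_source(events):
    sessions = defaultdict(set)      # source -> session ids seen
    conversions = defaultdict(set)   # source -> session ids that converted
    for e in events:
        sessions[e["source"]].add(e["session_id"])
        if e["event_name"] == "purchase":
            conversions[e["source"]].add(e["session_id"])
    return {
        src: round(len(conversions[src]) / len(ids), 2)
        for src, ids in sessions.items()
    }

events = [
    {"session_id": "s1", "source": "organic", "event_name": "page_view"},
    {"session_id": "s1", "source": "organic", "event_name": "purchase"},
    {"session_id": "s2", "source": "organic", "event_name": "page_view"},
    {"session_id": "s3", "source": "paid", "event_name": "purchase"},
]
print(conversion_rate_by_source(events))
# {'organic': 0.5, 'paid': 1.0}
```

Defining the metric once in a transformation layer like this, rather than in each dashboard, is what keeps the reporting outputs consistent.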
Engineering focus
Emphasizes analytics engineering fundamentals: metric definition, event standardization, reporting consistency,
transformation logic, and dashboard usability for non-technical stakeholders.
Why it matters
Demonstrates how raw behavioral data becomes business-facing insights through a repeatable pipeline rather than
ad hoc reporting.
GA4
Python
Looker Studio
SQL
Analytics Engineering
View on GitHub
Applied AI
AI Studio —
RL Environment
A reinforcement learning experimentation environment for training agents with custom reward structures and environment rules through iterative training workflows. Built to explore decision-making systems rather than relying on static supervised learning alone.
What it does
Defines an environment, state-action behavior, reward mechanics, and model training loops that allow an agent
to learn through interaction and repeated policy improvement.
End-to-end architecture
Environment design → state representation → reward logic → agent training loop →
evaluation runs → performance observation and iteration
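The loop above (environment, states, rewards, repeated policy improvement) can be illustrated with tabular Q-learning on a toy 1-D corridor. The environment, reward scheme, and hyperparameters below are illustrative stand-ins, not the project's actual configuration:

```python
import random

# Toy agent-environment training loop: tabular Q-learning on a corridor
# of 5 cells, with reward for reaching the right end.

N_STATES = 5            # cells 0..4; cell 4 is terminal and rewarding
ACTIONS = [-1, 1]       # step left / step right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r, done = step(s, a)
            # Q-learning update toward reward + discounted best next value
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)]
print(policy[:N_STATES - 1])  # greedy policy for non-terminal cells: [1, 1, 1, 1]
```

Reward shaping and training stability show up even at this scale: changing the reward location, `gamma`, or `epsilon` visibly changes what policy the agent settles on.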
Engineering focus
Focuses on experimentation design, reward shaping, training stability, iterative testing, and comparing behavior under different learning configurations. It reflects a builder's approach to AI rather than blind reliance on prepackaged models.
Why it matters
Shows practical exposure to ML system design, learning dynamics, and the tradeoffs involved when building AI
environments from scratch.
PyTorch
Reinforcement Learning
Python
TensorFlow
Experimentation
View on GitHub
Quantitative Systems
High-Frequency
Trading Simulation
A Python-based simulation system for testing trading behavior in synthetic market conditions. The project explores
latency-sensitive logic, order behavior, market reactions, and how trading strategies perform under fast-changing inputs.
What it does
Simulates market movement, evaluates rule-based or modeled trading decisions, and measures behavior under different
timing and execution assumptions.
End-to-end architecture
Synthetic market generator → pricing / signal logic → trade execution simulation →
latency-aware evaluation → strategy performance analysis
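A compressed sketch of that loop: a synthetic random-walk price series, a simple momentum rule, and a latency gap between signal and fill. The strategy rule and the latency model are illustrative assumptions, not the project's actual logic:

```python
import random

# Synthetic market generator -> signal -> latency-delayed fill -> P&L.

def synthetic_prices(n, seed=42, start=100.0, vol=0.5):
    """Gaussian random-walk stand-in for a market price series."""
    rng = random.Random(seed)
    prices, p = [], start
    for _ in range(n):
        p += rng.gauss(0.0, vol)
        prices.append(p)
    return prices

def run_strategy(prices, latency=2):
    """Buy one unit after an up-tick, sell after a down-tick; fills land
    `latency` ticks after the signal, at the then-current price."""
    position, cash = 0, 0.0
    for t in range(1, len(prices) - latency):
        signal = 1 if prices[t] > prices[t - 1] else -1
        fill_price = prices[t + latency]   # latency-delayed execution
        position += signal
        cash -= signal * fill_price
    return cash + position * prices[-1]    # mark-to-market P&L

prices = synthetic_prices(200)
for lat in (0, 2, 5):
    print(f"latency={lat}: P&L={run_strategy(prices, lat):+.2f}")
```

Sweeping `latency` on a fixed seeded series is the kind of timing-sensitivity experiment the simulation is built for: the same strategy can look very different once execution delay is modeled.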
Engineering focus
Centers on simulation design, algorithmic thinking, numerical analysis, and performance evaluation in systems where
speed, timing, and sequential decisions influence outcomes.
Why it matters
Demonstrates strong problem-solving ability in quantitative environments and highlights comfort with logic-heavy,
performance-oriented Python systems.
Python
Quantitative Modeling
Algorithm Design
NumPy
Simulation
View on GitHub
Big Data
Movie Data Platform
on Databricks
A distributed data processing project built on Databricks and Apache Spark to handle ingestion, transformation,
cleaning, and large-scale analysis of movie-related datasets. Designed to reflect modern big data processing patterns.
What it does
Ingests large datasets, applies Spark-based transformations, handles cleaning and reshaping, and prepares structured
outputs for analytics or downstream querying.
End-to-end architecture
Raw movie datasets → Databricks ingestion → PySpark transformation jobs →
cleaned distributed data layers → SQL analysis / reporting outputs
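The transformation logic in the middle of that flow can be shown as a single-machine stand-in (filter, explode, group-and-aggregate, which map to Spark's `filter`, `explode`, and `groupBy().avg()`). The column names and sample rows are illustrative assumptions:

```python
# Single-machine stand-in for the PySpark job's logic: clean raw movie
# rows, explode multi-genre fields, aggregate average rating per genre.

def clean(rows):
    out = []
    for r in rows:
        if not r.get("title") or r.get("rating") is None:
            continue  # drop incomplete records, as a Spark filter would
        out.append({
            "title": r["title"].strip(),
            "genres": [g.strip() for g in r["genres"].split("|") if g.strip()],
            "rating": float(r["rating"]),
        })
    return out

def avg_rating_by_genre(rows):
    totals = {}
    for r in rows:
        for g in r["genres"]:              # explode: one record per genre
            s, n = totals.get(g, (0.0, 0))
            totals[g] = (s + r["rating"], n + 1)
    return {g: round(s / n, 2) for g, (s, n) in totals.items()}

raw = [
    {"title": "Heat", "genres": "Crime|Thriller", "rating": "8.3"},
    {"title": " Alien ", "genres": "Horror|Sci-Fi", "rating": 8.5},
    {"title": "", "genres": "Drama", "rating": 7.0},  # dropped: missing title
]
print(avg_rating_by_genre(clean(raw)))
# {'Crime': 8.3, 'Thriller': 8.3, 'Horror': 8.5, 'Sci-Fi': 8.5}
```

On Databricks the same shape of logic runs as distributed DataFrame operations, which is what makes it viable at dataset sizes local analysis cannot handle.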
Engineering focus
Focuses on distributed compute, scalable transformations, Spark workflows, notebook-driven processing, and handling datasets too large for efficient local-only analysis.
Why it matters
Shows readiness for modern data platform environments where large-scale transformation and cloud-style workflows
are essential.
Databricks
Apache Spark
PySpark
SQL
Big Data
View on GitHub
Data Ingestion
Automated
Data Mining System
A structured scraping and ingestion pipeline built for collecting, validating, and preparing data from external
sources for downstream analytics use. The system emphasizes automation, repeatability, and data readiness.
What it does
Extracts structured information from source pages, applies validation and cleaning steps, and organizes the output
into usable formats for analytics, storage, or later transformation.
End-to-end architecture
External web sources → scraping layer → parsing + validation →
cleaned structured records → storage / analytics-ready datasets
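The parsing + validation stage can be sketched with only the standard library's `html.parser`. The markup shape and the "non-empty after stripping" validation rule are illustrative assumptions; the real system also covers fetching, retries, and storage:

```python
from html.parser import HTMLParser

# Parsing layer: collect text from <span class="item"> elements.
class ItemParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "item") in attrs:
            self.in_item = True

    def handle_data(self, data):
        if self.in_item:
            self.items.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_item = False

# Validation layer: keep only non-empty records; a stand-in for
# richer data-quality checks (types, ranges, required fields).
def validate(records):
    return [r for r in records if r]

html = ('<ul><span class="item"> alpha </span>'
        '<span class="item">  </span>'          # whitespace-only: rejected
        '<span class="item">beta</span></ul>')
parser = ItemParser()
parser.feed(html)
print(validate(parser.items))
# ['alpha', 'beta']
```

Keeping parsing and validation as separate steps is what makes the pipeline repeatable: a new source only needs a new parser, while the quality gate stays the same.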
Engineering focus
Built around reliability in data collection, parsing logic, automation flow, data quality checks, and making raw
extracted information usable for downstream systems.
Why it matters
Demonstrates a strong understanding of ingestion pipelines, source handling, and the early stages of the data
engineering lifecycle where reliability often matters most.
Python
Web Scraping
Data Pipelines
Automation
Data Validation
View on GitHub