Flagship Project
Formula One
Analytics Platform
A full analytics platform designed around the complete data lifecycle: raw ingestion, distributed transformation,
warehouse-ready modeling, and insight delivery. Rather than focusing only on analysis, this project was built to
demonstrate platform thinking — how historical sports data can be transformed into a scalable analytics system
with clear separation between ingestion, processing, storage, and BI consumption.
System story
This project is structured like a modern analytics platform: historical motorsport data enters through an ingestion
layer, is processed through transformation logic, organized into analytics-ready structures, and finally delivered
through reporting and interactive exploration layers.
How it works
Sources
Race & Historical Data
Race results, driver records, constructors, lap data, qualifying data, and season-level history.
Ingestion
Python ETL Layer
Raw inputs are collected, validated, typed, and standardized before downstream processing begins.
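The ingestion step can be sketched as a small validate-and-type pass. The field names and types below are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass

# Illustrative schema for one raw race-result row; the real project's
# fields and types may differ.
@dataclass
class RaceResult:
    season: int
    round: int
    driver: str
    constructor: str
    position: int
    points: float

def standardize(raw: dict) -> RaceResult:
    """Validate, type, and standardize one raw record before
    downstream processing begins."""
    return RaceResult(
        season=int(raw["season"]),
        round=int(raw["round"]),
        driver=str(raw["driver"]).strip().title(),
        constructor=str(raw["constructor"]).strip(),
        position=int(raw["position"]),
        points=float(raw["points"]),
    )

row = standardize({"season": "2021", "round": "1",
                   "driver": "  lewis hamilton ", "constructor": "Mercedes",
                   "position": "2", "points": "18"})
```

Typing at the boundary means every downstream layer can assume clean, consistently shaped records.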
Processing
Spark / Databricks
Distributed transforms clean, normalize, enrich, and reshape historical records into reliable analytical layers.
Warehouse
Structured Modeling
Analytics-ready tables and marts support season comparisons, standings, trend analysis, and reporting queries.
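A standings-style mart is essentially an aggregation over cleaned results. The sketch below shows the shape of that logic in plain Python with invented data; in the platform itself this lives as a modeled warehouse table, not in-memory code:

```python
from collections import defaultdict

# (driver, points) rows for one season; values are illustrative.
results = [
    ("VER", 25), ("HAM", 18), ("VER", 25), ("HAM", 25), ("PER", 15),
]

def season_standings(rows):
    """Aggregate per-driver points and rank them, the shape of a
    season-standings mart table."""
    totals = defaultdict(float)
    for driver, points in rows:
        totals[driver] += points
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

standings = season_standings(results)
```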
Consumption
BI + Live Demo
Dashboards and the interactive Streamlit application expose the platform to analysts, users, and stakeholders.
End-to-end architecture
Source datasets → Python ingestion layer → Spark / Databricks transformation layer →
cleaned analytical tables → warehouse-style modeling → dashboards + Streamlit application layer
Engineering focus
Built around modular ETL logic, schema-aware transformations, historical normalization, distributed processing,
warehouse-ready marts, and stakeholder-facing analytics delivery. This is designed to show not just data analysis,
but the systems thinking behind a real analytics workflow.
Business / user output
The final system supports historical comparison, performance trend analysis, standings interpretation, and interactive
exploration through dashboards and a live product-style demo — making the pipeline visible from source to consumer.
Why it matters
Shows capability across data engineering, transformation design, pipeline structure, warehouse thinking,
dashboarding, and productized analytics — not just notebook-level exploration.
Python
Apache Spark
Databricks
Snowflake
R
Looker Studio
ETL
Data Modeling
Streamlit
Live Interactive Demo
Interactive F1 Analytics Platform
Explore the platform end to end — ingestion layer, Spark transforms, warehouse model,
BI reporting, driver analytics, constructor intelligence, and lap-time performance.
🏎
Analytics Engineering
GA4 Analytics
Dashboard Pipeline
An end-to-end analytics pipeline built to transform raw GA4 event data into structured, decision-ready dashboards.
The project focuses on event collection, metric logic, transformation layers, and reporting outputs that make
product and marketing performance easier to interpret.
System story
This project follows a clean analytics engineering flow: event data is extracted from the source layer, shaped by
transformation and metric logic into reporting structures, and then delivered to dashboard consumers in a consistent,
repeatable way.
How it works
Source
GA4 Events
Raw user behavior, event tracking, and traffic data enter from the analytics layer.
Transform
Metric Logic
Events are cleaned, grouped, and translated into usable KPIs, dimensions, and reporting structures.
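The metric step can be sketched as grouping raw events into a KPI. The event names and the conversion definition here are assumptions for illustration, not the project's actual metric definitions:

```python
# Raw GA4-style events as (user_id, event_name); names are illustrative.
events = [
    ("u1", "page_view"), ("u1", "add_to_cart"), ("u1", "purchase"),
    ("u2", "page_view"),
    ("u3", "page_view"), ("u3", "purchase"),
]

def conversion_rate(rows, goal="purchase"):
    """Share of users who fired the goal event at least once:
    one example of turning events into a reporting KPI."""
    users = {u for u, _ in rows}
    converted = {u for u, name in rows if name == goal}
    return len(converted) / len(users)

rate = conversion_rate(events)  # 2 of 3 users converted
```

Defining metrics once in the transformation layer, rather than per-dashboard, is what keeps the reporting consistent.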
Consumption
Dashboards
Looker Studio surfaces the transformed outputs for product, performance, and marketing analysis.
End-to-end architecture
GA4 event source → Python extraction / API handling → cleaning + metric transformation →
analytics-ready reporting tables → Looker Studio dashboard consumption
Engineering focus
Emphasizes analytics engineering fundamentals: metric definition, event standardization, reporting consistency,
transformation logic, and dashboard usability for non-technical stakeholders.
Why it matters
Demonstrates how raw behavioral data becomes business-facing insights through a repeatable pipeline rather than
ad hoc reporting.
GA4
Python
Looker Studio
SQL
Analytics Engineering
View on GitHub
📈
Applied AI
AI Studio —
RL Environment
A reinforcement learning experimentation environment designed for training agents under custom reward structures,
environment rules, and iterative training workflows.
What it does
Defines an environment, state-action behavior, reward mechanics, and model training loops that allow an agent
to learn through interaction and repeated policy improvement.
End-to-end architecture
Environment design → state representation → reward logic → agent training loop →
evaluation runs → performance observation and iteration
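A minimal version of that loop (environment, state, reward, repeated policy improvement) can be sketched with tabular Q-learning on a toy corridor. This is a generic illustration under invented rules, not the project's actual environment or agent:

```python
import random

random.seed(0)

# Toy corridor environment: states 0..4, reward 1.0 for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # One-step Q-learning update
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# After training, the greedy policy should walk right from every state.
policy = {s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(GOAL)}
```

Changing the reward logic or the learning configuration (alpha, gamma, eps) and re-running is exactly the iterate-and-observe workflow the environment is built around.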
Engineering focus
Focuses on experimentation design, reward shaping, training stability, iterative testing, and comparing behavior
under different learning configurations.
Why it matters
Shows practical exposure to ML system design, learning dynamics, and the tradeoffs involved when building AI
environments from scratch.
PyTorch
Reinforcement Learning
Python
TensorFlow
Experimentation
View on GitHub
🧠
Quantitative Systems
High-Frequency
Trading Simulation
A Python-based simulation system for testing trading behavior in synthetic market conditions.
What it does
Simulates market movement, evaluates rule-based or modeled trading decisions, and measures behavior under different
timing and execution assumptions.
End-to-end architecture
Synthetic market generator → pricing / signal logic → trade execution simulation →
latency-aware evaluation → strategy performance analysis
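That chain can be sketched end to end with a random-walk price series and a naive mean-reversion rule. The price model, signal thresholds, and latency handling below are all illustrative assumptions, not the simulator's actual logic:

```python
import random

random.seed(1)

# Synthetic market: a simple Gaussian random-walk mid price.
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] + random.gauss(0, 0.1))

def run_strategy(prices, latency=2):
    """Buy below a moving mean, sell above it, with fills executed
    `latency` ticks after the signal fires."""
    cash, position, window = 0.0, 0, 20
    for t in range(window, len(prices) - latency):
        mean = sum(prices[t - window:t]) / window
        fill = prices[t + latency]  # latency-aware execution price
        if prices[t] < mean - 0.2 and position == 0:
            position, cash = 1, cash - fill
        elif prices[t] > mean + 0.2 and position == 1:
            position, cash = 0, cash + fill
    # Mark any open position to the final price.
    return cash + position * prices[-1]

pnl = run_strategy(prices)
```

Sweeping `latency` and comparing the resulting PnL is the kind of timing-assumption experiment the system is designed to run.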
Engineering focus
Centers on simulation design, algorithmic thinking, numerical analysis, and performance evaluation in systems where
speed, timing, and sequential decisions influence outcomes.
Why it matters
Demonstrates strong problem-solving ability in quantitative environments and highlights comfort with logic-heavy,
performance-oriented Python systems.
Python
Quantitative Modeling
Algorithm Design
NumPy
Simulation
View on GitHub
📉
Big Data
Movie Data Platform
on Databricks
A distributed data processing project built on Databricks and Apache Spark to handle ingestion, transformation,
cleaning, and large-scale analysis of movie-related datasets.
What it does
Ingests large datasets, applies Spark-based transformations, handles cleaning and reshaping, and prepares structured
outputs for analytics or downstream querying.
End-to-end architecture
Raw movie datasets → Databricks ingestion → PySpark transformation jobs →
cleaned distributed data layers → SQL analysis / reporting outputs
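The transformation stage can be sketched in plain Python; the project itself runs this kind of logic as distributed PySpark jobs, and the column names below are invented for illustration:

```python
# Raw movie rows with messy fields; cleaning and typing of the kind
# the Spark jobs perform, shown in plain Python for illustration.
raw = [
    {"title": " Inception ", "year": "2010", "rating": "8.8"},
    {"title": "Heat", "year": "1995", "rating": ""},  # missing rating
    {"title": "Arrival", "year": "2016", "rating": "7.9"},
]

def clean(rows):
    """Drop rows with missing ratings, trim titles, and type the columns."""
    out = []
    for r in rows:
        if not r["rating"]:
            continue
        out.append({"title": r["title"].strip(),
                    "year": int(r["year"]),
                    "rating": float(r["rating"])})
    return out

cleaned = clean(raw)
```

In Spark the same step becomes a filter plus column expressions, so it scales across partitions instead of a single list.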
Engineering focus
Focuses on distributed compute, scalable transformations, Spark workflows, notebook-driven processing, and handling
datasets too large for efficient local-only analysis.
Why it matters
Shows readiness for modern data platform environments where large-scale transformation and cloud-style workflows
are essential.
Databricks
Apache Spark
PySpark
SQL
Big Data
View on GitHub
🎬
Data Ingestion
Automated
Data Mining System
A structured scraping and ingestion pipeline built for collecting, validating, and preparing data from external
sources for downstream analytics use.
What it does
Extracts structured information from source pages, applies validation and cleaning steps, and organizes the output
into usable formats for analytics, storage, or later transformation.
End-to-end architecture
External web sources → scraping layer → parsing + validation →
cleaned structured records → storage / analytics-ready datasets
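The parsing and validation steps can be sketched with the standard-library HTML parser. The markup, field names, and numeric check below are invented for illustration; the real pipeline's sources and quality rules will differ:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

def validated_prices(html):
    """Parse, then keep only values that survive a numeric check:
    the validation step that makes scraped data usable downstream."""
    parser = PriceParser()
    parser.feed(html)
    out = []
    for text in parser.prices:
        try:
            out.append(float(text.lstrip("$")))
        except ValueError:
            continue  # drop records that fail validation
    return out

html = '<div><span class="price">$19.99</span><span class="price">n/a</span></div>'
prices = validated_prices(html)
```

Rejecting bad records at the parse boundary is what keeps the downstream storage and analytics layers reliable.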
Engineering focus
Built around reliability in data collection, parsing logic, automation flow, data quality checks, and making raw
extracted information usable for downstream systems.
Why it matters
Demonstrates a strong understanding of ingestion pipelines, source handling, and the early stages of the data
engineering lifecycle where reliability often matters most.
Python
Web Scraping
Data Pipelines
Automation
Data Validation
View on GitHub
🕸