Projects

Real systems.
Real data. Real impact.

A collection of projects across analytics engineering, data platforms, AI experimentation, quantitative systems, and production-style automation — designed to show end-to-end technical thinking, not just isolated code.

How It Works

From raw input to decision-ready systems.

Across the strongest projects, the core pattern stays consistent: source data is collected, validated, transformed, modeled, and then exposed through dashboards, analytics applications, or domain-specific outputs.

01 · Sources

Source Systems

APIs, historical datasets, event streams, web sources, or synthetic environments provide the raw input layer.

02 · Ingestion

Collection & Validation

Python ingestion, extraction logic, scraping, and schema-aware validation turn raw input into reliable structured data.
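The schema-aware validation step described above can be sketched in a few lines of plain Python. The field names and types here are illustrative assumptions, not the actual schemas used in these projects:

```python
# Hypothetical schema for one ingested record: field name -> expected type.
SCHEMA = {"driver": str, "position": int, "points": float}

def validate_record(raw: dict) -> dict:
    """Coerce raw fields to the schema's types, failing loudly on bad input."""
    out = {}
    for field, typ in SCHEMA.items():
        if field not in raw:
            raise ValueError(f"missing field: {field}")
        out[field] = typ(raw[field])  # e.g. "25" -> 25.0
    return out

# Raw API/scrape output often arrives as strings; validation normalizes it.
record = validate_record({"driver": "HAM", "position": "1", "points": "25"})
```

The point of doing this at the ingestion boundary is that every downstream layer can then assume typed, complete records.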

03 · Processing

Transformation Layer

Spark, SQL, business logic, or model pipelines clean, normalize, enrich, and reshape the data into useful layers.

04 · Storage

Warehouse / Structured Layer

Fact-like tables, dimensions, marts, or analytics-ready structures support scalable querying and repeatable reporting.
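The fact/dimension pattern above can be illustrated with a tiny in-memory example. The table and column names are hypothetical stand-ins for whatever a real warehouse layer would define:

```python
# A dimension keyed by surrogate id, and a fact table referencing it.
dim_customer = {1: {"name": "Acme", "region": "EU"}}
fact_orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
]

# A mart-style rollup: total order amount per customer name.
totals = {}
for row in fact_orders:
    name = dim_customer[row["customer_id"]]["name"]
    totals[name] = totals.get(name, 0.0) + row["amount"]
```

In a warehouse this rollup would be a SQL aggregate over a mart; the structure (facts join dimensions, aggregates feed reports) is the same.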

05 · Consumption

BI / Application Output

Dashboards, live demos, analytics products, and domain outputs turn the system into something decision-makers can use.

Flagship Project

Formula One
Analytics Platform

History Covered
75 Years of F1 Data
Architecture
5-Layer Pipeline
Core Stack
Python · Spark · Snowflake

A full analytics platform designed around the complete data lifecycle: raw ingestion, distributed transformation, warehouse-ready modeling, and insight delivery. Rather than focusing only on analysis, this project was built to demonstrate platform thinking — how historical sports data can be transformed into a scalable analytics system with clear separation between ingestion, processing, storage, and BI consumption.

System story

This project is structured like a modern analytics platform: historical motorsport data enters through an ingestion layer, is processed through transformation logic, organized into analytics-ready structures, and finally delivered through reporting and interactive exploration layers.

How it works

Sources
Race & Historical Data

Race results, driver records, constructors, lap data, qualifying data, and season-level history.

Ingestion
Python ETL Layer

Raw inputs are collected, validated, typed, and standardized before downstream processing begins.

Processing
Spark / Databricks

Distributed transforms clean, normalize, enrich, and reshape historical records into reliable analytical layers.

Warehouse
Structured Modeling

Analytics-ready tables and marts support season comparisons, standings, trend analysis, and reporting queries.

Consumption
BI + Live Demo

Dashboards and the interactive Streamlit application expose the platform to analysts, users, and stakeholders.

End-to-end architecture

Source datasets → Python ingestion layer → Spark / Databricks transformation layer → cleaned analytical tables → warehouse-style modeling → dashboards + Streamlit application layer
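The layered flow above can be sketched as a chain of small functions. This is a minimal illustration, not the project's actual module layout; the Spark layer is stood in for by plain Python, and all field names are assumptions:

```python
def ingest(raw_rows):
    # Ingestion layer: drop rows that fail basic validation.
    return [r for r in raw_rows if r.get("position") is not None]

def transform(rows):
    # Stand-in for the Spark transformation layer: type coercion, enrichment.
    return [{**r, "points": float(r["points"])} for r in rows]

def model(rows):
    # Warehouse-style rollup: points per driver, ready for BI consumption.
    table = {}
    for r in rows:
        table[r["driver"]] = table.get(r["driver"], 0.0) + r["points"]
    return table

raw = [
    {"driver": "VER", "position": 1, "points": "25"},
    {"driver": "PER", "position": None, "points": "0"},  # invalid row, dropped
    {"driver": "VER", "position": 1, "points": "26"},
]
standings = model(transform(ingest(raw)))
```

Keeping each layer a separate function (or job) is what makes the pipeline testable and lets any stage be swapped out independently.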

Engineering focus

Built around modular ETL logic, schema-aware transformations, historical normalization, distributed processing, warehouse-ready marts, and stakeholder-facing analytics delivery. The goal is to show not just data analysis, but the systems thinking behind a real analytics workflow.

Business / user output

The final system supports historical comparison, performance trend analysis, standings interpretation, and interactive exploration through dashboards and a live product-style demo — making the pipeline visible from source to consumer.

Why it matters

Shows capability across data engineering, transformation design, pipeline structure, warehouse thinking, dashboarding, and productized analytics — not just notebook-level exploration.

Python · Apache Spark · Databricks · Snowflake · R · Looker Studio · ETL · Data Modeling · Streamlit

Live Interactive Demo

Interactive F1 Analytics Platform

Explore the platform end to end — ingestion layer, Spark transforms, warehouse model, BI reporting, driver analytics, constructor intelligence, and lap-time performance.

Analytics Engineering

GA4 Analytics
Dashboard Pipeline

An end-to-end analytics pipeline built to transform raw GA4 event data into structured, decision-ready dashboards. The project focuses on event collection, metric logic, transformation layers, and reporting outputs that make product and marketing performance easier to interpret.

System story

This project follows a clean analytics engineering flow: event data is extracted from the source layer, transformed into reporting logic and metric structures, and then delivered to dashboard consumers in a consistent and repeatable way.

How it works

Source
GA4 Events

Raw user behavior, event tracking, and traffic data enter from the analytics layer.

Transform
Metric Logic

Events are cleaned, grouped, and translated into usable KPIs, dimensions, and reporting structures.

Consumption
Dashboards

Looker Studio surfaces the transformed outputs for product, performance, and marketing analysis.

End-to-end architecture

GA4 event source → Python extraction / API handling → cleaning + metric transformation → analytics-ready reporting tables → Looker Studio dashboard consumption
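The metric-transformation step in the flow above is essentially "events in, KPIs out." A minimal sketch, using hypothetical GA4-style event rows (real exports carry many more fields):

```python
# Simplified event rows; in GA4 these come from the API or BigQuery export.
events = [
    {"event_name": "page_view", "user_id": "a"},
    {"event_name": "page_view", "user_id": "b"},
    {"event_name": "purchase",  "user_id": "a"},
]

# Metric definitions live in one place so every dashboard reads the same logic.
users = {e["user_id"] for e in events}
purchases = sum(1 for e in events if e["event_name"] == "purchase")
conversion_rate = purchases / len(users)
```

Centralizing definitions like `conversion_rate` in the transformation layer, rather than in each dashboard, is what keeps reporting consistent.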

Engineering focus

Emphasizes analytics engineering fundamentals: metric definition, event standardization, reporting consistency, transformation logic, and dashboard usability for non-technical stakeholders.

Why it matters

Demonstrates how raw behavioral data becomes business-facing insights through a repeatable pipeline rather than ad hoc reporting.

GA4 · Python · Looker Studio · SQL · Analytics Engineering
View on GitHub
Applied AI

AI Studio —
RL Environment

A reinforcement learning experimentation environment for training agents under custom reward structures and environment rules, with iterative training workflows for comparing configurations.

What it does

Defines an environment, state-action behavior, reward mechanics, and model training loops that allow an agent to learn through interaction and repeated policy improvement.

End-to-end architecture

Environment design → state representation → reward logic → agent training loop → evaluation runs → performance observation and iteration
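The loop above (act, observe reward, update, repeat) can be shown with a toy two-armed bandit. This is a minimal illustration of the reward-update dynamic, not the project's actual environment; the reward distributions, learning rate, and epsilon are arbitrary:

```python
import random

random.seed(0)

# Toy environment: arm 1 pays ~1.0 on average, arm 0 pays ~0.0.
def reward(arm: int) -> float:
    return random.gauss(0.0 if arm == 0 else 1.0, 0.1)

q = [0.0, 0.0]           # value estimate per arm
alpha, epsilon = 0.1, 0.1  # learning rate, exploration probability
for step in range(2000):
    # Epsilon-greedy policy: mostly exploit, occasionally explore.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max((0, 1), key=q.__getitem__)
    # Incremental update: move the estimate toward the observed sample.
    q[arm] += alpha * (reward(arm) - q[arm])
```

After training, the agent's estimate for the better arm dominates; the same act/reward/update structure scales up to full state-action environments with reward shaping.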

Engineering focus

Focuses on experimentation design, reward shaping, training stability, iterative testing, and comparing behavior under different learning configurations.

Why it matters

Shows practical exposure to ML system design, learning dynamics, and the tradeoffs involved when building AI environments from scratch.

PyTorch · Reinforcement Learning · Python · TensorFlow · Experimentation
View on GitHub
Quantitative Systems

High-Frequency
Trading Simulation

A Python-based simulation system for testing trading behavior in synthetic market conditions.

What it does

Simulates market movement, evaluates rule-based or modeled trading decisions, and measures behavior under different timing and execution assumptions.

End-to-end architecture

Synthetic market generator → pricing / signal logic → trade execution simulation → latency-aware evaluation → strategy performance analysis
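The first three stages of this flow can be sketched compactly: a random-walk price generator feeding a toy mean-reversion rule. The drift, volatility, and window values are arbitrary illustration choices, not the simulation's actual parameters:

```python
import random

random.seed(42)

# Synthetic market generator: multiplicative random walk from 100.
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * (1 + random.gauss(0.0, 0.01)))

# Toy signal + execution: buy below the rolling mean, sell above it.
window, position, cash = 20, 0, 0.0
for t in range(window, len(prices)):
    rolling_mean = sum(prices[t - window:t]) / window
    if prices[t] < rolling_mean and position == 0:
        position, cash = 1, cash - prices[t]   # buy one unit
    elif prices[t] > rolling_mean and position == 1:
        position, cash = 0, cash + prices[t]   # sell it back
pnl = cash + (prices[-1] if position else 0.0)  # mark open position to market
```

A latency-aware version would execute each decision against a *later* price than the one that generated the signal, which is exactly where timing assumptions start to dominate results.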

Engineering focus

Centers on simulation design, algorithmic thinking, numerical analysis, and performance evaluation in systems where speed, timing, and sequential decisions influence outcomes.

Why it matters

Demonstrates strong problem-solving ability in quantitative environments and highlights comfort with logic-heavy, performance-oriented Python systems.

Python · Quantitative Modeling · Algorithm Design · NumPy · Simulation
View on GitHub
Big Data

Movie Data Platform
on Databricks

A distributed data processing project built on Databricks and Apache Spark to handle ingestion, transformation, cleaning, and large-scale analysis of movie-related datasets.

What it does

Ingests large datasets, applies Spark-based transformations, handles cleaning and reshaping, and prepares structured outputs for analytics or downstream querying.

End-to-end architecture

Raw movie datasets → Databricks ingestion → PySpark transformation jobs → cleaned distributed data layers → SQL analysis / reporting outputs
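The transformation-job stage above is, at its core, a grouped aggregation. Here is the shape of that logic sketched in plain Python; in the project it runs as PySpark (`groupBy`/`agg`) over distributed partitions, and the column names are illustrative:

```python
# Cleaned rows after ingestion; a Spark job would hold these as a DataFrame.
rows = [
    {"genre": "Drama",  "rating": 8.1},
    {"genre": "Drama",  "rating": 7.3},
    {"genre": "Comedy", "rating": 6.8},
]

# Group by genre and accumulate (sum, count) — the same partial-aggregate
# shape Spark computes per partition before merging.
acc = {}
for r in rows:
    s, n = acc.get(r["genre"], (0.0, 0))
    acc[r["genre"]] = (s + r["rating"], n + 1)
avg_rating = {g: round(s / n, 2) for g, (s, n) in acc.items()}
```

The (sum, count) pair matters: it is mergeable across partitions, which is what lets Spark compute averages without shuffling raw rows to one node.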

Engineering focus

Focuses on distributed compute, scalable transformations, Spark workflows, notebook-driven processing, and handling larger datasets more efficiently than local-only analysis.

Why it matters

Shows readiness for modern data platform environments where large-scale transformation and cloud-style workflows are essential.

Databricks · Apache Spark · PySpark · SQL · Big Data
View on GitHub
Data Ingestion

Automated
Data Mining System

A structured scraping and ingestion pipeline built for collecting, validating, and preparing data from external sources for downstream analytics use.

What it does

Extracts structured information from source pages, applies validation and cleaning steps, and organizes the output into usable formats for analytics, storage, or later transformation.

End-to-end architecture

External web sources → scraping layer → parsing + validation → cleaned structured records → storage / analytics-ready datasets
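The parsing + validation stage above can be sketched with only the standard library. The tag selector and output fields here are assumptions for illustration, not the pipeline's real extraction rules:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect text found inside <h2> tags."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

html = "<div><h2>Item One</h2><p>x</p><h2>Item Two</h2></div>"
parser = TitleParser()
parser.feed(html)
# Validation step: keep only non-empty titles as structured records.
records = [{"title": t} for t in parser.titles if t]
```

Real scrapers typically swap in a library like BeautifulSoup for the parsing layer, but the pipeline shape (fetch, parse, validate, emit records) stays the same.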

Engineering focus

Built around reliability in data collection, parsing logic, automation flow, data quality checks, and making raw extracted information usable for downstream systems.

Why it matters

Demonstrates a strong understanding of ingestion pipelines, source handling, and the early stages of the data engineering lifecycle where reliability often matters most.

Python · Web Scraping · Data Pipelines · Automation · Data Validation
View on GitHub