Project
completed
2026

KKBox Real-time Customer Churn BI Dashboard

UEH Business Intelligence course project with streaming analytics and decision support

Built a near real-time Business Intelligence system for KKBox churn monitoring and retention decision support. The project combines Kafka log replay, Spark Structured Streaming, ClickHouse OLAP storage, FastAPI APIs, and a React dashboard to deliver descriptive, predictive-proxy, and prescriptive analysis in one workflow.

Business Intelligence
Data Engineering
Streaming Analytics
BI Engineer
Full-stack Developer

Timeline

2026

Type

Project

Status

completed

Outcome / Impact

  • Delivered a 3-tab decision-support dashboard spanning descriptive analysis, predictive-proxy scoring, and prescriptive scenario simulation
  • Built end-to-end near real-time data flow: replay logs -> Kafka -> Spark Structured Streaming -> ClickHouse -> FastAPI -> React dashboard
  • Smoke-tested the artifact pipeline across a 906,157-row descriptive snapshot, an 887,123-row validation set, and 907,471-row scored test/simulation outputs
  • Deployed a live dashboard on Vercel and documented the stack for local/demo execution with Docker Compose
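
The replay-to-dashboard flow listed above depends on windowed aggregation of replayed events. As a minimal sketch in plain Python (not the project's Spark job; the event fields `user_id`, `ts`, `secs_played` and the 60-second window are assumptions for illustration), a tumbling-window rollup might look like this:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical replayed log record; field names are illustrative only.
    user_id: str
    ts: int           # event time, epoch seconds
    secs_played: int  # listening time carried by the event


def tumbling_window_rollup(events: Iterable[Event], window_secs: int = 60) -> dict:
    """Group events into fixed, non-overlapping windows keyed by window start."""
    users = defaultdict(set)
    secs = defaultdict(int)
    for ev in events:
        start = ev.ts - (ev.ts % window_secs)  # floor to the window boundary
        users[start].add(ev.user_id)
        secs[start] += ev.secs_played
    return {
        start: {"active_users": len(users[start]), "secs_played": secs[start]}
        for start in sorted(users)
    }


sample = [
    Event("u1", 0, 30),
    Event("u2", 59, 10),
    Event("u1", 60, 45),
]
rollup = tumbling_window_rollup(sample)
```

In a streaming engine the same grouping would be expressed as a window over event time; the plain-Python version only shows how each event maps to a window key before the aggregates are written to an OLAP table.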

Tech / Skills

Kafka
Spark Streaming
ClickHouse
FastAPI
React
Vite
Docker
Churn Analytics

Case Study

1) Context / Problem

Subscription businesses need fast visibility into churn risk, revenue at risk, and which retention actions are worth funding. For the UEH BI course project, the objective was to move beyond static reporting and build a near real-time decision-support system that could ingest behavioral data, surface operational KPIs, and simulate retention strategies.

2) Your Role

As a core contributor, I worked on the feature-store-first BI workflow, dashboard productization, and the integration between the analytics pipeline and the serving layer. My work focused on turning batch and streaming artifacts into a usable business-facing dashboard across the three analysis tabs.

3) Approach

Designed the stack around replayed user logs and low-latency analytics: a batch preload and Kafka event replay feed Spark Structured Streaming, aggregated results land in ClickHouse, and FastAPI serves them to a React/Vite frontend. The product is organized into three tabs: Tab 1 for descriptive KPIs and survival analysis, Tab 2 for predictive-proxy risk scoring, and Tab 3 for prescriptive simulation with ROI-style scenario comparison.
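
To make the Tab 3 idea concrete, here is a minimal sketch of an ROI-style scenario comparison in Python. The formula, the `Scenario` fields, and every number in it are assumptions for illustration, not the project's simulation logic or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # All fields are hypothetical inputs, not values from the project.
    name: str
    targeted_users: int      # users receiving the retention action
    baseline_churn: float    # churn probability without the action
    churn_reduction: float   # absolute drop in churn probability
    revenue_per_user: float  # revenue kept when one user is retained
    cost_per_user: float     # cost of the action per targeted user


def roi(s: Scenario) -> float:
    """Return (revenue saved - cost) / cost for one scenario."""
    cost = s.targeted_users * s.cost_per_user
    if cost <= 0:
        raise ValueError("scenario cost must be positive")
    # The action cannot retain more users than would have churned.
    reduction = min(s.churn_reduction, s.baseline_churn)
    saved = s.targeted_users * reduction * s.revenue_per_user
    return (saved - cost) / cost


scenarios = [
    Scenario("discount", 1000, 0.25, 0.125, 160.0, 8.0),
    Scenario("reminder", 1000, 0.25, 0.0625, 160.0, 2.0),
]
# Rank scenarios from highest to lowest ROI for side-by-side comparison.
ranked = sorted(scenarios, key=roi, reverse=True)
```

With these made-up inputs the cheaper action ranks first even though it retains fewer users, which is the kind of trade-off a scenario comparison is meant to surface.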

4) Result / Impact

Delivered an end-to-end BI demo with working descriptive, predictive-proxy, and prescriptive views, plus a live dashboard deployment. The canonical artifact chain passed smoke tests, including the 906,157-row descriptive snapshot and the 907,471-row scoring/simulation outputs.
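
A smoke test of this kind can be as simple as comparing each artifact's row count to the expected figure. The sketch below is illustrative Python, not the project's test suite: the artifact keys and the `count_rows` callable are hypothetical, while the expected counts are the ones reported in this write-up.

```python
from typing import Callable

# Expected row counts as reported in this write-up; the keys are made-up labels.
EXPECTED_ROWS = {
    "descriptive_snapshot": 906_157,
    "validation_set": 887_123,
    "scored_outputs": 907_471,
}


def smoke_check(count_rows: Callable[[str], int]) -> list:
    """Return human-readable failures; an empty list means the check passed."""
    failures = []
    for artifact, expected in EXPECTED_ROWS.items():
        actual = count_rows(artifact)
        if actual != expected:
            failures.append(f"{artifact}: expected {expected} rows, got {actual}")
    return failures


# Stand-in for a real reader (for example a count query or a file scan),
# with one count deliberately off by one to show a failure message.
fake_counts = dict(EXPECTED_ROWS, scored_outputs=907_470)
problems = smoke_check(fake_counts.__getitem__)
```

Passing the row counter in as a callable keeps the check independent of where the artifacts live, so the same function could run against local files or the serving database.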

5) Learnings

Near real-time BI depends as much on semantic consistency as on streaming infrastructure. Feature-store boundaries, serving-layer definitions, and honest model maturity labeling matter; predictive proxy metrics are useful for experimentation, but they should not be presented as fully productionized ML serving.