Loading.
Back to work
CASE 05 DATA BI DUCKDB

Analytics —
Shopify BI

Production analytics platform replacing month-end spreadsheet rituals. DuckDB on Parquet, pre-warmed query cache, executive KPIs, India geo heatmap, cohort retention, RFM, Pareto and weekday patterns — fed by a nightly Shopify sync.

Role Solo build
Stack FastAPI · DuckDB · React
Data 190k+ orders · Parquet
Status Live on AWS Tokyo

Reporting was a monthly forensic project.

Every month, the same exercise: pull a fresh Shopify CSV, ferry it into a different sheet for each question (revenue, retention, cohorts, geography, payments, RFM), and stitch the answers into a deck. Half the month was over by the time anyone had the numbers from the half that ended.

A few earlier dashboards existed — fragile Google Sheets that recalculated for 20 minutes, a Streamlit prototype on a laptop tunnelled out through Cloudflare. Both worked. Neither was something the team could actually rely on.

Parquet at rest, DuckDB at query.

The bet was simple: store everything as columnar Parquet, query it with DuckDB, ship a thin FastAPI layer and a React frontend. No data warehouse, no Snowflake bill, no ETL platform. The whole stack runs on one t3.large in Tokyo.

A nightly sync.py appends new Shopify orders to Parquet. On service start, the query cache pre-warms the heaviest views so the first user of the day doesn't pay the cold-start tax.

190k+ Orders queried
Nightly Incremental sync
<200ms Warm query latency
SHOPIFY API
Nightly cron
sync.py
Append to Parquet
DUCKDB
Pre-warmed cache
FASTAPI
JSON endpoints
Dashboard
React · Vite

Eight views, one source of truth.

Each module reads from the same Parquet snapshot. No hidden joins, no second copy of the data, no "this number doesn't match the other dashboard."

  • Executive KPIs — revenue, AOV, units, repeat-customer rate, returning-revenue share, with deltas vs prior period.
  • Revenue trends — daily/weekly/monthly with channel splits and rolling averages.
  • India geography heatmap — state and pincode-level performance overlaid on real coordinates.
  • Payments — COD vs prepaid split, gateway breakdown, GoKwik reconciliation.
  • Product performance — SKU sales, returns rate, combo SKUs decomposed to components.
  • Cohort retention — triangular matrix, ageing windows, retention curves by acquisition month.
  • RFM segmentation — recency, frequency, monetary scoring and named segments.
  • Weekday patterns — hour × day heatmaps revealing dispatch and demand cycles.

From a laptop and a tunnel to AWS Tokyo.

Earlier life: Streamlit on a personal laptop, exposed via a Cloudflare tunnel. The day the laptop closed, the dashboard died. Migration target: treat the dashboard like the production system it had quietly become.

Uptime
0%
Pages loaded
8 modules
Daily backup
Automated
SSO
Workspace
FastAPI DuckDB Parquet React Vite Shopify API AWS EC2 t3.large Caddy systemd Google Workspace SSO Cloudflare DNS

Numbers, on demand.

Leadership opens the dashboard, picks a date range, gets answers in milliseconds. The monthly reporting forensics ended. The pincode RTO and combo-affinity work below were only feasible because this layer existed first — they piggyback on the same Parquet snapshots and DuckDB engine.

Next case · 06 Pincode RTO Pipeline