Reporting was a monthly forensic project.
Every month, the same exercise: pull a fresh Shopify CSV, ferry it into a different sheet for each question (revenue, retention, cohorts, geography, payments, RFM), and stitch the answers into a deck. Half the month was over by the time anyone had the numbers from the half that ended.
A few earlier dashboards existed — fragile Google Sheets that recalculated for 20 minutes, a Streamlit prototype on a laptop tunnelled out through Cloudflare. Both worked. Neither was something the team could actually rely on.
Parquet at rest, DuckDB at query.
The bet was simple: store everything as columnar Parquet, query it with DuckDB, ship a thin FastAPI layer and a React frontend. No data warehouse, no Snowflake bill, no ETL platform. The whole stack runs on one t3.large in Tokyo.
A nightly sync.py appends new Shopify orders to Parquet. On service start, the query cache pre-warms the heaviest views so the first user of the day doesn't pay the cold-start tax.
Eight views, one source of truth.
Each module reads from the same Parquet snapshot. No hidden joins, no second copy of the data, no "this number doesn't match the other dashboard."
- Executive KPIs — revenue, AOV, units, repeat-customer rate, returning-revenue share, with deltas vs prior period.
- Revenue trends — daily/weekly/monthly with channel splits and rolling averages.
- India geography heatmap — state and pincode-level performance overlaid on real coordinates.
- Payments — COD vs prepaid split, gateway breakdown, GoKwik reconciliation.
- Product performance — SKU sales, returns rate, combo SKUs decomposed to components.
- Cohort retention — triangular matrix, ageing windows, retention curves by acquisition month.
- RFM segmentation — recency, frequency, monetary scoring and named segments.
- Weekday patterns — hour × day heatmaps revealing dispatch and demand cycles.
From a laptop and a tunnel to AWS Tokyo.
Earlier life: Streamlit on a personal laptop, exposed via a Cloudflare tunnel. The day the laptop closed, the dashboard died. Migration target: treat the dashboard like the production system it had quietly become.
Numbers, on demand.
Leadership opens the dashboard, picks a date range, gets answers in milliseconds. The monthly reporting forensics ended. The pincode RTO and combo-affinity work below were only feasible because this layer existed first — they piggyback on the same Parquet snapshots and DuckDB engine.