ML Inference Pipeline
The problem
A production ML model powering fraud detection was serving predictions through a brittle, monolithic inference service. As request volume grew, p99 latency ballooned past 4 seconds. Deploys took a full afternoon and required manual rollbacks on roughly 30% of releases.
What we built
We rebuilt the serving layer as a purpose-built inference service with model versioning, canary rollouts, and structured observability. We introduced asynchronous pre-processing to eliminate the main bottleneck, and rewrote the deploy pipeline with automated rollback triggers tied to latency SLOs.
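To illustrate how a rollback trigger tied to a latency SLO could work, here is a minimal Python sketch. It is not the project's actual implementation. The names `evaluate_canary`, `RollbackDecision`, `slo_p99_ms`, and `min_samples`, along with the 900 ms threshold and the 100-sample minimum, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RollbackDecision:
    rollback: bool
    p99_ms: float
    reason: str


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # 1-based nearest-rank index: ceil(0.99 * n), done in integer math.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def evaluate_canary(latencies_ms, slo_p99_ms=900.0, min_samples=100):
    """Decide whether a canary release should be rolled back.

    slo_p99_ms and min_samples are hypothetical defaults, not values
    taken from the real pipeline.
    """
    if len(latencies_ms) < min_samples:
        # Too little traffic to judge; leave the canary running.
        return RollbackDecision(False, float("nan"), "insufficient samples")
    observed = p99(latencies_ms)
    if observed > slo_p99_ms:
        return RollbackDecision(True, observed, "p99 above SLO")
    return RollbackDecision(False, observed, "within SLO")


if __name__ == "__main__":
    healthy = [100.0] * 200
    degraded = [100.0] * 195 + [4200.0] * 5
    print(evaluate_canary(healthy))
    print(evaluate_canary(degraded))
```

In a setup like this, the deploy pipeline would call the check on latencies sampled from the canary and revert to the previous model version whenever `rollback` is true. The minimum-sample guard keeps a few slow requests at the start of a rollout from triggering a rollback on their own.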
Outcome
p99 latency dropped from 4.2s to under 900ms. Deploys went from afternoon events to 12-minute automated runs. The ML team now owns and maintains the service independently.