Case Study · Confidential B2B SaaS, North America

Cutting B2B SaaS API p95 Latency 8x with a Production Redis Caching Layer

How we re-architected a hot Python/FastAPI API with a multi-tier Redis caching layer to take p95 latency from 1.8s to 220ms while cutting PostgreSQL load by 70% for a North American B2B SaaS.

  • Industry: SaaS
  • Year: 2024
  • Country: USA
  • Duration: 4 months

At-a-glance results

  • 8x: p95 latency drop on hot endpoints (1.8s → 220ms)
  • 70%: reduction in PostgreSQL read load
  • $3.4K/mo: avoided RDS upgrade, paying for the engagement in 90 days
  • 99.97%: cache-layer availability over the first 6 months

The challenge

A fast-growing B2B SaaS was hitting a wall. Their Python/FastAPI API powered dashboards for thousands of seat-based accounts, and the read path had grown into a tangle of N+1 ORM queries, repeated permission checks, and per-request feature-flag lookups. p95 latency on the busiest endpoints had drifted from 380ms at launch to 1.8s, RDS CPU was pinned at 80% during business hours, and the engineering team was already pricing a vertical RDS upgrade that would have added ~$3,400/mo to the bill.

Worse, the slowness was invisible to half the customer base because the dashboard renders progressively — by the time customers complained, the team had already lost an enterprise renewal over "the dashboard feels broken." The team needed a fix in weeks, not a six-month re-platform.

Our solution

We dropped a disciplined, three-tier Redis caching layer in front of the hottest read paths and re-shaped the data model so it could be cached safely. The investment was in cache key design, invalidation contracts, and observability — not in throwing memory at the problem.

Tier 1: per-request memoization inside the FastAPI dependency container, killing duplicate lookups within a single request.

Tier 2: a shared Redis 7 cluster (cluster-mode, AWS ElastiCache) holding hot reads — permission sets, feature flags, account metadata, dashboard aggregates — with explicit TTLs and a typed Pydantic envelope so cached payloads are versioned and safe to evolve.

Tier 3: an out-of-band Celery worker that pre-warms the most-requested aggregates immediately after writes, so the next user request is already a cache hit.
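As a rough illustration of Tier 1, here is a stdlib-only sketch of per-request memoization. In the real service the memo dict lives on FastAPI's `request.state` and is injected via `Depends()`; here a plain dict stands in for it, and `load_permissions_from_db` is a hypothetical stand-in for the actual ORM lookup:

```python
# Minimal sketch of Tier-1 per-request memoization (stdlib only).
CALLS = {"db": 0}

def load_permissions_from_db(tenant_id: str) -> set:
    """Stand-in for the real (expensive) database lookup."""
    CALLS["db"] += 1
    return {"dashboard:read", "reports:read"}

def get_permissions(tenant_id: str, memo: dict) -> set:
    # Memoize within the lifetime of one request: duplicate lookups
    # for the same tenant hit the dict, not the database.
    key = ("permissions", tenant_id)
    if key not in memo:
        memo[key] = load_permissions_from_db(tenant_id)
    return memo[key]

request_memo = {}                               # one fresh dict per incoming request
first = get_permissions("acme", request_memo)
second = get_permissions("acme", request_memo)  # served from memo, no second DB call
```

The dict dies with the request, so this tier needs no TTLs or invalidation of its own.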

Every cache key is namespaced by tenant + entity + version, every read records a hit/miss to Datadog, and every write goes through a single invalidation module so a future engineer can't silently bypass it.
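A minimal sketch of that key and envelope scheme, with stdlib `json` and `dataclasses` standing in for the Pydantic models used in production (all names illustrative):

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

SCHEMA_VERSION = 3  # bumped whenever the cached payload shape changes

def cache_key(tenant_id: str, entity: str, entity_id: str) -> str:
    # tenant + entity + version: a deploy that bumps SCHEMA_VERSION
    # simply misses the old keys instead of serving mixed-shape data.
    return f"{tenant_id}:{entity}:v{SCHEMA_VERSION}:{entity_id}"

@dataclass
class Envelope:
    version: int
    payload: dict

def pack(payload: dict) -> str:
    return json.dumps(asdict(Envelope(SCHEMA_VERSION, payload)))

def unpack(raw: str) -> Optional[dict]:
    data = json.loads(raw)
    if data["version"] != SCHEMA_VERSION:
        return None  # a version mismatch is treated as a cache miss
    return data["payload"]
```

Treating a version mismatch as a miss means a deploy never has to flush the cache; stale-shaped entries simply age out under their TTLs.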

  • Three-tier caching: in-process memoization, Redis cluster, and pre-warming Celery workers
  • Typed Pydantic cache envelopes with explicit version field for safe schema evolution
  • Namespaced keys (tenant + entity + version) so deploys never serve mixed-shape data
  • Single invalidation module — every write path funnels through it; no silent bypass
  • Shadow-read rollout with cache-vs-DB diffing on 10% of live traffic before cutover
  • Datadog dashboards for hit rate, p95, eviction rate, and Redis memory headroom
  • PagerDuty alerts on hit-rate drop, eviction spikes, and Redis primary failover
  • k6 load tests reproducing 3x peak traffic, with a written capacity model

How we built it

  1. Hotspot audit & cacheability map

    We started by instrumenting the existing API with Datadog APM and a custom SQL profiler, then ranked every endpoint by p95 cost and call volume. The top 14 endpoints accounted for 91% of total DB time. For each one we built a cacheability map: what's safe to cache, what's per-tenant, what's per-user, what TTL, and what events must invalidate it.

  2. Cache contract and key design

    Before writing any Redis code, we authored a one-page cache contract: typed envelopes, a single key-builder function, mandatory TTLs, namespaced versioning so deploys never serve stale shapes, and a no-bypass rule enforced by a small mypy-checked decorator. This was the most important step — it prevented the usual cache-rot that kills these projects in year two.

  3. Implementation in tight slices

    We shipped one endpoint family per week behind a feature flag, with a shadow-read mode that compared cache and DB results on 10% of traffic for 48 hours before flipping. Rollouts were per-tenant so any anomaly was contained, and each slice came with a runbook for the on-call engineer.

  4. Observability, load test, handoff

    We added Datadog dashboards for hit rate, eviction rate, p95 by endpoint, and Redis memory headroom, then ran a 2-hour k6 load test at 3x peak traffic to confirm the new ceiling. The client's team got a written runbook, alert thresholds, and a 4-week post-launch retainer for tuning.
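The shadow-read mode used during the slice-by-slice rollout can be sketched as follows. This is a simplification with hypothetical names; the production version also tagged each diff with tenant and endpoint for the Datadog dashboards:

```python
import random

DIFFS = []  # mismatches collected for review before cutover

def shadow_read(key, read_cache, read_db, sample_rate=0.10):
    # During shadow mode the database is still authoritative; on a
    # sampled slice we also read the cache and record any mismatch
    # instead of failing the request.
    fresh = read_db(key)
    if random.random() < sample_rate:
        cached = read_cache(key)
        if cached != fresh:
            DIFFS.append({"key": key, "cache": cached, "db": fresh})
    return fresh
```

Once the diff list stays empty for the observation window (48 hours in this engagement), the endpoint is flipped to serve from cache.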

Tech stack

  • Python
  • FastAPI
  • Redis 7
  • PostgreSQL
  • Celery
  • Docker
  • AWS ECS
  • Datadog
Services

  • Backend Performance Engineering
  • Python & FastAPI
  • Cloud Solutions
  • DevOps & Observability

"We went from re-architecting our database to shipping the next set of customer features in the same quarter. UnlockLive treated cache invalidation like a real engineering discipline, not a hack."

VP of Engineering · B2B SaaS client (name confidential)

Frequently asked questions

How do you decide what's safe to cache in Redis for a multi-tenant SaaS?

We start with a cacheability audit per endpoint: tenant boundary, freshness tolerance, and which events invalidate the result. Per-tenant aggregates with clear write paths are easy wins; cross-tenant joins almost never are. Every cached value gets a typed envelope and a versioned key so future schema changes don't poison the cache.
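One way to record the outcome of such an audit is a small table per endpoint. The entries below are illustrative, not the client's actual map:

```python
from dataclasses import dataclass

@dataclass
class Cacheability:
    endpoint: str
    scope: str             # "tenant" or "user"
    ttl_seconds: int
    invalidated_by: tuple  # write events that must evict this entry

# Illustrative rows; the real map covered the top 14 endpoints.
CACHEABILITY_MAP = [
    Cacheability("/dashboard/summary", "tenant", 60, ("invoice.created", "seat.changed")),
    Cacheability("/me/feature-flags", "user", 300, ("flag.updated",)),
]
```

Anything that cannot be given a scope, a TTL, and a finite list of invalidating events stays uncached.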

How do you avoid stale data and the classic Redis cache invalidation problem?

Two rules: every write path goes through a single invalidation module (enforced by a typed decorator so it can't be bypassed), and every cache key includes a schema version. We also run a shadow-read mode in production that diffs cache vs. database results on a slice of traffic before promoting a new cached endpoint.
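A minimal sketch of such a decorator-enforced invalidation module, with an in-memory list standing in for the Redis `DEL` calls (function names are illustrative):

```python
from functools import wraps

evicted = []  # stand-in for redis.delete(...) calls

def invalidates(entity: str):
    # Every write path must be wrapped; a lint/type-check rule in CI
    # can then flag any mutation function that lacks this decorator.
    def wrap(fn):
        @wraps(fn)
        def inner(tenant_id, *args, **kwargs):
            result = fn(tenant_id, *args, **kwargs)
            evicted.append(f"{tenant_id}:{entity}")
            return result
        return inner
    return wrap

@invalidates("account")
def rename_account(tenant_id, new_name):
    return {"tenant": tenant_id, "name": new_name}
```

Because eviction happens in one place, adding a new cached entity means touching exactly one module, not auditing every write path.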

When does Redis caching actually save money on AWS, and when is it just complexity?

It pays back when you can show a measurable RDS or compute saving — typically when your hot reads are 60%+ of total DB time and the workload has natural locality. For a project like this, 3-6 weeks of engineering work usually pays for itself via a deferred RDS upgrade within a quarter. We model this before quoting the project.
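As a back-of-envelope version of that model — the engineering spend here is hypothetical; only the $3,400/mo RDS figure comes from this case study:

```python
# Hypothetical blended engineering cost vs the deferred RDS upgrade.
project_cost_usd = 10_000        # illustrative engineering spend
rds_upgrade_per_month = 3_400    # monthly upgrade cost the client deferred

breakeven_months = project_cost_usd / rds_upgrade_per_month
# just under three months, consistent with the 90-day payback above
```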

Do you use Redis Cluster or single-node Redis for SaaS workloads?

We default to managed Redis cluster mode (AWS ElastiCache or equivalent) for production SaaS — it gives you sharding, online failover, and predictable scaling. Single-node Redis is fine for queues or session caches but not for the read-path cache that keeps your dashboard alive.

How long does a Redis caching project like this take?

Typically 8-16 weeks for a mid-sized FastAPI or Django API: 2 weeks of audit and contract design, 6-12 weeks of slice-by-slice rollout under feature flags, and a 2-4 week stabilization window with on-call coverage.

Want a result like this?

Talk to the team that delivered this result. We’ll scope your project, give you a fixed-price proposal, and show you the closest analog from our portfolio.

Book a strategy call