Case Study · Confidential B2B SaaS, North America

Cutting B2B SaaS API p95 Latency 8x with a Production Redis Caching Layer

How we re-architected a hot Python/FastAPI API with a multi-tier Redis caching layer to take p95 latency from 1.8s to 220ms while cutting PostgreSQL load by 70% for a North American B2B SaaS.

  • Industry: SaaS
  • Year: 2024
  • Country: USA
  • Duration: 4 months

At-a-glance results

  • 8x: p95 latency drop on hot endpoints (1.8s → 220ms)
  • 70%: reduction in PostgreSQL read load
  • $3.4K/mo: avoided RDS upgrade, paying for the engagement in 90 days
  • 99.97%: cache-layer availability over the first 6 months

The challenge

A fast-growing B2B SaaS was hitting a wall. Their Python/FastAPI API powered dashboards for thousands of seat-based accounts, and the read path had grown into a tangle of N+1 ORM queries, repeated permission checks, and per-request feature-flag lookups. p95 latency on the busiest endpoints had drifted from 380ms at launch to 1.8s, RDS CPU was pinned at 80% during business hours, and the engineering team was already pricing a vertical RDS upgrade that would have added ~$3,400/mo to the bill.

Worse, the slowness was invisible to half the customer base because the dashboard renders progressively — by the time customers complained, the team had already lost an enterprise renewal over "the dashboard feels broken." The team needed a fix in weeks, not a six-month re-platform.

Our solution

We dropped a disciplined, three-tier Redis caching layer in front of the hottest read paths and re-shaped the data model so it could be cached safely. The investment was in cache key design, invalidation contracts, and observability — not in throwing memory at the problem.

Tier 1: per-request memoization inside the FastAPI dependency container, killing duplicate lookups within a single request.

Tier 2: a shared Redis 7 cluster (cluster-mode, AWS ElastiCache) holding hot reads — permission sets, feature flags, account metadata, dashboard aggregates — with explicit TTLs and a typed Pydantic envelope so cached payloads are versioned and safe to evolve.

Tier 3: an out-of-band Celery worker that pre-warms the most-requested aggregates immediately after writes, so the next user request is already a cache hit.
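As a rough illustration of Tier 1, here is a stdlib-only sketch of per-request memoization. In the real service the memo dict lives on FastAPI's `request.state` and is injected via `Depends()`; here a plain dict stands in for it, and `load_permissions_from_db` is a hypothetical stand-in for the actual ORM lookup:

```python
# Minimal sketch of Tier-1 per-request memoization (stdlib only).
CALLS = {"db": 0}

def load_permissions_from_db(tenant_id: str) -> set:
    """Stand-in for the real (expensive) database lookup."""
    CALLS["db"] += 1
    return {"dashboard:read", "reports:read"}

def get_permissions(tenant_id: str, memo: dict) -> set:
    # Memoize within the lifetime of one request: duplicate lookups
    # for the same tenant hit the dict, not the database.
    key = ("permissions", tenant_id)
    if key not in memo:
        memo[key] = load_permissions_from_db(tenant_id)
    return memo[key]

request_memo = {}                               # one fresh dict per incoming request
first = get_permissions("acme", request_memo)
second = get_permissions("acme", request_memo)  # served from memo, no second DB call
```

The dict dies with the request, so this tier needs no TTLs or invalidation of its own.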

Every cache key is namespaced by tenant + entity + version, every read records a hit/miss to Datadog, and every write goes through a single invalidation module so a future engineer can't silently bypass it.
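A minimal sketch of that key and envelope scheme, with stdlib `json` and `dataclasses` standing in for the Pydantic models used in production (all names illustrative):

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

SCHEMA_VERSION = 3  # bumped whenever the cached payload shape changes

def cache_key(tenant_id: str, entity: str, entity_id: str) -> str:
    # tenant + entity + version: a deploy that bumps SCHEMA_VERSION
    # simply misses the old keys instead of serving mixed-shape data.
    return f"{tenant_id}:{entity}:v{SCHEMA_VERSION}:{entity_id}"

@dataclass
class Envelope:
    version: int
    payload: dict

def pack(payload: dict) -> str:
    return json.dumps(asdict(Envelope(SCHEMA_VERSION, payload)))

def unpack(raw: str) -> Optional[dict]:
    data = json.loads(raw)
    if data["version"] != SCHEMA_VERSION:
        return None  # a version mismatch is treated as a cache miss
    return data["payload"]
```

Treating a version mismatch as a miss means a deploy never has to flush the cache; stale-shaped entries simply age out under their TTLs.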

  • Three-tier caching: in-process memoization, Redis cluster, and pre-warming Celery workers
  • Typed Pydantic cache envelopes with explicit version field for safe schema evolution
  • Namespaced keys (tenant + entity + version) so deploys never serve mixed-shape data
  • Single invalidation module — every write path funnels through it; no silent bypass
  • Shadow-read rollout with cache-vs-DB diffing on 10% of live traffic before cutover
  • Datadog dashboards for hit rate, p95, eviction rate, and Redis memory headroom
  • PagerDuty alerts on hit-rate drop, eviction spikes, and Redis primary failover
  • k6 load tests reproducing 3x peak traffic, with a written capacity model

How we built it

  1. Hotspot audit & cacheability map

    We started by instrumenting the existing API with Datadog APM and a custom SQL profiler, then ranked every endpoint by p95 cost and call volume. The top 14 endpoints accounted for 91% of total DB time. For each one we built a cacheability map: what's safe to cache, what's per-tenant, what's per-user, what TTL, and what events must invalidate it.

  2. Cache contract and key design

    Before writing any Redis code, we authored a one-page cache contract: typed envelopes, a single key-builder function, mandatory TTLs, namespaced versioning so deploys never serve stale shapes, and a no-bypass rule enforced by a small mypy-checked decorator. This was the most important step — it prevented the usual cache-rot that kills these projects in year two.

  3. Implementation in tight slices

    We shipped one endpoint family per week behind a feature flag, with a shadow-read mode that compared cache and DB results on 10% of traffic for 48 hours before flipping. Rollouts were per-tenant so any anomaly was contained, and each slice came with a runbook for the on-call engineer.

  4. Observability, load test, handoff

    We added Datadog dashboards for hit rate, eviction rate, p95 by endpoint, and Redis memory headroom, then ran a 2-hour k6 load test at 3x peak traffic to confirm the new ceiling. The client's team got a written runbook, alert thresholds, and a 4-week post-launch retainer for tuning.
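The shadow-read mode used during the slice-by-slice rollout can be sketched as follows. This is a simplification with hypothetical names; the production version also tagged each diff with tenant and endpoint for the Datadog dashboards:

```python
import random

DIFFS = []  # mismatches collected for review before cutover

def shadow_read(key, read_cache, read_db, sample_rate=0.10):
    # During shadow mode the database is still authoritative; on a
    # sampled slice we also read the cache and record any mismatch
    # instead of failing the request.
    fresh = read_db(key)
    if random.random() < sample_rate:
        cached = read_cache(key)
        if cached != fresh:
            DIFFS.append({"key": key, "cache": cached, "db": fresh})
    return fresh
```

Once the diff list stays empty for the observation window (48 hours in this engagement), the endpoint is flipped to serve from cache.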

Tech stack

  • Python
  • FastAPI
  • Redis 7
  • PostgreSQL
  • Celery
  • Docker
  • AWS ECS
  • Datadog
Services

  • Backend Performance Engineering
  • Python & FastAPI
  • Cloud Solutions
  • DevOps & Observability

"We went from re-architecting our database to shipping the next set of customer features in the same quarter. UnlockLive treated cache invalidation like a real engineering discipline, not a hack."

VP of Engineering · B2B SaaS client (name confidential)

Frequently asked questions

How do you decide what's safe to cache in Redis for a multi-tenant SaaS?

We start with a cacheability audit per endpoint: tenant boundary, freshness tolerance, and which events invalidate the result. Per-tenant aggregates with clear write paths are easy wins; cross-tenant joins almost never are. Every cached value gets a typed envelope and a versioned key so future schema changes don't poison the cache.
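One way to record the outcome of such an audit is a small table per endpoint. The entries below are illustrative, not the client's actual map:

```python
from dataclasses import dataclass

@dataclass
class Cacheability:
    endpoint: str
    scope: str             # "tenant" or "user"
    ttl_seconds: int
    invalidated_by: tuple  # write events that must evict this entry

# Illustrative rows; the real map covered the top 14 endpoints.
CACHEABILITY_MAP = [
    Cacheability("/dashboard/summary", "tenant", 60, ("invoice.created", "seat.changed")),
    Cacheability("/me/feature-flags", "user", 300, ("flag.updated",)),
]
```

Anything that cannot be given a scope, a TTL, and a finite list of invalidating events stays uncached.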

How do you avoid stale data and the classic Redis cache invalidation problem?

Two rules: every write path goes through a single invalidation module (enforced by a typed decorator so it can't be bypassed), and every cache key includes a schema version. We also run a shadow-read mode in production that diffs cache vs. database results on a slice of traffic before promoting a new cached endpoint.
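A minimal sketch of such a decorator-enforced invalidation module, with an in-memory list standing in for the Redis `DEL` calls (function names are illustrative):

```python
from functools import wraps

evicted = []  # stand-in for redis.delete(...) calls

def invalidates(entity: str):
    # Every write path must be wrapped; a lint/type-check rule in CI
    # can then flag any mutation function that lacks this decorator.
    def wrap(fn):
        @wraps(fn)
        def inner(tenant_id, *args, **kwargs):
            result = fn(tenant_id, *args, **kwargs)
            evicted.append(f"{tenant_id}:{entity}")
            return result
        return inner
    return wrap

@invalidates("account")
def rename_account(tenant_id, new_name):
    return {"tenant": tenant_id, "name": new_name}
```

Because eviction happens in one place, adding a new cached entity means touching exactly one module, not auditing every write path.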

When does Redis caching actually save money on AWS, and when is it just complexity?

It pays back when you can show a measurable RDS or compute saving — typically when your hot reads are 60%+ of total DB time and the workload has natural locality. For a project like this, 3-6 weeks of engineering work usually pays for itself via a deferred RDS upgrade within a quarter. We model this before quoting the project.
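As a back-of-envelope version of that model — the engineering spend here is hypothetical; only the $3,400/mo RDS figure comes from this case study:

```python
# Hypothetical blended engineering cost vs the deferred RDS upgrade.
project_cost_usd = 10_000        # illustrative engineering spend
rds_upgrade_per_month = 3_400    # monthly upgrade cost the client deferred

breakeven_months = project_cost_usd / rds_upgrade_per_month
# just under three months, consistent with the 90-day payback above
```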

Do you use Redis Cluster or single-node Redis for SaaS workloads?

We default to managed Redis cluster mode (AWS ElastiCache or equivalent) for production SaaS — it gives you sharding, online failover, and predictable scaling. Single-node Redis is fine for queues or session caches but not for the read-path cache that keeps your dashboard alive.

How long does a Redis caching project like this take?

Typically 8-16 weeks for a mid-sized FastAPI or Django API: 2 weeks of audit and contract design, 6-12 weeks of slice-by-slice rollout under feature flags, and a 2-4 week stabilization window with on-call coverage.

Want a result like this?

Talk to the team that delivered this result. We’ll scope your project, give you a fixed-price proposal, and show you the closest analog from our portfolio.

Book a strategy call