## Measuring Latency First
Never optimize without measurement. Before touching any code, instrument your API to understand where time is actually being spent.
### Server-Timing Header

The Server-Timing response header exposes server-side timing to the browser DevTools waterfall:

```python
import time

from django.http import JsonResponse

from myapp.models import Post  # wherever your Post model lives

def get_feed(request):
    t_db_start = time.perf_counter()
    posts = list(Post.objects.select_related('author').order_by('-created')[:20])
    t_db = (time.perf_counter() - t_db_start) * 1000

    t_serialize_start = time.perf_counter()
    data = [{'id': p.id, 'title': p.title, 'author': p.author.name} for p in posts]
    t_serialize = (time.perf_counter() - t_serialize_start) * 1000

    response = JsonResponse({'posts': data})
    response['Server-Timing'] = (
        f'db;dur={t_db:.1f};desc="Database", '
        f'serialize;dur={t_serialize:.1f};desc="Serialization"'
    )
    return response
```
Chrome DevTools → Network → select request → Timing tab shows these breakdowns.
### Percentile Analysis
Average response times hide the worst user experiences. Always measure at percentiles:
| Metric | Meaning |
|---|---|
| p50 | Half of requests are faster than this |
| p95 | 95% of requests are faster than this |
| p99 | 99% of requests are faster than this |
| p99.9 | 99.9% of requests are faster than this |
A p50 of 50ms with a p99 of 5,000ms means 1% of users wait 5 seconds. That is unacceptable for interactive APIs.
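Percentiles are easy to compute from raw latency samples. A minimal sketch using the nearest-rank method (the `percentile` helper and the sample data are ours, for illustration):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 98 fast requests and 2 pathological ones: the tail is invisible at p50
latencies_ms = [48.0] * 98 + [5000.0] * 2
print(percentile(latencies_ms, 50))  # → 48.0
print(percentile(latencies_ms, 99))  # → 5000.0
```

The p50 looks healthy while the p99 exposes the 5-second tail, which is exactly why dashboards should plot both.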
## Database Layer
The database is the bottleneck in most API latency problems.
### Eliminate N+1 Queries

An N+1 query fetches a list of N objects, then makes one additional query per object to load a related resource:

```python
# BAD: N+1 — 1 query for posts + 1 query per post for author
posts = Post.objects.all()[:20]  # 1 query
for post in posts:
    print(post.author.name)      # N queries

# GOOD: 1 query total — select_related JOINs the author in (FK/OneToOne)
posts = Post.objects.select_related('author').all()[:20]

# GOOD: 2 queries total — prefetch_related for M2M and reverse relations
posts = Post.objects.prefetch_related('tags').all()[:20]
```
Use query-logging middleware in development to catch N+1 patterns before they reach production. django-debug-toolbar and the nplusone library both detect N+1 queries automatically.
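For instance, nplusone's Django integration can be wired into development settings so lazy loads fail loudly instead of silently adding queries (a sketch based on the library's documented setup; keep it out of production, since the instrumentation adds overhead):

```python
# settings/dev.py — development only
INSTALLED_APPS += ["nplusone.ext.django"]
MIDDLEWARE = ["nplusone.ext.django.NPlusOneMiddleware", *MIDDLEWARE]
NPLUSONE_RAISE = True  # raise on detected lazy loads instead of just logging
```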
### Query Optimization

```python
# Only fetch columns you need
Post.objects.values('id', 'title', 'created').order_by('-created')[:20]

# Use .exists() instead of .count() for presence checks
if Post.objects.filter(slug=slug).exists():  # fast
    ...

# Add indexes for commonly filtered columns
class Post(models.Model):
    created = models.DateTimeField(db_index=True)
    slug = models.SlugField(unique=True)  # unique implies index
```
### Connection Pooling
Every uncached database connection requires a TCP handshake and authentication round trip. Use PgBouncer or your framework's built-in pooling:
```ini
# PgBouncer configuration (pgbouncer.ini)
[databases]
mydb = host=postgres port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 100
default_pool_size = 25
```
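If a separate pooler is not an option, Django can at least reuse connections across requests via persistent connections. A settings sketch (`CONN_MAX_AGE` is Django's standard knob for this; the database name is illustrative):

```python
# settings.py — keep connections open for 60s instead of reconnecting per request
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "CONN_MAX_AGE": 60,  # seconds; 0 = close after each request (the default)
    }
}
```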
## Application Layer
### Serialization Performance
JSON serialization is often overlooked but can be significant for large payloads:
```python
import json     # stdlib baseline
import orjson   # 3-5x faster than stdlib json
import msgpack  # binary format, 30-50% smaller than JSON

from django.http import HttpResponse

# orjson returns bytes, not str — return them in the response directly
def get_data(request):
    data = {'users': [...]}
    return HttpResponse(
        orjson.dumps(data),
        content_type='application/json',
    )
```
For internal service-to-service APIs where you control both ends, MessagePack provides binary serialization that is 30-50% smaller and faster to parse than JSON.
### Async Processing
Move work that does not need to be in the critical path to a background task queue:
```python
# BAD: Send email synchronously in the request handler (adds 200-500ms)
def create_user(request):
    user = User.objects.create(**request.POST)
    send_welcome_email(user)  # blocks the response
    return JsonResponse({'id': user.id}, status=201)

# GOOD: Queue the email, return immediately
from django_tasks import task

@task()
def send_welcome_email(user_id):
    ...  # look up the user and send the email in a worker process

def create_user(request):
    user = User.objects.create(**request.POST)
    send_welcome_email.enqueue(user.id)  # runs in a background worker
    return JsonResponse({'id': user.id}, status=201)
```
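The same queue-and-return shape works outside Django. A framework-agnostic sketch using a thread pool (every name here is an illustrative stub, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
sent: list[int] = []

def send_welcome_email(user_id: int) -> None:
    sent.append(user_id)  # stand-in for a slow SMTP call

def handle_create_user(user_id: int) -> dict:
    executor.submit(send_welcome_email, user_id)  # off the critical path
    return {"id": user_id}  # returns without waiting for the email

result = handle_create_user(1)
executor.shutdown(wait=True)  # demo only: wait so the side effect is observable
print(result, sent)  # → {'id': 1} [1]
```

A real deployment would use a durable queue (Celery, django-tasks, SQS) rather than an in-process pool, which loses queued work if the process dies.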
## Transport Layer
### Response Compression
Compression reduces bytes on the wire, improving throughput for large responses:
```nginx
# Nginx: enable gzip compression
gzip on;
gzip_types application/json text/plain text/css;
gzip_min_length 1000;  # don't compress small responses
gzip_comp_level 6;     # balance speed vs. ratio

# Brotli: better compression for modern browsers (requires the ngx_brotli module)
brotli on;
brotli_comp_level 4;
brotli_types application/json text/plain;
```
For JSON APIs, compression typically achieves 60-80% size reduction, meaning a 100KB payload becomes 20-40KB on the wire.
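Those numbers are easy to verify locally with the stdlib; a list-of-objects payload, where the keys repeat on every element, compresses especially well:

```python
import gzip
import json

# Typical list-of-objects API payload: keys repeat on every element
payload = json.dumps(
    [{"id": i, "title": f"Post {i}", "author": "alice"} for i in range(500)]
).encode()

compressed = gzip.compress(payload, compresslevel=6)
saving = 100 * (1 - len(compressed) / len(payload))
print(f"{len(payload)} B -> {len(compressed)} B ({saving:.0f}% smaller)")
```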
### HTTP/2 Multiplexing

HTTP/2 multiplexes many requests over a single TCP connection simultaneously. This removes HTTP/1.1's request-level head-of-line blocking and its 6-connections-per-domain limit:
```nginx
# Nginx: enable HTTP/2
server {
    listen 443 ssl http2;
    ssl_protocols TLSv1.2 TLSv1.3;
    # HTTP/2 is negotiated automatically via ALPN
}
```
## Caching Layers
The fastest database query is the one you never make. Add caching at multiple layers:
```python
from django.core.cache import cache

def get_user_profile(user_id: int):
    cache_key = f'user_profile:{user_id}'
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    user = User.objects.select_related('profile').get(pk=user_id)
    data = {'id': user.id, 'name': user.name, 'bio': user.profile.bio}
    cache.set(cache_key, data, timeout=300)  # 5-minute TTL
    return data
```
Cache warming pre-populates the cache before it is needed — useful for predictable high-traffic content (home page, featured items).
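Stripped of Django, the cache-aside pattern plus warming fits in a few lines. An in-process sketch where a dict stands in for Redis/memcached and all names are ours:

```python
import time

_cache: dict[str, tuple[float, object]] = {}

def get_cached(key: str, loader, ttl: float = 300.0):
    """Cache-aside with a TTL, mirroring the cache.get/cache.set flow above."""
    entry = _cache.get(key)
    if entry is not None and time.monotonic() < entry[0]:
        return entry[1]                           # fresh hit
    value = loader()                              # miss or expired: recompute
    _cache[key] = (time.monotonic() + ttl, value)
    return value

def warm(keys, make_loader):
    """Pre-populate the cache before traffic arrives (cache warming)."""
    for key in keys:
        get_cached(key, make_loader(key))

calls = 0
def make_loader(key):
    def load():
        global calls
        calls += 1
        return f"profile for {key}"
    return load

warm(["user:1", "user:2"], make_loader)          # 2 loader calls
get_cached("user:1", make_loader("user:1"))      # served from cache, no new call
print(calls)  # → 2
```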
## Key Takeaways

- Measure with `Server-Timing` and p99 percentiles before optimizing anything
- Eliminate N+1 queries — use `select_related`/`prefetch_related` or query analyzers
- Replace `json` with `orjson` for a free 3-5x serialization speedup
- Move non-critical work (emails, webhooks, analytics) to background tasks
- Enable gzip/brotli compression in Nginx — reduces JSON payloads by 60-80%