## Measuring Latency First
Never optimize without measurement. Before touching any code, instrument your API to understand where time is actually being spent.
### Server-Timing Header

The Server-Timing response header exposes server-side timing to the browser DevTools waterfall:

```python
import time

from django.http import JsonResponse

from myapp.models import Post  # wherever your Post model lives

def get_feed(request):
    t_db_start = time.perf_counter()
    posts = list(Post.objects.select_related('author').order_by('-created')[:20])
    t_db = (time.perf_counter() - t_db_start) * 1000

    t_serialize_start = time.perf_counter()
    data = [{'id': p.id, 'title': p.title, 'author': p.author.name} for p in posts]
    t_serialize = (time.perf_counter() - t_serialize_start) * 1000

    response = JsonResponse({'posts': data})
    response['Server-Timing'] = (
        f'db;dur={t_db:.1f};desc="Database", '
        f'serialize;dur={t_serialize:.1f};desc="Serialization"'
    )
    return response
```
Chrome DevTools → Network → select request → Timing tab shows these breakdowns.
### Percentile Analysis
Average response times hide the worst user experiences. Always measure at percentiles:
| Metric | Meaning |
|---|---|
| p50 | Half of requests are faster than this |
| p95 | 95% of requests are faster than this |
| p99 | 99% of requests are faster than this |
| p99.9 | 99.9% of requests are faster than this |
A p50 of 50ms with a p99 of 5,000ms means 1% of users wait 5 seconds. That is unacceptable for interactive APIs.
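Percentiles are easy to compute from raw latency samples. A minimal sketch using the nearest-rank method (the `percentile` helper and the sample data are ours, for illustration):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 98 fast requests and 2 pathological ones: the tail is invisible at p50
latencies_ms = [48.0] * 98 + [5000.0] * 2
print(percentile(latencies_ms, 50))  # → 48.0
print(percentile(latencies_ms, 99))  # → 5000.0
```

The p50 looks healthy while the p99 exposes the 5-second tail, which is exactly why dashboards should plot both.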
## Database Layer
The database is the bottleneck in most API latency problems.
### Eliminate N+1 Queries

An N+1 query fetches a list of N objects, then makes one additional query per object to load a related resource:

```python
# BAD: N+1 — 1 query for posts + 1 query per post for author
posts = Post.objects.all()[:20]  # 1 query
for post in posts:
    print(post.author.name)      # N queries

# GOOD: 1 query total — select_related JOINs the author in (FK/OneToOne)
posts = Post.objects.select_related('author').all()[:20]

# GOOD: 2 queries total — prefetch_related for M2M and reverse relations
posts = Post.objects.prefetch_related('tags').all()[:20]
```
Use query-logging middleware in development to catch N+1 patterns before they reach production. django-debug-toolbar and the nplusone library both detect N+1 queries automatically.
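For instance, nplusone's Django integration can be wired into development settings so lazy loads fail loudly instead of silently adding queries (a sketch based on the library's documented setup; keep it out of production, since the instrumentation adds overhead):

```python
# settings/dev.py — development only
INSTALLED_APPS += ["nplusone.ext.django"]
MIDDLEWARE = ["nplusone.ext.django.NPlusOneMiddleware", *MIDDLEWARE]
NPLUSONE_RAISE = True  # raise on detected lazy loads instead of just logging
```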
### Query Optimization

```python
# Only fetch columns you need
Post.objects.values('id', 'title', 'created').order_by('-created')[:20]

# Use .exists() instead of .count() for presence checks
if Post.objects.filter(slug=slug).exists():  # fast
    ...

# Add indexes for commonly filtered columns
class Post(models.Model):
    created = models.DateTimeField(db_index=True)
    slug = models.SlugField(unique=True)  # unique implies index
```
### Connection Pooling
Every uncached database connection requires a TCP handshake and authentication round trip. Use PgBouncer or your framework's built-in pooling:
```ini
# PgBouncer configuration (pgbouncer.ini)
[databases]
mydb = host=postgres port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 100
default_pool_size = 25
```
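If a separate pooler is not an option, Django can at least reuse connections across requests via persistent connections. A settings sketch (`CONN_MAX_AGE` is Django's standard knob for this; the database name is illustrative):

```python
# settings.py — keep connections open for 60s instead of reconnecting per request
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "CONN_MAX_AGE": 60,  # seconds; 0 = close after each request (the default)
    }
}
```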
## Application Layer
### Serialization Performance
JSON serialization is often overlooked but can be significant for large payloads:
```python
import json     # stdlib baseline
import orjson   # 3-5x faster than stdlib json
import msgpack  # binary format, 30-50% smaller than JSON

from django.http import HttpResponse

# orjson returns bytes, not str — return them in the response directly
def get_data(request):
    data = {'users': [...]}
    return HttpResponse(
        orjson.dumps(data),
        content_type='application/json',
    )
```
For internal service-to-service APIs where you control both ends, MessagePack provides binary serialization that is 30-50% smaller and faster to parse than JSON.
### Async Processing
Move work that does not need to be in the critical path to a background task queue:
```python
# BAD: Send email synchronously in the request handler (adds 200-500ms)
def create_user(request):
    user = User.objects.create(**request.POST)
    send_welcome_email(user)  # blocks the response
    return JsonResponse({'id': user.id}, status=201)

# GOOD: Queue the email, return immediately
from django_tasks import task

@task()
def send_welcome_email(user_id):
    ...  # look up the user and send the email in a worker process

def create_user(request):
    user = User.objects.create(**request.POST)
    send_welcome_email.enqueue(user.id)  # runs in a background worker
    return JsonResponse({'id': user.id}, status=201)
```
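The same queue-and-return shape works outside Django. A framework-agnostic sketch using a thread pool (every name here is an illustrative stub, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
sent: list[int] = []

def send_welcome_email(user_id: int) -> None:
    sent.append(user_id)  # stand-in for a slow SMTP call

def handle_create_user(user_id: int) -> dict:
    executor.submit(send_welcome_email, user_id)  # off the critical path
    return {"id": user_id}  # returns without waiting for the email

result = handle_create_user(1)
executor.shutdown(wait=True)  # demo only: wait so the side effect is observable
print(result, sent)  # → {'id': 1} [1]
```

A real deployment would use a durable queue (Celery, django-tasks, SQS) rather than an in-process pool, which loses queued work if the process dies.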
## Transport Layer
### Response Compression
Compression reduces bytes on the wire, improving throughput for large responses:
```nginx
# Nginx: enable gzip compression
gzip on;
gzip_types application/json text/plain text/css;
gzip_min_length 1000;  # don't compress small responses
gzip_comp_level 6;     # balance speed vs. ratio

# Brotli: better compression for modern browsers (requires the ngx_brotli module)
brotli on;
brotli_comp_level 4;
brotli_types application/json text/plain;
```
For JSON APIs, compression typically achieves 60-80% size reduction, meaning a 100KB payload becomes 20-40KB on the wire.
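Those numbers are easy to verify locally with the stdlib; a list-of-objects payload, where the keys repeat on every element, compresses especially well:

```python
import gzip
import json

# Typical list-of-objects API payload: keys repeat on every element
payload = json.dumps(
    [{"id": i, "title": f"Post {i}", "author": "alice"} for i in range(500)]
).encode()

compressed = gzip.compress(payload, compresslevel=6)
saving = 100 * (1 - len(compressed) / len(payload))
print(f"{len(payload)} B -> {len(compressed)} B ({saving:.0f}% smaller)")
```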
### HTTP/2 Multiplexing

HTTP/2 multiplexes many requests over a single TCP connection simultaneously. This removes HTTP/1.1's request-level head-of-line blocking and its 6-connections-per-domain limit:
```nginx
# Nginx: enable HTTP/2
server {
    listen 443 ssl http2;
    ssl_protocols TLSv1.2 TLSv1.3;
    # HTTP/2 is negotiated automatically via ALPN
}
```
## Caching Layers
The fastest database query is the one you never make. Add caching at multiple layers:
```python
from django.core.cache import cache

def get_user_profile(user_id: int):
    cache_key = f'user_profile:{user_id}'
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    user = User.objects.select_related('profile').get(pk=user_id)
    data = {'id': user.id, 'name': user.name, 'bio': user.profile.bio}
    cache.set(cache_key, data, timeout=300)  # 5-minute TTL
    return data
```
Cache warming pre-populates the cache before it is needed — useful for predictable high-traffic content (home page, featured items).
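Stripped of Django, the cache-aside pattern plus warming fits in a few lines. An in-process sketch where a dict stands in for Redis/memcached and all names are ours:

```python
import time

_cache: dict[str, tuple[float, object]] = {}

def get_cached(key: str, loader, ttl: float = 300.0):
    """Cache-aside with a TTL, mirroring the cache.get/cache.set flow above."""
    entry = _cache.get(key)
    if entry is not None and time.monotonic() < entry[0]:
        return entry[1]                           # fresh hit
    value = loader()                              # miss or expired: recompute
    _cache[key] = (time.monotonic() + ttl, value)
    return value

def warm(keys, make_loader):
    """Pre-populate the cache before traffic arrives (cache warming)."""
    for key in keys:
        get_cached(key, make_loader(key))

calls = 0
def make_loader(key):
    def load():
        global calls
        calls += 1
        return f"profile for {key}"
    return load

warm(["user:1", "user:2"], make_loader)          # 2 loader calls
get_cached("user:1", make_loader("user:1"))      # served from cache, no new call
print(calls)  # → 2
```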
## Key Takeaways

- Measure with `Server-Timing` and p99 percentiles before optimizing anything
- Eliminate N+1 queries — use `select_related`/`prefetch_related` or query analyzers
- Replace `json` with `orjson` for a free 3-5x serialization speedup
- Move non-critical work (emails, webhooks, analytics) to background tasks
- Enable gzip/brotli compression in Nginx — reduces JSON payloads by 60-80%