The Problem: Requests Dropped on Shutdown
Every deployment creates a race condition. The moment your process manager sends SIGTERM to the old worker, in-flight requests are at risk. Clients see 502 Bad Gateway errors, partial JSON responses, or abrupt WebSocket disconnections — all because the application exited before it finished serving.
Connection draining is the practice of letting in-flight requests complete before the process terminates. Done correctly, users experience zero dropped requests even during rapid rolling deployments.
The failure modes are predictable:
- Load balancer routes to a terminating backend — the process is shutting down but the LB hasn't deregistered it yet, so new connections keep arriving
- Process exits before response is sent — the app handles the request body but SIGKILL interrupts the write
- Long-lived connections are severed — WebSocket, gRPC streaming, and SSE clients see unexpected disconnects and may not reconnect cleanly
SIGTERM Handling
When an orchestrator wants to stop your process it sends SIGTERM first, then waits a grace period, then sends SIGKILL. Your application must respond to SIGTERM by stopping new connection acceptance while continuing to serve existing ones.
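Before reaching for framework-specific hooks, the core pattern can be sketched with Python's standard library alone: register a SIGTERM handler that stops the accept loop while threads serving in-flight requests run to completion. The handler and server names here are ours, not from any framework.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet

# Port 0 picks a free port; a real service would bind a fixed one
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)

def handle_sigterm(signum, frame):
    # shutdown() stops the accept loop and makes serve_forever() return.
    # It must run in another thread because it blocks until the loop exits;
    # threads already serving requests finish their responses.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
# server.serve_forever()  # blocks until SIGTERM triggers shutdown()
```

The same shape recurs in every runtime below: stop accepting, let existing work finish, then exit.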
Gunicorn (Python)
Gunicorn has built-in graceful shutdown. On SIGTERM, the master stops accepting new connections and signals workers to finish their in-flight requests:
```python
# gunicorn.conf.py
graceful_timeout = 30  # seconds to wait for workers to finish
timeout = 60           # hard worker timeout
keepalive = 5          # keep-alive connection timeout
```
The sequence: SIGTERM → master stops listening → workers drain → workers exit → master exits.
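If systemd manages Gunicorn (as the `systemctl restart` used later in this article assumes), systemd's own stop timeout must exceed `graceful_timeout`, or systemd will SIGKILL workers mid-drain. A minimal sketch; the paths and app module are placeholders:

```ini
# /etc/systemd/system/gunicorn.service (excerpt)
[Service]
ExecStart=/usr/local/bin/gunicorn -c gunicorn.conf.py myapp.wsgi
# systemd sends SIGTERM on stop by default; give the master longer
# than graceful_timeout (30s) before systemd escalates to SIGKILL
TimeoutStopSec=45
```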
Node.js / Express
Node.js does not drain connections on SIGTERM by default; the process simply keeps running until killed. You must explicitly stop the server from accepting new connections:
```javascript
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining connections');
  server.close(() => {
    // All existing connections have finished
    console.log('Server closed cleanly');
    process.exit(0);
  });
  // Force exit after the grace period; unref() lets the process
  // exit earlier if draining finishes first
  setTimeout(() => {
    console.error('Forcing shutdown after timeout');
    process.exit(1);
  }, 30_000).unref();
});
```
Go (net/http)
Go's http.Server has Shutdown() for graceful shutdown:
```go
srv := &http.Server{Addr: ":8080", Handler: router}

go func() {
	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatalf("ListenAndServe: %v", err)
	}
}()

// Wait for SIGTERM or SIGINT
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
<-quit

// Give in-flight requests up to 30 seconds to complete
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Fatalf("Shutdown error: %v", err)
}
```
Load Balancer Deregistration
Application-level SIGTERM handling is not enough on its own. The load balancer must stop routing new requests to the draining instance before — or at the same time as — the application starts draining. Otherwise the LB continues sending new connections to an instance that just received SIGTERM.
AWS ALB Deregistration Delay
ALB has a built-in deregistration delay (default: 300 seconds). During this window, the target is marked "draining" and the LB:
- Stops routing new requests to the draining target
- Allows in-flight requests to complete
- Closes all connections after the delay expires
```bash
# Reduce deregistration delay for faster deployments
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30
```
Set the deregistration delay to slightly longer than your application's grace period. If your app drains in 20 seconds, set the LB delay to 25-30 seconds.
Kubernetes terminationGracePeriodSeconds
Kubernetes sends SIGTERM and then waits terminationGracePeriodSeconds before sending SIGKILL. The default is 30 seconds.
```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # Sleep to allow kube-proxy to update iptables rules
            # before the application stops accepting connections
            command: ["/bin/sh", "-c", "sleep 5"]
```
The preStop sleep is critical. kube-proxy propagates endpoint removal asynchronously — without a brief sleep, new requests can still reach the pod in the 1-5 seconds after SIGTERM arrives.
The recommended sequence:
- Pod enters Terminating state
- preStop hook runs (sleep 5s), allowing kube-proxy to drain
- SIGTERM sent to container
- Application stops accepting new connections and drains existing ones
- If not done within terminationGracePeriodSeconds, SIGKILL is sent
Long-Lived Connections
Short HTTP/1.1 request-response cycles drain naturally — the in-flight response finishes in milliseconds. Long-lived connections (WebSocket, gRPC streaming, SSE) require explicit drain signaling.
WebSocket Drain
On SIGTERM, send a WebSocket close frame with code 1001 (Going Away) before closing each connection:
```javascript
// Node.js WebSocket server (ws library)
const { WebSocketServer, WebSocket } = require('ws');
const wss = new WebSocketServer({ port: 8080 });

process.on('SIGTERM', () => {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      // 1001 = Going Away (server shutdown)
      client.close(1001, 'Server shutting down');
    }
  });
  // Give close frames time to flush before exiting
  setTimeout(() => process.exit(0), 5000);
});
```
Clients receiving close code 1001 know to reconnect immediately rather than treating it as an error.
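On the client side, the close code can drive the reconnect policy. One sketch of such a policy, in which the function name and code set are illustrative rather than taken from any library: reconnect immediately on shutdown-style codes, back off exponentially otherwise.

```python
# Close codes that announce a deliberate server-side shutdown:
# 1001 Going Away, 1012 Service Restart, 1013 Try Again Later
RECONNECT_IMMEDIATELY = {1001, 1012, 1013}

def reconnect_delay(close_code, attempt):
    """Seconds to wait before the next reconnect attempt."""
    if close_code in RECONNECT_IMMEDIATELY and attempt == 0:
        return 0.0  # server announced shutdown; a peer is likely ready
    # Unknown or abnormal closures: exponential backoff, capped at 30s
    return min(30.0, 0.5 * (2 ** attempt))
```

Treating 1001 differently from an abnormal closure (1006) keeps rolling deploys fast for clients without hammering a server that is genuinely failing.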
gRPC GOAWAY
gRPC over HTTP/2 uses the GOAWAY frame to signal shutdown. Most gRPC server libraries handle this automatically when you call the graceful stop method:
```go
// Go gRPC server
grpcServer := grpc.NewServer()
go func() {
	<-quit
	// GracefulStop sends GOAWAY and waits for RPCs to complete
	grpcServer.GracefulStop()
}()
```
```python
# Python gRPC server
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
# ...
server.start()

def handle_sigterm(*args):
    # grace=30 gives RPCs 30 seconds to complete
    server.stop(grace=30)

signal.signal(signal.SIGTERM, handle_sigterm)
server.wait_for_termination()
```
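Both `GracefulStop()` and `stop(grace=...)` follow the same shape: refuse new work, wait for in-flight work up to a deadline, then force the rest. Stripped of gRPC, that pattern looks roughly like this; the class and method names are ours:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Drainable:
    """Accepts work until drain() is called, then waits for it to finish."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._inflight = 0
        self._lock = threading.Lock()
        self._idle = threading.Event()
        self._idle.set()  # nothing in flight yet
        self.accepting = True

    def submit(self, fn):
        if not self.accepting:
            raise RuntimeError("draining: not accepting new work")
        with self._lock:
            self._inflight += 1
            self._idle.clear()
        def run():
            try:
                fn()
            finally:
                with self._lock:
                    self._inflight -= 1
                    if self._inflight == 0:
                        self._idle.set()
        self._pool.submit(run)

    def drain(self, grace):
        """Stop intake; return True if in-flight work finished within grace."""
        self.accepting = False
        done = self._idle.wait(timeout=grace)
        self._pool.shutdown(wait=False)
        return done
```

The grace deadline is what maps onto the orchestrator's grace period: it must be shorter than `terminationGracePeriodSeconds`, or SIGKILL arrives first.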
Server-Sent Events (SSE)
SSE connections are long-lived HTTP responses. On shutdown, close the response stream and send a custom event that instructs the client to reconnect:
```python
# Django SSE view
def event_stream(request):
    def generate():
        try:
            for event in event_queue:
                yield f'data: {json.dumps(event)}\n\n'
            # Send reconnect signal before closing normally
            yield 'event: reconnect\ndata: {}\n\n'
        except GeneratorExit:
            # Client disconnected; yielding here would raise RuntimeError,
            # so there is nobody left to signal
            pass
    return StreamingHttpResponse(generate(), content_type='text/event-stream')
```
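The wire format of that reconnect event is easy to get wrong, and the SSE spec also defines a `retry:` field that sets the client's automatic reconnect delay. A small helper, illustrative rather than from any framework, makes the framing explicit:

```python
import json

def sse_frame(data, event=None, retry_ms=None):
    """Serialize one Server-Sent Events frame (fields per the SSE spec)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if retry_ms is not None:
        # retry: tells the browser how long to wait before auto-reconnecting
        lines.append(f"retry: {retry_ms}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# Shutdown-time frame: a named event plus a short retry interval
# sse_frame({}, event="reconnect", retry_ms=1000)
```

Sending a low `retry:` value just before shutdown means the browser's built-in EventSource reconnect kicks in quickly against the replacement instance.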
Testing Your Shutdown Sequence
Graceful shutdown is only trustworthy if you test it under load.
Load Test During Deployment
```bash
# Start constant load with hey
hey -z 60s -c 10 http://localhost:8080/api/health &
LOAD_PID=$!

# Trigger a deployment (sends SIGTERM to the old process)
sudo systemctl restart gunicorn

# Wait and check results
wait $LOAD_PID
# hey's summary shows non-2xx responses; any 502s indicate dropped connections
```
Monitor 502 Rate During Deploys
Set up a Prometheus alert that fires when 502 rate exceeds zero during the deployment window:
```yaml
# prometheus/alerts.yml
- alert: DroppedConnectionsDuringDeploy
  expr: rate(nginx_http_requests_total{status="502"}[1m]) > 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: '502 errors detected: possible connection drain failure'
```
Chaos Engineering
Use tools like `kill -TERM <pid>` in a staging environment with real traffic to validate your drain sequence. The acceptance criterion is zero non-2xx responses during a controlled shutdown under sustained load.
A complete drain checklist:
| Layer | Action | Verify |
|---|---|---|
| Load balancer | Deregistration delay configured | No new connections after SIGTERM |
| Application | SIGTERM handler stops accept() | Existing requests complete |
| WebSocket | Close frame 1001 sent | Client reconnects automatically |
| gRPC | GracefulStop() called | In-flight RPCs complete |
| SSE | Stream closed with reconnect event | Client re-establishes SSE |
| Orchestrator | Grace period > app drain time | No premature SIGKILL |