Production Infrastructure

Connection Draining: Graceful Shutdown Without Dropping Requests

How to drain in-flight connections during deployments and scaling events — SIGTERM handling, load balancer deregistration delay, WebSocket and gRPC long-lived connection management, and chaos testing your shutdown sequence.

The Problem: Requests Dropped on Shutdown

Every deployment creates a race condition. The moment your process manager sends SIGTERM to the old worker, in-flight requests are at risk. Clients see 502 Bad Gateway errors, partial JSON responses, or abrupt WebSocket disconnections — all because the application exited before it finished serving.

Connection draining is the practice of letting in-flight requests complete before the process terminates. Done correctly, users experience zero dropped requests even during rapid rolling deployments.

The failure modes are predictable:

  • Load balancer routes to a terminating backend — the process is shutting down but the LB hasn't deregistered it yet, so new connections keep arriving
  • Process exits before response is sent — the app handles the request body but SIGKILL interrupts the write
  • Long-lived connections are severed — WebSocket, gRPC streaming, and SSE clients see unexpected disconnects and may not reconnect cleanly

SIGTERM Handling

When an orchestrator wants to stop your process it sends SIGTERM first, then waits a grace period, then sends SIGKILL. Your application must respond to SIGTERM by stopping new connection acceptance while continuing to serve existing ones.
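Whatever the framework, the handler's job is the same: set a flag, refuse new work, and wait for an in-flight counter to reach zero before exiting. A minimal Python sketch of that pattern (the DrainState class and its method names are illustrative, not from any framework):

```python
import signal
import threading
import time

class DrainState:
    """Tracks in-flight requests and the shutdown signal."""

    def __init__(self):
        self.shutting_down = threading.Event()
        self._inflight = 0
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)

    def request_started(self) -> bool:
        """Returns False once draining has begun: refuse new work."""
        with self._lock:
            if self.shutting_down.is_set():
                return False
            self._inflight += 1
            return True

    def request_finished(self):
        with self._lock:
            self._inflight -= 1
            if self._inflight == 0:
                self._idle.notify_all()

    def drain(self, timeout: float) -> bool:
        """Wait for in-flight requests to finish; False if the grace period expires."""
        deadline = time.monotonic() + timeout
        with self._idle:
            while self._inflight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._idle.wait(remaining)
        return True

state = DrainState()
# The handler only sets the flag; blocking work happens outside the handler.
signal.signal(signal.SIGTERM, lambda signum, frame: state.shutting_down.set())
```

An accept loop would call state.request_started() before dispatching each request (answering 503 when it refuses), and the main thread would call state.drain(30) once shutting_down is set, then exit.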

Gunicorn (Python)

Gunicorn has built-in graceful shutdown. When the master receives SIGTERM it stops accepting new connections and gives workers up to graceful_timeout seconds to finish in-flight requests:

# gunicorn.conf.py
graceful_timeout = 30  # seconds to wait for workers to finish
timeout = 60           # hard worker timeout
keepalive = 5          # keep-alive connection timeout

The sequence: SIGTERM → master stops listening → workers drain → workers exit → master exits.

Node.js / Express

Node.js does not automatically drain connections on SIGTERM. You must explicitly stop the server from accepting new connections; note that server.close() also waits for idle keep-alive sockets, which Node 18+ can close eagerly with server.closeIdleConnections():

const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received — draining connections');
  server.close(() => {
    // All existing connections have finished
    console.log('Server closed cleanly');
    process.exit(0);
  });

  // Force exit after grace period; unref() the timer so it alone
  // doesn't keep the event loop alive once draining completes
  setTimeout(() => {
    console.error('Forcing shutdown after timeout');
    process.exit(1);
  }, 30_000).unref();
});

Go (net/http)

Go's http.Server has Shutdown() for graceful shutdown:

srv := &http.Server{Addr: ":8080", Handler: router}

go func() {
    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
        log.Fatalf("ListenAndServe: %v", err)
    }
}()

// Wait for SIGTERM
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
<-quit

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
    log.Fatalf("Shutdown error: %v", err)
}

Load Balancer Deregistration

Application-level SIGTERM handling is not enough on its own. The load balancer must stop routing new requests to the draining instance before — or at the same time as — the application starts draining. Otherwise the LB continues sending new connections to an instance that just received SIGTERM.

AWS ALB Deregistration Delay

ALB has a built-in deregistration delay (default: 300 seconds). During this window, the target is marked "draining" and the LB:

  • Stops routing new requests to the draining target
  • Allows in-flight requests to complete
  • Closes all connections after the delay expires

# Reduce deregistration delay for faster deployments
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

Set the deregistration delay to slightly longer than your application's grace period. If your app drains in 20 seconds, set the LB delay to 25-30 seconds.

Kubernetes terminationGracePeriodSeconds

Kubernetes sends SIGTERM and then waits terminationGracePeriodSeconds before sending SIGKILL. The default is 30 seconds.

spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          # Sleep to allow kube-proxy to update iptables rules
          # before the application stops accepting connections
          command: ["/bin/sh", "-c", "sleep 5"]

The preStop sleep is critical. kube-proxy propagates endpoint removal asynchronously — without a brief sleep, new requests can still reach the pod in the 1-5 seconds after SIGTERM arrives.

The recommended sequence:

  • Pod enters Terminating state
  • preStop hook runs (sleep 5s) — allows kube-proxy to drain
  • SIGTERM sent to container
  • Application stops accepting new connections and drains existing ones
  • If not done within terminationGracePeriodSeconds, SIGKILL is sent
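Because the grace period clock starts when the pod enters Terminating, it must cover the preStop sleep plus the application's drain timeout. A throwaway helper to sanity-check the budget (the function name and default margin are illustrative):

```python
def shutdown_budget_ok(pre_stop_sleep_s: int,
                       drain_timeout_s: int,
                       termination_grace_s: int,
                       margin_s: int = 5) -> bool:
    """terminationGracePeriodSeconds includes the preStop hook, so the
    sleep, the drain, and a safety margin must all fit inside it."""
    return pre_stop_sleep_s + drain_timeout_s + margin_s <= termination_grace_s

# The manifest above: a 5s preStop sleep plus a 30s app drain fits in 60s
assert shutdown_budget_ok(5, 30, 60)
```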

Long-Lived Connections

Short HTTP/1.1 request-response cycles drain naturally — the in-flight response finishes in milliseconds. Long-lived connections (WebSocket, gRPC streaming, SSE) require explicit drain signaling.

WebSocket Drain

On SIGTERM, send a WebSocket close frame with code 1001 (Going Away) before closing the connection:

// Node.js WebSocket server (ws library)
process.on('SIGTERM', () => {
  wss.clients.forEach(client => {
    if (client.readyState === WebSocket.OPEN) {
      // 1001 = Going Away (server shutdown)
      client.close(1001, 'Server shutting down');
    }
  });

  // Give close frames time to flush before the process exits
  setTimeout(() => process.exit(0), 5000);
});

A client that receives close code 1001 can treat the disconnect as a planned shutdown and reconnect immediately rather than surfacing an error.
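On the client side this means branching on the close code, usually with backoff so a fleet of clients doesn't stampede the new backend. A small Python sketch; the exact set of retryable codes is a policy choice, not mandated by RFC 6455:

```python
# Close codes after which a client should reconnect rather than report an error.
# 1001 = Going Away; 1012 (Service Restart) and 1013 (Try Again Later) are
# IANA-registered codes some servers use for the same purpose.
RECONNECTABLE = {1001, 1012, 1013}

def should_reconnect(close_code: int) -> bool:
    return close_code in RECONNECTABLE

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff between reconnect attempts, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))
```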

gRPC GOAWAY

gRPC over HTTP/2 uses the GOAWAY frame to signal shutdown. Most gRPC server libraries handle this automatically when you call the graceful stop method:

// Go gRPC server
grpcServer := grpc.NewServer()

go func() {
    <-quit
    // GracefulStop sends GOAWAY and waits for RPCs to complete
    grpcServer.GracefulStop()
}()

# Python gRPC server
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
# ...
server.start()

def handle_sigterm(*args):
    # grace=30 gives RPCs 30 seconds to complete
    server.stop(grace=30)

signal.signal(signal.SIGTERM, handle_sigterm)
server.wait_for_termination()

Server-Sent Events (SSE)

SSE connections are long-lived HTTP responses. On shutdown, close the response stream and send a custom event that instructs the client to reconnect:

# Django SSE view
def event_stream(request):
    def generate():
        try:
            for event in event_queue:
                yield f'data: {json.dumps(event)}\n\n'
            # Queue exhausted because the app is draining: tell the
            # client to reconnect before the stream closes
            yield 'event: reconnect\ndata: {}\n\n'
        except GeneratorExit:
            pass  # Client disconnected; yielding again would raise RuntimeError
    return StreamingHttpResponse(generate(), content_type='text/event-stream')
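On the client, the reconnect event has to be picked out of the stream. A minimal parser for the text/event-stream framing used above (event blocks are separated by a blank line; this sketch ignores the id: and retry: fields):

```python
def parse_sse(stream_text: str):
    """Parse a text/event-stream payload into (event_type, data) tuples."""
    events = []
    event_type, data_lines = 'message', []
    for line in stream_text.splitlines():
        if line == '':
            # Blank line terminates an event block
            if data_lines:
                events.append((event_type, '\n'.join(data_lines)))
            event_type, data_lines = 'message', []
        elif line.startswith('event:'):
            event_type = line[len('event:'):].strip()
        elif line.startswith('data:'):
            data_lines.append(line[len('data:'):].strip())
    return events
```

A client loop would watch for a ('reconnect', ...) tuple and re-establish the connection instead of treating the closed stream as a failure.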

Testing Your Shutdown Sequence

Graceful shutdown is only trustworthy if you test it under load.

Load Test During Deployment

# Start constant load with hey
hey -z 60s -c 10 http://localhost:8080/api/health &
LOAD_PID=$!

# Trigger a deployment (sends SIGTERM to old process)
sudo systemctl restart gunicorn

# Wait and check results
wait $LOAD_PID
# hey output shows non-2xx responses — any 502s indicate dropped connections
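The same check can be scripted when hey isn't available. This Python sketch tallies status codes from concurrent requests; the fetch callable is deliberately abstract so it can wrap urllib, requests, or a stub (map exceptions to a synthetic 502 inside it):

```python
import concurrent.futures
from collections import Counter

def hammer(fetch, n_requests: int = 200, concurrency: int = 10) -> Counter:
    """Issue n_requests via fetch() concurrently and tally the status codes.
    During a deploy, any non-2xx count above zero means dropped requests."""
    with concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
        return Counter(pool.map(lambda _: fetch(), range(n_requests)))
```

Run it while restarting the service in another terminal and inspect the returned Counter for anything outside the 2xx range.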

Monitor 502 Rate During Deploys

Set up a Prometheus alert that fires when 502 rate exceeds zero during the deployment window:

# prometheus/alerts.yml
- alert: DroppedConnectionsDuringDeploy
  expr: rate(nginx_http_requests_total{status='502'}[1m]) > 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: '502 errors detected — possible connection drain failure'

Chaos Engineering

Use tools like kill -SIGTERM <pid> in a staging environment with real traffic to validate your drain sequence. The acceptance criterion is zero non-2xx responses during a controlled shutdown under sustained load.

A complete drain checklist:

Layer         | Action                             | Verify
Load balancer | Deregistration delay configured    | No new connections after SIGTERM
Application   | SIGTERM handler stops accept()     | Existing requests complete
WebSocket     | Close frame 1001 sent              | Client reconnects automatically
gRPC          | GracefulStop() called              | In-flight RPCs complete
SSE           | Stream closed with reconnect event | Client re-establishes SSE
Orchestrator  | Grace period > app drain time      | No premature SIGKILL
