The Problem: Requests Dropped on Shutdown
Every deployment creates a race condition. The moment your process manager sends SIGTERM to the old worker, in-flight requests are at risk. Clients see 502 Bad Gateway errors, partial JSON responses, or abrupt WebSocket disconnections — all because the application exited before it finished serving.
Connection draining is the practice of letting in-flight requests complete before the process terminates. Done correctly, users experience zero dropped requests even during rapid rolling deployments.
The failure modes are predictable:
- Load balancer routes to a terminating backend — the process is shutting down but the LB hasn't deregistered it yet, so new connections keep arriving
- Process exits before response is sent — the app handles the request body but SIGKILL interrupts the write
- Long-lived connections are severed — WebSocket, gRPC streaming, and SSE clients see unexpected disconnects and may not reconnect cleanly
SIGTERM Handling
When an orchestrator wants to stop your process it sends SIGTERM first, then waits a grace period, then sends SIGKILL. Your application must respond to SIGTERM by stopping new connection acceptance while continuing to serve existing ones.
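Before reaching for framework-specific hooks, the core pattern can be sketched with Python's standard library alone: register a SIGTERM handler that stops the accept loop while threads serving in-flight requests run to completion. The handler and server names here are ours, not from any framework.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet

# Port 0 picks a free port; a real service would bind a fixed one
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)

def handle_sigterm(signum, frame):
    # shutdown() stops the accept loop and makes serve_forever() return.
    # It must run in another thread because it blocks until the loop exits;
    # threads already serving requests finish their responses.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
# server.serve_forever()  # blocks until SIGTERM triggers shutdown()
```

The same shape recurs in every runtime below: stop accepting, let existing work finish, then exit.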
Gunicorn (Python)
Gunicorn has built-in graceful shutdown. On SIGTERM, the master stops accepting new connections and signals workers to finish their in-flight requests:
```python
# gunicorn.conf.py
graceful_timeout = 30  # seconds to wait for workers to finish
timeout = 60           # hard worker timeout
keepalive = 5          # keep-alive connection timeout
```
The sequence: SIGTERM → master stops listening → workers drain → workers exit → master exits.
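If systemd manages Gunicorn (as the `systemctl restart` used later in this article assumes), systemd's own stop timeout must exceed `graceful_timeout`, or systemd will SIGKILL workers mid-drain. A minimal sketch; the paths and app module are placeholders:

```ini
# /etc/systemd/system/gunicorn.service (excerpt)
[Service]
ExecStart=/usr/local/bin/gunicorn -c gunicorn.conf.py myapp.wsgi
# systemd sends SIGTERM on stop by default; give the master longer
# than graceful_timeout (30s) before systemd escalates to SIGKILL
TimeoutStopSec=45
```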
Node.js / Express
Node.js does not drain connections on SIGTERM by default; the process simply keeps running until killed. You must explicitly stop the server from accepting new connections:
```javascript
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining connections');
  server.close(() => {
    // All existing connections have finished
    console.log('Server closed cleanly');
    process.exit(0);
  });
  // Force exit after the grace period; unref() lets the process
  // exit earlier if draining finishes first
  setTimeout(() => {
    console.error('Forcing shutdown after timeout');
    process.exit(1);
  }, 30_000).unref();
});
```
Go (net/http)
Go's http.Server has Shutdown() for graceful shutdown:
```go
srv := &http.Server{Addr: ":8080", Handler: router}

go func() {
	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatalf("ListenAndServe: %v", err)
	}
}()

// Wait for SIGTERM or SIGINT
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
<-quit

// Give in-flight requests up to 30 seconds to complete
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Fatalf("Shutdown error: %v", err)
}
```
Load Balancer Deregistration
Application-level SIGTERM handling is not enough on its own. The load balancer must stop routing new requests to the draining instance before — or at the same time as — the application starts draining. Otherwise the LB continues sending new connections to an instance that just received SIGTERM.
AWS ALB Deregistration Delay
ALB has a built-in deregistration delay (default: 300 seconds). During this window, the target is marked "draining" and the LB:
- Stops routing new requests to the draining target
- Allows in-flight requests to complete
- Closes all connections after the delay expires
```bash
# Reduce deregistration delay for faster deployments
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30
```
Set the deregistration delay to slightly longer than your application's grace period. If your app drains in 20 seconds, set the LB delay to 25-30 seconds.
Kubernetes terminationGracePeriodSeconds
Kubernetes sends SIGTERM and then waits terminationGracePeriodSeconds before sending SIGKILL. The default is 30 seconds.
```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # Sleep to allow kube-proxy to update iptables rules
            # before the application stops accepting connections
            command: ["/bin/sh", "-c", "sleep 5"]
```
The preStop sleep is critical. kube-proxy propagates endpoint removal asynchronously — without a brief sleep, new requests can still reach the pod in the 1-5 seconds after SIGTERM arrives.
The recommended sequence:
- Pod enters Terminating state
- preStop hook runs (sleep 5s), allowing kube-proxy to drain
- SIGTERM sent to container
- Application stops accepting new connections and drains existing ones
- If not done within terminationGracePeriodSeconds, SIGKILL is sent
Long-Lived Connections
Short HTTP/1.1 request-response cycles drain naturally — the in-flight response finishes in milliseconds. Long-lived connections (WebSocket, gRPC streaming, SSE) require explicit drain signaling.
WebSocket Drain
On SIGTERM, send a WebSocket close frame with code 1001 (Going Away) before closing each connection:
```javascript
// Node.js WebSocket server (ws library)
const { WebSocketServer, WebSocket } = require('ws');
const wss = new WebSocketServer({ port: 8080 });

process.on('SIGTERM', () => {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      // 1001 = Going Away (server shutdown)
      client.close(1001, 'Server shutting down');
    }
  });
  // Give close frames time to flush before exiting
  setTimeout(() => process.exit(0), 5000);
});
```
Clients receiving close code 1001 know to reconnect immediately rather than treating it as an error.
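On the client side, the close code can drive the reconnect policy. One sketch of such a policy, in which the function name and code set are illustrative rather than taken from any library: reconnect immediately on shutdown-style codes, back off exponentially otherwise.

```python
# Close codes that announce a deliberate server-side shutdown:
# 1001 Going Away, 1012 Service Restart, 1013 Try Again Later
RECONNECT_IMMEDIATELY = {1001, 1012, 1013}

def reconnect_delay(close_code, attempt):
    """Seconds to wait before the next reconnect attempt."""
    if close_code in RECONNECT_IMMEDIATELY and attempt == 0:
        return 0.0  # server announced shutdown; a peer is likely ready
    # Unknown or abnormal closures: exponential backoff, capped at 30s
    return min(30.0, 0.5 * (2 ** attempt))
```

Treating 1001 differently from an abnormal closure (1006) keeps rolling deploys fast for clients without hammering a server that is genuinely failing.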
gRPC GOAWAY
gRPC over HTTP/2 uses the GOAWAY frame to signal shutdown. Most gRPC server libraries handle this automatically when you call the graceful stop method:
```go
// Go gRPC server
grpcServer := grpc.NewServer()
go func() {
	<-quit
	// GracefulStop sends GOAWAY and waits for RPCs to complete
	grpcServer.GracefulStop()
}()
```
```python
# Python gRPC server
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
# ...
server.start()

def handle_sigterm(*args):
    # grace=30 gives RPCs 30 seconds to complete
    server.stop(grace=30)

signal.signal(signal.SIGTERM, handle_sigterm)
server.wait_for_termination()
```
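Both `GracefulStop()` and `stop(grace=...)` follow the same shape: refuse new work, wait for in-flight work up to a deadline, then force the rest. Stripped of gRPC, that pattern looks roughly like this; the class and method names are ours:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Drainable:
    """Accepts work until drain() is called, then waits for it to finish."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._inflight = 0
        self._lock = threading.Lock()
        self._idle = threading.Event()
        self._idle.set()  # nothing in flight yet
        self.accepting = True

    def submit(self, fn):
        if not self.accepting:
            raise RuntimeError("draining: not accepting new work")
        with self._lock:
            self._inflight += 1
            self._idle.clear()
        def run():
            try:
                fn()
            finally:
                with self._lock:
                    self._inflight -= 1
                    if self._inflight == 0:
                        self._idle.set()
        self._pool.submit(run)

    def drain(self, grace):
        """Stop intake; return True if in-flight work finished within grace."""
        self.accepting = False
        done = self._idle.wait(timeout=grace)
        self._pool.shutdown(wait=False)
        return done
```

The grace deadline is what maps onto the orchestrator's grace period: it must be shorter than `terminationGracePeriodSeconds`, or SIGKILL arrives first.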
Server-Sent Events (SSE)
SSE connections are long-lived HTTP responses. On shutdown, close the response stream and send a custom event that instructs the client to reconnect:
```python
# Django SSE view
def event_stream(request):
    def generate():
        try:
            for event in event_queue:
                yield f'data: {json.dumps(event)}\n\n'
            # Send reconnect signal before closing normally
            yield 'event: reconnect\ndata: {}\n\n'
        except GeneratorExit:
            # Client disconnected; yielding here would raise RuntimeError,
            # so there is nobody left to signal
            pass
    return StreamingHttpResponse(generate(), content_type='text/event-stream')
```
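The wire format of that reconnect event is easy to get wrong, and the SSE spec also defines a `retry:` field that sets the client's automatic reconnect delay. A small helper, illustrative rather than from any framework, makes the framing explicit:

```python
import json

def sse_frame(data, event=None, retry_ms=None):
    """Serialize one Server-Sent Events frame (fields per the SSE spec)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if retry_ms is not None:
        # retry: tells the browser how long to wait before auto-reconnecting
        lines.append(f"retry: {retry_ms}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# Shutdown-time frame: a named event plus a short retry interval
# sse_frame({}, event="reconnect", retry_ms=1000)
```

Sending a low `retry:` value just before shutdown means the browser's built-in EventSource reconnect kicks in quickly against the replacement instance.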
Testing Your Shutdown Sequence
Graceful shutdown is only trustworthy if you test it under load.
Load Test During Deployment
```bash
# Start constant load with hey
hey -z 60s -c 10 http://localhost:8080/api/health &
LOAD_PID=$!

# Trigger a deployment (sends SIGTERM to the old process)
sudo systemctl restart gunicorn

# Wait and check results
wait $LOAD_PID
# hey's summary shows non-2xx responses; any 502s indicate dropped connections
```
Monitor 502 Rate During Deploys
Set up a Prometheus alert that fires when 502 rate exceeds zero during the deployment window:
```yaml
# prometheus/alerts.yml
- alert: DroppedConnectionsDuringDeploy
  expr: rate(nginx_http_requests_total{status="502"}[1m]) > 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: '502 errors detected: possible connection drain failure'
```
Chaos Engineering
Use tools like `kill -TERM <pid>` in a staging environment with real traffic to validate your drain sequence. The acceptance criterion is zero non-2xx responses during a controlled shutdown under sustained load.
A complete drain checklist:
| Layer | Action | Verify |
|---|---|---|
| Load balancer | Deregistration delay configured | No new connections after SIGTERM |
| Application | SIGTERM handler stops accept() | Existing requests complete |
| WebSocket | Close frame 1001 sent | Client reconnects automatically |
| gRPC | GracefulStop() called | In-flight RPCs complete |
| SSE | Stream closed with reconnect event | Client re-establishes SSE |
| Orchestrator | Grace period > app drain time | No premature SIGKILL |