Why Log at the Gateway?
The API gateway is the single point through which all API traffic flows. This makes it the ideal — and often the only — place to collect a complete picture of API usage. Gateway logs capture information that backends cannot easily provide:
- Total request volume (including requests rejected before reaching backends)
- Latency from the client's perspective (including time spent in the gateway itself, not just upstream service time)
- Rate limit events (requests rejected at the gateway, which backends never see)
- Error attribution (which client, API key, or IP is generating the most errors)
Backends should still log their own requests for service-specific debugging, but gateway logs provide the cross-cutting, client-centric view.
Access Logging
Access logs record one entry per request with structured fields. Use JSON format for machine readability:
```json
{
  "timestamp": "2024-02-27T10:23:45.123Z",
  "request_id": "04af84e7-1307-4f68-b5b3-6b58f6ab11f9",
  "method": "POST",
  "path": "/api/v1/orders",
  "query_string": "",
  "status": 201,
  "response_bytes": 1234,
  "latency_ms": 87,
  "upstream_latency_ms": 72,
  "gateway_latency_ms": 15,
  "client_ip": "203.0.113.45",
  "consumer_id": "api-key-xyz",
  "service": "orders-service",
  "route": "orders-route",
  "user_agent": "MyApp/2.1 (Android 14)",
  "upstream_uri": "http://orders.internal:8080/orders",
  "upstream_status": 201,
  "retry_count": 0
}
```
Essential Fields
| Field | Purpose |
|---|---|
| `request_id` | Correlate gateway log with upstream service logs |
| `latency_ms` | End-to-end latency (client perspective) |
| `upstream_latency_ms` | Time waiting for upstream response |
| `gateway_latency_ms` | Time spent in gateway processing |
| `consumer_id` | API key or user ID — enables per-client analytics |
| `retry_count` | Detect which requests required retries |
| `upstream_status` | Distinguish gateway errors from upstream errors |
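The latency fields should be internally consistent: gateway overhead is what remains after subtracting upstream time from the end-to-end figure. A minimal sketch of parsing one log line and deriving that breakdown, using the field names from the example entry above:

```python
import json

def latency_breakdown(log_line: str) -> dict:
    """Parse one JSON access-log entry and split total latency into
    upstream time vs. time spent inside the gateway itself."""
    entry = json.loads(log_line)
    total = entry["latency_ms"]
    upstream = entry["upstream_latency_ms"]
    return {
        "request_id": entry["request_id"],
        "total_ms": total,
        "upstream_ms": upstream,
        "gateway_overhead_ms": total - upstream,  # should match gateway_latency_ms
    }

line = '{"request_id": "04af84e7", "latency_ms": 87, "upstream_latency_ms": 72}'
print(latency_breakdown(line)["gateway_overhead_ms"])  # → 15
```

A persistent gap between `gateway_overhead_ms` and the expected baseline often points at a slow plugin or auth lookup in the gateway itself rather than a slow backend.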
Kong Access Log Configuration
```yaml
# Kong file-log plugin (structured JSON)
plugins:
  - name: file-log
    config:
      path: /var/log/kong/access.log
      reopen: true
  # Kong http-log plugin (send to log aggregator)
  - name: http-log
    config:
      http_endpoint: http://logstash.internal:5044
      method: POST
      timeout: 1000
      keepalive: 60000
      queue_size: 1000
      flush_timeout: 2
```
Envoy Access Log
```yaml
# Envoy JSON access log
access_log:
  - name: envoy.access_loggers.file
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /var/log/envoy/access.log
      log_format:
        json_format:
          timestamp: "%START_TIME%"
          method: "%REQ(:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          status: "%RESPONSE_CODE%"
          latency_ms: "%DURATION%"
          upstream_latency_ms: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
          client_ip: "%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%"
          request_id: "%REQ(X-REQUEST-ID)%"
          upstream_cluster: "%UPSTREAM_CLUSTER%"
          bytes_received: "%BYTES_RECEIVED%"
          bytes_sent: "%BYTES_SENT%"
```
Request and Response Body Logging
Body logging is powerful for debugging but dangerous for privacy and storage:
When to Log Bodies
- Debug mode only: Enable body logging temporarily for a specific client or route when investigating an incident, not permanently
- Error responses only: Log request bodies only when the status is 4xx or 5xx
- Sampling: Log bodies for 1% of successful requests for baseline observation
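The three policies above compose into a single decision per request. A minimal sketch, where `debug_routes` (a set of routes with debug mode temporarily enabled) is an illustrative name:

```python
import random

def should_log_body(status: int, debug_routes: set[str], route: str,
                    sample_rate: float = 0.01) -> bool:
    """Decide whether to capture the body for this request.
    Combines: per-route debug mode, errors always, ~1% sample of the rest."""
    if route in debug_routes:      # temporary debug mode for an investigation
        return True
    if status >= 400:              # 4xx/5xx: always keep the body
        return True
    return random.random() < sample_rate  # small baseline sample of successes

print(should_log_body(503, set(), "orders-route"))  # → True
```

The debug-mode set should be time-bounded (e.g., expire entries after an hour) so that "temporary" body logging does not silently become permanent.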
PII Masking
Personally identifiable information (PII) must be masked before logging:
```python
import re

# Patterns are applied to the serialized log entry before it is written out.
PII_PATTERNS = [
    (r'"password":\s*"[^"]+"', '"password": "***"'),                          # never log passwords
    (r'"card_number":\s*"(\d{4})\d+(\d{4})"', r'"card_number": "\1****\2"'),  # keep first/last 4 digits
    (r'"ssn":\s*"\d{3}-\d{2}-(\d{4})"', r'"ssn": "***-**-\1"'),               # keep last 4 digits
    (r'Authorization: Bearer [A-Za-z0-9._-]+', 'Authorization: Bearer ***'),  # redact tokens
]

def mask_pii(log_entry: str) -> str:
    for pattern, replacement in PII_PATTERNS:
        log_entry = re.sub(pattern, replacement, log_entry)
    return log_entry

mask_pii('{"password": "hunter2", "ssn": "123-45-6789"}')
# → '{"password": "***", "ssn": "***-**-6789"}'
```
Size Limits
Cap body logging at a maximum size (e.g., first 4KB) to prevent large file uploads or data dumps from flooding log storage:
```yaml
# Kong request-size-limiting + partial body logging
plugins:
  - name: request-size-limiting
    config:
      allowed_payload_size: 8  # MB
```
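The plugin above rejects oversized requests outright; capping what gets *written to the log* is a separate, application-side step. A minimal sketch of truncating a logged body at 4KB (the `MAX_LOG_BODY` constant is illustrative):

```python
MAX_LOG_BODY = 4096  # log at most the first 4 KB of any body

def truncate_body(body: bytes, limit: int = MAX_LOG_BODY) -> str:
    """Keep the head of the body and record how many bytes were dropped,
    so readers of the log know the entry is partial."""
    if len(body) <= limit:
        return body.decode("utf-8", errors="replace")
    kept = body[:limit].decode("utf-8", errors="replace")
    return f"{kept}... [truncated {len(body) - limit} bytes]"

print(truncate_body(b"x" * 5000)[-30:])
```

Truncation should happen after PII masking, not before, so a sensitive value is never split in a way that defeats the masking patterns.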
API Analytics
From gateway access logs, derive actionable analytics:
Key Metrics
| Metric | Query |
|---|---|
| Request volume | `COUNT(*) GROUP BY path, method` |
| Error rate | `COUNT(status >= 500) / COUNT(*) GROUP BY path` |
| Latency percentiles | `PERCENTILE(latency_ms, 50, 95, 99) GROUP BY path` |
| Top consumers | `COUNT(*) GROUP BY consumer_id ORDER BY COUNT DESC LIMIT 10` |
| Geographic distribution | `COUNT(*) GROUP BY geoip(client_ip).country` |
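The queries above are pseudo-SQL; the same aggregations can run over parsed log entries directly. A minimal sketch computing per-path request count, error rate, and p95 latency, using the field names from the access-log example earlier:

```python
from collections import defaultdict

def per_path_stats(entries: list[dict]) -> dict[str, dict]:
    """Aggregate error rate and p95 latency per path from access-log entries."""
    latencies: dict[str, list[int]] = defaultdict(list)
    errors: dict[str, int] = defaultdict(int)
    for e in entries:
        latencies[e["path"]].append(e["latency_ms"])
        if e["status"] >= 500:
            errors[e["path"]] += 1
    stats = {}
    for path, vals in latencies.items():
        vals.sort()
        p95 = vals[min(len(vals) - 1, int(0.95 * len(vals)))]  # nearest-rank p95
        stats[path] = {
            "count": len(vals),
            "error_rate": errors[path] / len(vals),
            "p95_latency_ms": p95,
        }
    return stats

entries = [{"path": "/api/v1/orders", "status": s, "latency_ms": l}
           for s, l in [(200, 40), (200, 55), (500, 300), (201, 60)]]
print(per_path_stats(entries))
```

In production these aggregations usually run in the log store (Loki, Elasticsearch) or as Prometheus queries rather than in application code, but the shape of the computation is the same.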
Grafana Dashboard
```promql
# Prometheus queries for gateway dashboard panels

# Request rate by status
sum by (status) (rate(gateway_requests_total[5m]))

# 99th percentile latency
histogram_quantile(0.99, rate(gateway_request_duration_seconds_bucket[5m]))

# Error rate
sum(rate(gateway_requests_total{status=~"5.."}[5m]))
  / sum(rate(gateway_requests_total[5m]))

# Top 10 paths by volume
topk(10, sum by (path) (rate(gateway_requests_total[5m])))
```
Usage Metering for Billing
If you charge for API usage, the gateway is the authoritative source for metering:
```python
# Usage metering pipeline:
#   1. Gateway logs each request with consumer_id and route
#   2. Log aggregator (Kafka/Kinesis) receives structured logs
#   3. Stream processor counts requests per consumer per billing period
#   4. Billing service reads usage and generates invoices

# Kafka consumer: count API calls per consumer
from collections import defaultdict

usage_counters: dict[str, int] = defaultdict(int)

for log_entry in kafka_consumer:
    consumer_id = log_entry['consumer_id']
    if log_entry['status'] < 500:  # don't bill requests that failed server-side (5xx)
        usage_counters[consumer_id] += 1

# Flush to billing DB every minute
flush_to_billing_db(usage_counters, period=current_billing_period())
```
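The flush step can be as simple as an additive upsert per consumer followed by resetting the in-memory counters. A hedged sketch, where the `billing_db` dict stands in for a real database client:

```python
from collections import defaultdict

billing_db: dict[tuple[str, str], int] = {}  # (consumer_id, period) → count

def flush_to_billing_db(counters: dict[str, int], period: str) -> None:
    """Add the in-memory counts to the billing store, then reset them
    so the next flush window starts from zero."""
    for consumer_id, count in counters.items():
        key = (consumer_id, period)
        billing_db[key] = billing_db.get(key, 0) + count
    counters.clear()

usage = defaultdict(int, {"api-key-xyz": 42})
flush_to_billing_db(usage, "2024-02")
print(billing_db[("api-key-xyz", "2024-02")])  # → 42
```

The additive upsert matters: if the stream processor restarts mid-period, new flushes accumulate onto the existing count instead of overwriting it.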
Quota Enforcement Loop
```
Request → Gateway → check quota (Redis) → allow/deny
              │
              └→ log request → Kafka → Stream Processor → Billing DB (usage counters)
                                                                │
                                                                └→ Quota Service → update Redis quota
```
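The "check quota" step is typically a fixed-window counter keyed by consumer and time bucket. A sketch using a plain dict as a stand-in for Redis (in production this would be `INCR` plus `EXPIRE` on a per-window key); the `QUOTAS` table and its values are illustrative:

```python
import time
from collections import defaultdict

QUOTAS = {"api-key-xyz": 1000}                # illustrative per-consumer limits
_counters: dict[str, int] = defaultdict(int)  # stand-in for Redis counters

def check_quota(consumer_id: str, window: int = 3600) -> bool:
    """Fixed-window quota check: increment the consumer's counter for the
    current window and allow the request only while it is under the limit.
    With Redis this is INCR on a key like "quota:{consumer}:{bucket}"
    followed by EXPIRE of `window` seconds."""
    bucket = int(time.time()) // window
    key = f"{consumer_id}:{bucket}"
    _counters[key] += 1
    return _counters[key] <= QUOTAS.get(consumer_id, 0)

print(check_quota("api-key-xyz"))  # → True  (first call in the window)
print(check_quota("unknown-key"))  # → False (no quota configured)
```

Because the Quota Service updates Redis asynchronously from the billing counters, enforcement is eventually consistent: a consumer can briefly exceed its quota between flushes, which is usually an acceptable trade-off for keeping the hot path to a single Redis round trip.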
Tools and Integrations
| Tool | Use Case |
|---|---|
| Kong Analytics (Konnect) | Built-in dashboard for Kong users |
| AWS API Gateway metrics | CloudWatch integration, per-stage metrics |
| Envoy access log service | gRPC streaming to Fluentd/Logstash |
| Grafana + Loki | Log aggregation and query for self-hosted gateways |
| Elastic (ELK) Stack | Full-text search and aggregation over JSON logs |
| Datadog APM | Distributed tracing + gateway metrics integration |
Summary
Gateway access logs in JSON format are the foundation for API analytics — record request ID, latency breakdown, consumer ID, and upstream status on every request. Enable body logging sparingly and mask PII before writing to logs. Build analytics dashboards from Prometheus metrics or aggregated log queries to track request volume, error rates, and latency percentiles. Use the gateway as the authoritative source for usage metering in billing systems.