Why Log at the Gateway?
The API gateway is the single point through which all API traffic flows. This makes it the ideal — and often the only — place to collect a complete picture of API usage. Gateway logs capture information that backends cannot easily provide:
- Total request volume (including requests rejected before reaching backends)
- Latency from the client's perspective (including time spent in the gateway itself, not just upstream service time)
- Rate limit events (requests rejected at the gateway, which backends never see)
- Error attribution (which client, API key, or IP is generating the most errors)
Backends should still log their own requests for service-specific debugging, but gateway logs provide the cross-cutting, client-centric view.
Access Logging
Access logs record one entry per request with structured fields. Use JSON format for machine readability:
```json
{
  "timestamp": "2024-02-27T10:23:45.123Z",
  "request_id": "04af84e7-1307-4f68-b5b3-6b58f6ab11f9",
  "method": "POST",
  "path": "/api/v1/orders",
  "query_string": "",
  "status": 201,
  "response_bytes": 1234,
  "latency_ms": 87,
  "upstream_latency_ms": 72,
  "gateway_latency_ms": 15,
  "client_ip": "203.0.113.45",
  "consumer_id": "api-key-xyz",
  "service": "orders-service",
  "route": "orders-route",
  "user_agent": "MyApp/2.1 (Android 14)",
  "upstream_uri": "http://orders.internal:8080/orders",
  "upstream_status": 201,
  "retry_count": 0
}
```
Essential Fields
| Field | Purpose |
|---|---|
| `request_id` | Correlate gateway log with upstream service logs |
| `latency_ms` | End-to-end latency (client perspective) |
| `upstream_latency_ms` | Time waiting for upstream response |
| `gateway_latency_ms` | Time spent in gateway processing |
| `consumer_id` | API key or user ID — enables per-client analytics |
| `retry_count` | Detect which requests required retries |
| `upstream_status` | Distinguish gateway errors from upstream errors |
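The latency fields should be internally consistent: gateway overhead is what remains after subtracting upstream time from the end-to-end figure. A minimal sketch of parsing one log line and deriving that breakdown, using the field names from the example entry above:

```python
import json

def latency_breakdown(log_line: str) -> dict:
    """Parse one JSON access-log entry and split total latency into
    upstream time vs. time spent inside the gateway itself."""
    entry = json.loads(log_line)
    total = entry["latency_ms"]
    upstream = entry["upstream_latency_ms"]
    return {
        "request_id": entry["request_id"],
        "total_ms": total,
        "upstream_ms": upstream,
        "gateway_overhead_ms": total - upstream,  # should match gateway_latency_ms
    }

line = '{"request_id": "04af84e7", "latency_ms": 87, "upstream_latency_ms": 72}'
print(latency_breakdown(line)["gateway_overhead_ms"])  # → 15
```

A persistent gap between `gateway_overhead_ms` and the expected baseline often points at a slow plugin or auth lookup in the gateway itself rather than a slow backend.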
Kong Access Log Configuration
```yaml
# Kong file-log plugin (structured JSON)
plugins:
  - name: file-log
    config:
      path: /var/log/kong/access.log
      reopen: true
  # Kong http-log plugin (send to log aggregator)
  - name: http-log
    config:
      http_endpoint: http://logstash.internal:5044
      method: POST
      timeout: 1000
      keepalive: 60000
      queue_size: 1000
      flush_timeout: 2
```
Envoy Access Log
```yaml
# Envoy JSON access log
access_log:
  - name: envoy.access_loggers.file
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /var/log/envoy/access.log
      log_format:
        json_format:
          timestamp: "%START_TIME%"
          method: "%REQ(:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          status: "%RESPONSE_CODE%"
          latency_ms: "%DURATION%"
          upstream_latency_ms: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
          client_ip: "%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%"
          request_id: "%REQ(X-REQUEST-ID)%"
          upstream_cluster: "%UPSTREAM_CLUSTER%"
          bytes_received: "%BYTES_RECEIVED%"
          bytes_sent: "%BYTES_SENT%"
```
Request and Response Body Logging
Body logging is powerful for debugging but dangerous for privacy and storage:
When to Log Bodies
- Debug mode only: Enable body logging temporarily for a specific client or route when investigating an incident, not permanently
- Error responses only: Log request bodies only when the status is 4xx or 5xx
- Sampling: Log bodies for 1% of successful requests for baseline observation
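The three policies above compose into a single decision per request. A minimal sketch, where `debug_routes` (a set of routes with debug mode temporarily enabled) is an illustrative name:

```python
import random

def should_log_body(status: int, debug_routes: set[str], route: str,
                    sample_rate: float = 0.01) -> bool:
    """Decide whether to capture the body for this request.
    Combines: per-route debug mode, errors always, ~1% sample of the rest."""
    if route in debug_routes:      # temporary debug mode for an investigation
        return True
    if status >= 400:              # 4xx/5xx: always keep the body
        return True
    return random.random() < sample_rate  # small baseline sample of successes

print(should_log_body(503, set(), "orders-route"))  # → True
```

The debug-mode set should be time-bounded (e.g., expire entries after an hour) so that "temporary" body logging does not silently become permanent.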
PII Masking
Personally identifiable information (PII) must be masked before logging:
```python
import re

# Patterns are applied to the serialized log entry before it is written out.
PII_PATTERNS = [
    (r'"password":\s*"[^"]+"', '"password": "***"'),                          # never log passwords
    (r'"card_number":\s*"(\d{4})\d+(\d{4})"', r'"card_number": "\1****\2"'),  # keep first/last 4 digits
    (r'"ssn":\s*"\d{3}-\d{2}-(\d{4})"', r'"ssn": "***-**-\1"'),               # keep last 4 digits
    (r'Authorization: Bearer [A-Za-z0-9._-]+', 'Authorization: Bearer ***'),  # redact tokens
]

def mask_pii(log_entry: str) -> str:
    for pattern, replacement in PII_PATTERNS:
        log_entry = re.sub(pattern, replacement, log_entry)
    return log_entry

mask_pii('{"password": "hunter2", "ssn": "123-45-6789"}')
# → '{"password": "***", "ssn": "***-**-6789"}'
```
Size Limits
Cap body logging at a maximum size (e.g., first 4KB) to prevent large file uploads or data dumps from flooding log storage:
```yaml
# Kong request-size-limiting + partial body logging
plugins:
  - name: request-size-limiting
    config:
      allowed_payload_size: 8  # MB
```
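The plugin above rejects oversized requests outright; capping what gets *written to the log* is a separate, application-side step. A minimal sketch of truncating a logged body at 4KB (the `MAX_LOG_BODY` constant is illustrative):

```python
MAX_LOG_BODY = 4096  # log at most the first 4 KB of any body

def truncate_body(body: bytes, limit: int = MAX_LOG_BODY) -> str:
    """Keep the head of the body and record how many bytes were dropped,
    so readers of the log know the entry is partial."""
    if len(body) <= limit:
        return body.decode("utf-8", errors="replace")
    kept = body[:limit].decode("utf-8", errors="replace")
    return f"{kept}... [truncated {len(body) - limit} bytes]"

print(truncate_body(b"x" * 5000)[-30:])
```

Truncation should happen after PII masking, not before, so a sensitive value is never split in a way that defeats the masking patterns.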
API Analytics
From gateway access logs, derive actionable analytics:
Key Metrics
| Metric | Query |
|---|---|
| Request volume | `COUNT(*) GROUP BY path, method` |
| Error rate | `COUNT(status >= 500) / COUNT(*) GROUP BY path` |
| Latency percentiles | `PERCENTILE(latency_ms, 50, 95, 99) GROUP BY path` |
| Top consumers | `COUNT(*) GROUP BY consumer_id ORDER BY COUNT DESC LIMIT 10` |
| Geographic distribution | `COUNT(*) GROUP BY geoip(client_ip).country` |
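The queries above are pseudo-SQL; the same aggregations can run over parsed log entries directly. A minimal sketch computing per-path request count, error rate, and p95 latency, using the field names from the access-log example earlier:

```python
from collections import defaultdict

def per_path_stats(entries: list[dict]) -> dict[str, dict]:
    """Aggregate error rate and p95 latency per path from access-log entries."""
    latencies: dict[str, list[int]] = defaultdict(list)
    errors: dict[str, int] = defaultdict(int)
    for e in entries:
        latencies[e["path"]].append(e["latency_ms"])
        if e["status"] >= 500:
            errors[e["path"]] += 1
    stats = {}
    for path, vals in latencies.items():
        vals.sort()
        p95 = vals[min(len(vals) - 1, int(0.95 * len(vals)))]  # nearest-rank p95
        stats[path] = {
            "count": len(vals),
            "error_rate": errors[path] / len(vals),
            "p95_latency_ms": p95,
        }
    return stats

entries = [{"path": "/api/v1/orders", "status": s, "latency_ms": l}
           for s, l in [(200, 40), (200, 55), (500, 300), (201, 60)]]
print(per_path_stats(entries))
```

In production these aggregations usually run in the log store (Loki, Elasticsearch) or as Prometheus queries rather than in application code, but the shape of the computation is the same.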
Grafana Dashboard
```promql
# Prometheus queries for gateway dashboard panels

# Request rate by status
sum by (status) (rate(gateway_requests_total[5m]))

# 99th percentile latency
histogram_quantile(0.99, rate(gateway_request_duration_seconds_bucket[5m]))

# Error rate
sum(rate(gateway_requests_total{status=~"5.."}[5m]))
  / sum(rate(gateway_requests_total[5m]))

# Top 10 paths by volume
topk(10, sum by (path) (rate(gateway_requests_total[5m])))
```
Usage Metering for Billing
If you charge for API usage, the gateway is the authoritative source for metering:
```python
# Usage metering pipeline:
#   1. Gateway logs each request with consumer_id and route
#   2. Log aggregator (Kafka/Kinesis) receives structured logs
#   3. Stream processor counts requests per consumer per billing period
#   4. Billing service reads usage and generates invoices

# Kafka consumer: count API calls per consumer
from collections import defaultdict

usage_counters: dict[str, int] = defaultdict(int)

for log_entry in kafka_consumer:
    consumer_id = log_entry['consumer_id']
    if log_entry['status'] < 500:  # don't bill requests that failed server-side (5xx)
        usage_counters[consumer_id] += 1

# Flush to billing DB every minute
flush_to_billing_db(usage_counters, period=current_billing_period())
```
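The flush step can be as simple as an additive upsert per consumer followed by resetting the in-memory counters. A hedged sketch, where the `billing_db` dict stands in for a real database client:

```python
from collections import defaultdict

billing_db: dict[tuple[str, str], int] = {}  # (consumer_id, period) → count

def flush_to_billing_db(counters: dict[str, int], period: str) -> None:
    """Add the in-memory counts to the billing store, then reset them
    so the next flush window starts from zero."""
    for consumer_id, count in counters.items():
        key = (consumer_id, period)
        billing_db[key] = billing_db.get(key, 0) + count
    counters.clear()

usage = defaultdict(int, {"api-key-xyz": 42})
flush_to_billing_db(usage, "2024-02")
print(billing_db[("api-key-xyz", "2024-02")])  # → 42
```

The additive upsert matters: if the stream processor restarts mid-period, new flushes accumulate onto the existing count instead of overwriting it.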
Quota Enforcement Loop
```
Request → Gateway → check quota (Redis) → allow/deny
              │
              └→ log request → Kafka → Stream Processor → Billing DB (usage counters)
                                                                │
                                                                └→ Quota Service → update Redis quota
```
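The "check quota" step is typically a fixed-window counter keyed by consumer and time bucket. A sketch using a plain dict as a stand-in for Redis (in production this would be `INCR` plus `EXPIRE` on a per-window key); the `QUOTAS` table and its values are illustrative:

```python
import time
from collections import defaultdict

QUOTAS = {"api-key-xyz": 1000}                # illustrative per-consumer limits
_counters: dict[str, int] = defaultdict(int)  # stand-in for Redis counters

def check_quota(consumer_id: str, window: int = 3600) -> bool:
    """Fixed-window quota check: increment the consumer's counter for the
    current window and allow the request only while it is under the limit.
    With Redis this is INCR on a key like "quota:{consumer}:{bucket}"
    followed by EXPIRE of `window` seconds."""
    bucket = int(time.time()) // window
    key = f"{consumer_id}:{bucket}"
    _counters[key] += 1
    return _counters[key] <= QUOTAS.get(consumer_id, 0)

print(check_quota("api-key-xyz"))  # → True  (first call in the window)
print(check_quota("unknown-key"))  # → False (no quota configured)
```

Because the Quota Service updates Redis asynchronously from the billing counters, enforcement is eventually consistent: a consumer can briefly exceed its quota between flushes, which is usually an acceptable trade-off for keeping the hot path to a single Redis round trip.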
Tools and Integrations
| Tool | Use Case |
|---|---|
| Kong Analytics (Konnect) | Built-in dashboard for Kong users |
| AWS API Gateway metrics | CloudWatch integration, per-stage metrics |
| Envoy access log service | gRPC streaming to Fluentd/Logstash |
| Grafana + Loki | Log aggregation and query for self-hosted gateways |
| Elastic (ELK) Stack | Full-text search and aggregation over JSON logs |
| Datadog APM | Distributed tracing + gateway metrics integration |
Summary
Gateway access logs in JSON format are the foundation for API analytics — record request ID, latency breakdown, consumer ID, and upstream status on every request. Enable body logging sparingly and mask PII before writing to logs. Build analytics dashboards from Prometheus metrics or aggregated log queries to track request volume, error rates, and latency percentiles. Use the gateway as the authoritative source for usage metering in billing systems.