## Load Testing Goals
Load testing answers questions that no unit or integration test can:
- What is the maximum throughput before error rates climb?
- Does the API return 503s or just slow down under overload?
- Are there memory leaks that surface only after 10 minutes of sustained load?
- Does the 99th percentile latency stay within SLO under normal traffic?
Load testing should be a regular part of your CI pipeline, not a one-off exercise before a major launch. Regressions in performance are just as real as regressions in correctness.
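The CI gate itself is simple: aggregate the run's metrics, compare them against your SLO thresholds, and fail the build on any violation. A minimal tool-agnostic sketch (the metric names and limits here are hypothetical, not from any specific tool):

```python
def check_thresholds(metrics, max_error_rate=0.01, max_p95_ms=500):
    """Return human-readable threshold violations; an empty list means pass."""
    failures = []
    if metrics["error_rate"] >= max_error_rate:
        failures.append(f"error rate {metrics['error_rate']:.2%} >= {max_error_rate:.0%}")
    if metrics["p95_ms"] >= max_p95_ms:
        failures.append(f"p95 {metrics['p95_ms']}ms >= {max_p95_ms}ms")
    return failures

# A CI wrapper would exit non-zero when any failure is present
failures = check_thresholds({"error_rate": 0.004, "p95_ms": 620})
```

Tools like k6 build this in (the `thresholds` option below); the point is that a threshold breach must fail the pipeline, not just print a warning.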
## k6
k6 is a Go-based load testing tool with a JavaScript scripting API. It is fast, has low overhead per virtual user, and integrates well with CI pipelines.
### Basic Script
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,         // 50 virtual users
  duration: '2m',  // run for 2 minutes
  thresholds: {
    http_req_failed: ['rate<0.01'],    // <1% error rate
    http_req_duration: ['p(95)<500'],  // 95th percentile < 500ms
    'http_req_duration{status:200}': ['p(99)<1000'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
    'has results array': (r) => r.json('results') !== undefined,
  });
  sleep(1);
}
```

```shell
k6 run load-test.js
```
### Ramp-Up Patterns
Real traffic doesn't start at full load. Use stages to ramp up gradually:
```javascript
export const options = {
  stages: [
    { duration: '30s', target: 10 },  // ramp up to 10 VUs
    { duration: '1m', target: 50 },   // ramp up to 50 VUs
    { duration: '2m', target: 50 },   // hold at 50 VUs
    { duration: '30s', target: 0 },   // ramp down
  ],
};
```
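Under the hood, staged ramping is just piecewise-linear interpolation of the VU count over time. A small standalone sketch (not part of k6) of the same math:

```python
def vus_at(stages, t, start_vus=0):
    """Target VU count t seconds into a list of {duration, target} stages,
    interpolating linearly within each stage."""
    current = start_vus
    elapsed = 0
    for stage in stages:
        if t <= elapsed + stage["duration"]:
            frac = (t - elapsed) / stage["duration"]
            return round(current + frac * (stage["target"] - current))
        elapsed += stage["duration"]
        current = stage["target"]
    return current  # past the end: hold the final target

stages = [
    {"duration": 30, "target": 10},   # ramp up to 10 VUs
    {"duration": 60, "target": 50},   # ramp up to 50 VUs
    {"duration": 120, "target": 50},  # hold at 50 VUs
    {"duration": 30, "target": 0},    # ramp down
]
```

Ramping matters for interpretation, too: a failure at second 20 of the first ramp tells you the system breaks at roughly 7 VUs, not 50.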
### Checking Status Codes
```javascript
import http from 'k6/http';
import { check } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export default function () {
  const res = http.post('https://api.example.com/orders', JSON.stringify({
    item_id: 1, qty: 2
  }), { headers: { 'Content-Type': 'application/json' } });
  const ok = check(res, {
    'status is 201': (r) => r.status === 201,
    'not a 429': (r) => r.status !== 429,
    'not a 500': (r) => r.status !== 500,
  });
  errorRate.add(!ok);
}
```
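k6's `Rate` metric is just a running ratio of true samples to total samples — here, the fraction of iterations where any check failed. The bookkeeping, sketched standalone:

```python
class Rate:
    """Minimal stand-in for k6's Rate metric: tracks the fraction of
    samples recorded as True (here, 'this iteration failed a check')."""
    def __init__(self):
        self.total = 0
        self.trues = 0

    def add(self, value):
        self.total += 1
        self.trues += bool(value)

    @property
    def rate(self):
        return self.trues / self.total if self.total else 0.0

errors = Rate()
for status in (201, 201, 500, 201):       # statuses from four iterations
    errors.add(status not in (200, 201))  # record whether the check failed
```

Because it is a named custom metric, you can attach a threshold to it (`errors: ['rate<0.01']`) just like the built-in `http_req_failed`.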
### Running k6 in CI
```yaml
# GitHub Actions
- name: Load test
  uses: grafana/[email protected]
  with:
    filename: tests/load-test.js
  env:
    BASE_URL: https://staging.example.com
```
## Locust
Locust is a Python load testing tool. Its test scripts are plain Python classes, making it easy to integrate with your existing Python codebase and test fixtures.
```python
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # think time between requests

    def on_start(self):
        """Called once per user when they start."""
        response = self.client.post('/auth/token', json={
            'username': '[email protected]',
            'password': 'testpassword'
        })
        self.token = response.json()['access_token']

    @task(3)  # weight: called 3x more often than other tasks
    def list_users(self):
        with self.client.get(
            '/api/users',
            headers={'Authorization': f'Bearer {self.token}'},
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f'Expected 200, got {response.status_code}')

    @task(1)
    def create_order(self):
        with self.client.post(
            '/api/orders',
            json={'item_id': 1, 'qty': 1},
            headers={'Authorization': f'Bearer {self.token}'},
            catch_response=True
        ) as response:
            if response.status_code not in (200, 201):
                response.failure(f'Unexpected status: {response.status_code}')
```
```shell
# Headless mode (CI)
locust -f locustfile.py --headless -u 100 -r 10 --run-time 2m \
  --host https://staging.example.com \
  --only-summary

# With web UI
locust -f locustfile.py
# open http://localhost:8089
```
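The task weights above (`@task(3)` vs `@task(1)`) simply bias random task selection 3:1 per iteration. The same scheme can be sketched with the standard library (function and task names here are illustrative):

```python
import random

def pick_task(weighted_tasks, rng=random):
    """Choose a task name with probability proportional to its weight,
    mirroring weighted task selection."""
    names = [name for name, _ in weighted_tasks]
    weights = [w for _, w in weighted_tasks]
    return rng.choices(names, weights=weights, k=1)[0]

tasks = [("list_users", 3), ("create_order", 1)]
counts = {"list_users": 0, "create_order": 0}
for _ in range(10_000):
    counts[pick_task(tasks)] += 1
# list_users should land near 75% of picks
```

Choose weights to mirror production traffic ratios; a test that hammers a rarely-used write endpoint tells you little about real-world behavior.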
### Distributed Locust
For very high load, run Locust in distributed mode:
```shell
# Master node
locust -f locustfile.py --master --expect-workers 4

# Worker nodes (each on a separate machine)
locust -f locustfile.py --worker --master-host=<master-ip>
```
## Artillery
Artillery uses YAML configuration for test scenarios, making it easy to review in pull requests and share with non-developers.
```yaml
# load-test.yml
config:
  target: https://api.example.com
  phases:
    - duration: 60
      arrivalRate: 10
      name: Warm up
    - duration: 120
      arrivalRate: 50
      name: Sustained load
  plugins:
    expect: {}

scenarios:
  - name: API smoke test
    flow:
      - post:
          url: /auth/token
          json:
            username: [email protected]
            password: testpassword
          capture:
            json: $.access_token
            as: token
          expect:
            - statusCode: 200
      - get:
          url: /api/users
          headers:
            Authorization: Bearer {{ token }}
          expect:
            - statusCode: 200
            - hasProperty: results
```

```shell
npx artillery run load-test.yml --output report.json
npx artillery report report.json
```
## Interpreting Results
### Key Metrics
| Metric | What it tells you |
|---|---|
| Requests/sec (RPS) | Throughput ceiling |
| Error rate by status | Where failures occur (4xx vs 5xx) |
| p50 latency | Typical user experience |
| p95/p99 latency | Worst-case experience for 5%/1% of users |
| Max concurrency | At what point does the system degrade? |
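The percentile figures are computed over the full distribution of request latencies. A minimal nearest-rank sketch — note that real tools may use different interpolation methods, so their numbers can differ slightly:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 110, 105, 400, 98, 102, 115, 99, 1200]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

This sample also shows why averages mislead: the mean here is dragged up by one 1200ms outlier, while the p50 shows that a typical request is fine and the p95 shows that the tail is not.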
### Reading Error Patterns
- Climbing 429s: You are hitting rate limits — adjust load or request throttling config
- Sudden 502/503 spike: A downstream service is falling over
- Gradual 500 increase: Memory leak or connection pool exhaustion — check heap/pool metrics
- Latency spikes without errors: GC pauses, lock contention, or disk I/O saturation
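A quick way to spot which pattern you are in is to bucket the observed responses by status family, keeping 429s separate since they signal rate limiting rather than server failure. A hypothetical post-processing sketch:

```python
from collections import Counter

def status_families(codes):
    """Group HTTP status codes into families (2xx, 4xx, 5xx, ...),
    with a dedicated bucket for 429 rate-limit responses."""
    buckets = Counter()
    for code in codes:
        if code == 429:
            buckets["429"] += 1
        else:
            buckets[f"{code // 100}xx"] += 1
    return dict(buckets)

observed = [200, 200, 429, 200, 503, 429, 200, 500]
```

Plotting these buckets over the run's timeline (rather than totals) is what distinguishes a "sudden spike" from a "gradual increase".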
### Bottleneck Identification
Correlate load test results with server metrics:
```shell
# During a load test, watch these in parallel:
# CPU: top, htop
# Connections: ss -s
# DB: pg_stat_activity (max active connections)
# Memory: free -h
# Logs:
tail -f /var/log/app/error.log | grep -E '50[0-9]'
```
A load test that produces no 5xx errors but shows 90% CPU saturation tells you to scale horizontally. One that shows low CPU but 503s tells you connection limits or queues are full.