## Load Testing Goals
Load testing answers questions that no unit or integration test can:
- What is the maximum throughput before error rates climb?
- Does the API return 503s or just slow down under overload?
- Are there memory leaks that surface only after 10 minutes of sustained load?
- Does the 99th percentile latency stay within SLO under normal traffic?
Load testing should be a regular part of your CI pipeline, not a one-off exercise before a major launch. Regressions in performance are just as real as regressions in correctness.
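The CI gate itself is simple: aggregate the run's metrics, compare them against your SLO thresholds, and fail the build on any violation. A minimal tool-agnostic sketch (the metric names and limits here are hypothetical, not from any specific tool):

```python
def check_thresholds(metrics, max_error_rate=0.01, max_p95_ms=500):
    """Return human-readable threshold violations; an empty list means pass."""
    failures = []
    if metrics["error_rate"] >= max_error_rate:
        failures.append(f"error rate {metrics['error_rate']:.2%} >= {max_error_rate:.0%}")
    if metrics["p95_ms"] >= max_p95_ms:
        failures.append(f"p95 {metrics['p95_ms']}ms >= {max_p95_ms}ms")
    return failures

# A CI wrapper would exit non-zero when any failure is present
failures = check_thresholds({"error_rate": 0.004, "p95_ms": 620})
```

Tools like k6 build this in (the `thresholds` option below); the point is that a threshold breach must fail the pipeline, not just print a warning.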
## k6
k6 is a Go-based load testing tool with a JavaScript scripting API. It is fast, has low overhead per virtual user, and integrates well with CI pipelines.
### Basic Script
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,         // 50 virtual users
  duration: '2m',  // run for 2 minutes
  thresholds: {
    http_req_failed: ['rate<0.01'],    // <1% error rate
    http_req_duration: ['p(95)<500'],  // 95th percentile < 500ms
    'http_req_duration{status:200}': ['p(99)<1000'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
    'has results array': (r) => r.json('results') !== undefined,
  });
  sleep(1);
}
```

```shell
k6 run load-test.js
```
### Ramp-Up Patterns
Real traffic doesn't start at full load. Use stages to ramp up gradually:
```javascript
export const options = {
  stages: [
    { duration: '30s', target: 10 },  // ramp up to 10 VUs
    { duration: '1m', target: 50 },   // ramp up to 50 VUs
    { duration: '2m', target: 50 },   // hold at 50 VUs
    { duration: '30s', target: 0 },   // ramp down
  ],
};
```
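Under the hood, staged ramping is just piecewise-linear interpolation of the VU count over time. A small standalone sketch (not part of k6) of the same math:

```python
def vus_at(stages, t, start_vus=0):
    """Target VU count t seconds into a list of {duration, target} stages,
    interpolating linearly within each stage."""
    current = start_vus
    elapsed = 0
    for stage in stages:
        if t <= elapsed + stage["duration"]:
            frac = (t - elapsed) / stage["duration"]
            return round(current + frac * (stage["target"] - current))
        elapsed += stage["duration"]
        current = stage["target"]
    return current  # past the end: hold the final target

stages = [
    {"duration": 30, "target": 10},   # ramp up to 10 VUs
    {"duration": 60, "target": 50},   # ramp up to 50 VUs
    {"duration": 120, "target": 50},  # hold at 50 VUs
    {"duration": 30, "target": 0},    # ramp down
]
```

Ramping matters for interpretation, too: a failure at second 20 of the first ramp tells you the system breaks at roughly 7 VUs, not 50.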
### Checking Status Codes
```javascript
import http from 'k6/http';
import { check } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export default function () {
  const res = http.post('https://api.example.com/orders', JSON.stringify({
    item_id: 1, qty: 2
  }), { headers: { 'Content-Type': 'application/json' } });
  const ok = check(res, {
    'status is 201': (r) => r.status === 201,
    'not a 429': (r) => r.status !== 429,
    'not a 500': (r) => r.status !== 500,
  });
  errorRate.add(!ok);
}
```
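k6's `Rate` metric is just a running ratio of true samples to total samples — here, the fraction of iterations where any check failed. The bookkeeping, sketched standalone:

```python
class Rate:
    """Minimal stand-in for k6's Rate metric: tracks the fraction of
    samples recorded as True (here, 'this iteration failed a check')."""
    def __init__(self):
        self.total = 0
        self.trues = 0

    def add(self, value):
        self.total += 1
        self.trues += bool(value)

    @property
    def rate(self):
        return self.trues / self.total if self.total else 0.0

errors = Rate()
for status in (201, 201, 500, 201):       # statuses from four iterations
    errors.add(status not in (200, 201))  # record whether the check failed
```

Because it is a named custom metric, you can attach a threshold to it (`errors: ['rate<0.01']`) just like the built-in `http_req_failed`.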
### Running k6 in CI
```yaml
# GitHub Actions
- name: Load test
  uses: grafana/[email protected]
  with:
    filename: tests/load-test.js
  env:
    BASE_URL: https://staging.example.com
```
## Locust
Locust is a Python load testing tool. Its test scripts are plain Python classes, making it easy to integrate with your existing Python codebase and test fixtures.
```python
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # think time between requests

    def on_start(self):
        """Called once per user when they start."""
        response = self.client.post('/auth/token', json={
            'username': '[email protected]',
            'password': 'testpassword'
        })
        self.token = response.json()['access_token']

    @task(3)  # weight: called 3x more often than other tasks
    def list_users(self):
        with self.client.get(
            '/api/users',
            headers={'Authorization': f'Bearer {self.token}'},
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f'Expected 200, got {response.status_code}')

    @task(1)
    def create_order(self):
        with self.client.post(
            '/api/orders',
            json={'item_id': 1, 'qty': 1},
            headers={'Authorization': f'Bearer {self.token}'},
            catch_response=True
        ) as response:
            if response.status_code not in (200, 201):
                response.failure(f'Unexpected status: {response.status_code}')
```
```shell
# Headless mode (CI)
locust -f locustfile.py --headless -u 100 -r 10 --run-time 2m \
  --host https://staging.example.com \
  --only-summary

# With web UI
locust -f locustfile.py
# open http://localhost:8089
```
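The task weights above (`@task(3)` vs `@task(1)`) simply bias random task selection 3:1 per iteration. The same scheme can be sketched with the standard library (function and task names here are illustrative):

```python
import random

def pick_task(weighted_tasks, rng=random):
    """Choose a task name with probability proportional to its weight,
    mirroring weighted task selection."""
    names = [name for name, _ in weighted_tasks]
    weights = [w for _, w in weighted_tasks]
    return rng.choices(names, weights=weights, k=1)[0]

tasks = [("list_users", 3), ("create_order", 1)]
counts = {"list_users": 0, "create_order": 0}
for _ in range(10_000):
    counts[pick_task(tasks)] += 1
# list_users should land near 75% of picks
```

Choose weights to mirror production traffic ratios; a test that hammers a rarely-used write endpoint tells you little about real-world behavior.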
### Distributed Locust
For very high load, run Locust in distributed mode:
```shell
# Master node
locust -f locustfile.py --master --expect-workers 4

# Worker nodes (each on a separate machine)
locust -f locustfile.py --worker --master-host=<master-ip>
```
## Artillery
Artillery uses YAML configuration for test scenarios, making it easy to review in pull requests and share with non-developers.
```yaml
# load-test.yml
config:
  target: https://api.example.com
  phases:
    - duration: 60
      arrivalRate: 10
      name: Warm up
    - duration: 120
      arrivalRate: 50
      name: Sustained load
  plugins:
    expect: {}

scenarios:
  - name: API smoke test
    flow:
      - post:
          url: /auth/token
          json:
            username: [email protected]
            password: testpassword
          capture:
            json: $.access_token
            as: token
          expect:
            - statusCode: 200
      - get:
          url: /api/users
          headers:
            Authorization: Bearer {{ token }}
          expect:
            - statusCode: 200
            - hasProperty: results
```

```shell
npx artillery run load-test.yml --output report.json
npx artillery report report.json
```
## Interpreting Results
### Key Metrics
| Metric | What it tells you |
|---|---|
| Requests/sec (RPS) | Throughput ceiling |
| Error rate by status | Where failures occur (4xx vs 5xx) |
| p50 latency | Typical user experience |
| p95/p99 latency | Worst-case experience for 5%/1% of users |
| Max concurrency | At what point does the system degrade? |
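The percentile figures are computed over the full distribution of request latencies. A minimal nearest-rank sketch — note that real tools may use different interpolation methods, so their numbers can differ slightly:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 110, 105, 400, 98, 102, 115, 99, 1200]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

This sample also shows why averages mislead: the mean here is dragged up by one 1200ms outlier, while the p50 shows that a typical request is fine and the p95 shows that the tail is not.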
### Reading Error Patterns
- Climbing 429s: You are hitting rate limits — adjust load or request throttling config
- Sudden 502/503 spike: A downstream service is falling over
- Gradual 500 increase: Memory leak or connection pool exhaustion — check heap/pool metrics
- Latency spikes without errors: GC pauses, lock contention, or disk I/O saturation
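A quick way to spot which pattern you are in is to bucket the observed responses by status family, keeping 429s separate since they signal rate limiting rather than server failure. A hypothetical post-processing sketch:

```python
from collections import Counter

def status_families(codes):
    """Group HTTP status codes into families (2xx, 4xx, 5xx, ...),
    with a dedicated bucket for 429 rate-limit responses."""
    buckets = Counter()
    for code in codes:
        if code == 429:
            buckets["429"] += 1
        else:
            buckets[f"{code // 100}xx"] += 1
    return dict(buckets)

observed = [200, 200, 429, 200, 503, 429, 200, 500]
```

Plotting these buckets over the run's timeline (rather than totals) is what distinguishes a "sudden spike" from a "gradual increase".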
### Bottleneck Identification
Correlate load test results with server metrics:
```shell
# During a load test, watch these in parallel:
# CPU: top, htop
# Connections: ss -s
# DB: pg_stat_activity (max active connections)
# Memory: free -h
# Logs:
tail -f /var/log/app/error.log | grep -E '50[0-9]'
```
A load test that produces no 5xx errors but shows 90% CPU saturation tells you to scale horizontally. One that shows low CPU but 503s tells you connection limits or queues are full.