StatusCodeFYI

Symptoms

- Outbound calls fail immediately with SIP 503 Service Unavailable during peak hours while lower-volume periods succeed normally
- Some calls go through while others receive 503; this pattern indicates the trunk is at or near its concurrent channel limit rather than completely down
- The SIP provider dashboard or CDRs show the trunk at 100% capacity when the failures occur
- sngrep or a SIP trace shows the 503 arriving from the SIP provider's proxy IP within milliseconds of the INVITE, meaning the call was rejected before any processing
- A Retry-After header may be present in the 503 response, stating how many seconds the server expects to remain unavailable

Root Causes

- SIP trunk concurrent channel limit exceeded: the provider's contract allows N simultaneous calls, and all channels are occupied when the new INVITE arrives
- The SIP provider is experiencing an outage or scheduled maintenance and returns 503 to all new requests during that window
- The DNS SRV record for the SIP trunk resolves to a server that is down, and the client does not fail over to secondary SRV targets because it does not implement SRV priority/weight failover correctly
- The SIP proxy is overloaded and shedding load, returning 503 with Retry-After to protect downstream servers
- Network congestion or packet loss drops SIP packets sent over UDP before they reach the provider; a lost BYE, for example, leaves a stale call session on the provider side that still counts against the channel limit
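RFC 2782 defines how a client should use SRV records: try targets in ascending priority order and, within one priority, pick targets at random in proportion to their weight. A minimal Python sketch of that ordering, assuming the records have already been fetched (for example with `dig +short SRV`) into `(priority, weight, port, target)` tuples; `order_srv` is a hypothetical helper name:

```python
import random
from itertools import groupby

def order_srv(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782).

    records: iterable of (priority, weight, port, target) tuples.
    """
    ordered = []
    by_priority = sorted(records, key=lambda r: r[0])  # lowest priority value first
    for _, group in groupby(by_priority, key=lambda r: r[0]):
        pool = list(group)
        # Weighted random selection without replacement within one priority
        while pool:
            total = sum(r[1] for r in pool)
            if total == 0:
                ordered.extend(pool)  # only zero-weight records left: any order
                break
            pick = rng.uniform(0, total)
            running = 0
            for i, r in enumerate(pool):
                running += r[1]
                if pick <= running:
                    ordered.append(pool.pop(i))
                    break
    return ordered
```

The dialer then attempts each target in that order; RFC 3263 says a client receiving 503 from one element should try the next one, which is exactly the failover that is missing in this root cause.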

Diagnosis

**Step 1: Confirm it's a capacity issue, not an outage**
```bash
# Check current active calls on your PBX (Asterisk example)
asterisk -rx 'core show channels count'
# Compare the "active calls" figure with your trunk's contracted concurrent call limit
# (it also counts internal calls, so treat it as an upper bound for trunk usage)
# At or near the limit → capacity issue
# Well below the limit → provider outage or configuration issue
```
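The comparison in Step 1 can be scripted for a cron job or monitoring check. A minimal Python sketch, assuming the command prints a line such as `2 active calls` (the format current Asterisk versions use); `TRUNK_LIMIT` and the 90% threshold are placeholders to set from your contract:

```python
import re
import subprocess

TRUNK_LIMIT = 30  # hypothetical: replace with your contracted concurrent call limit

def active_calls(output):
    """Extract the call count from `core show channels count` output."""
    match = re.search(r"^(\d+) active calls?\b", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'active calls' line found")
    return int(match.group(1))

def classify(calls, limit=TRUNK_LIMIT, threshold=0.9):
    """Return 'capacity' if usage is at or near the limit, else 'outage-or-config'."""
    return "capacity" if calls >= threshold * limit else "outage-or-config"

def check_trunk(limit=TRUNK_LIMIT):
    """Run the Asterisk CLI command on this host and classify the result."""
    out = subprocess.run(["asterisk", "-rx", "core show channels count"],
                         capture_output=True, text=True, check=True).stdout
    calls = active_calls(out)
    return calls, classify(calls, limit)
```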

**Step 2: Read the 503 response headers**
```bash
sngrep -d eth0 port 5060
# Look for a Retry-After header in the 503 response
# Retry-After: 60 → server expects to accept requests again in 60 seconds
# No Retry-After → no recovery hint; RFC 3261 says to treat the 503 like a 500,
#                  so suspect an outage or a configuration problem
```
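If you save the raw response text from a trace, a short parser can pull out the status code and the Retry-After delay. Per RFC 3261 the header value is a number of seconds, optionally followed by a comment in parentheses and a `duration` parameter. A minimal Python sketch; the sample message and the `parse_retry_after` name are invented for illustration:

```python
def parse_retry_after(raw):
    """Return (status_code, retry_after_seconds or None) for a raw SIP response."""
    lines = raw.replace("\r\n", "\n").split("\n")
    # The Status-Line looks like "SIP/2.0 503 Service Unavailable"
    status = int(lines[0].split()[1])
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "retry-after":
            # "60 (overloaded);duration=3600" -> 60
            return status, int(value.split(";")[0].split("(")[0])
    return status, None

sample = (
    "SIP/2.0 503 Service Unavailable\r\n"
    "Call-ID: a84b4c76e66710@pbx.example.com\r\n"
    "Retry-After: 60 (overloaded)\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(parse_retry_after(sample))  # (503, 60)
```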

**Step 3: Verify DNS SRV resolution for the SIP trunk**
```bash
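# SIP providers typically publish SRV records; +short prints one
# "priority weight port target" line per record
dig +short SRV _sip._udp.provider.example.com
dig +short SRV _sip._tcp.provider.example.com
# Confirm every TCP SRV target resolves and accepts connections on its port.
# (A UDP check such as `nc -uz` is unreliable: with no handshake it reports
# success unless an ICMP port-unreachable message comes back.)
dig +short SRV _sip._tcp.provider.example.com | while read -r prio weight port target; do
  echo -n "$target:$port (priority $prio, weight $weight): "
  nc -z -w 3 "$target" "$port" && echo OK || echo FAIL
done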
```
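A port check only shows that something is listening, and over UDP it proves almost nothing. A more telling probe is a SIP OPTIONS request: any SIP response, even a 503, shows the server is alive and parsing requests. A minimal Python sketch over UDP; the function names and timeout are hypothetical, and a production probe would also retransmit (RFC 3261 Timer E) and support TCP:

```python
import socket
import uuid

def build_options(host, port, local_ip, local_port):
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # "z9hG4bK" magic cookie is mandatory
    tag = uuid.uuid4().hex[:8]
    return (
        f"OPTIONS sip:{host}:{port} SIP/2.0\r\n"
        # ;rport (RFC 3581) asks the server to reply to our actual source port,
        # which matters when the probe runs behind NAT
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};rport;branch={branch}\r\n"
        "Max-Forwards: 70\r\n"
        f"To: <sip:{host}>\r\n"
        f"From: <sip:probe@{local_ip}>;tag={tag}\r\n"
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}\r\n"
        "CSeq: 1 OPTIONS\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode()

def probe(host, port=5060, timeout=3.0):
    """Send OPTIONS and return the response Status-Line, or None if no answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # fixes the peer so getsockname() shows our source
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_ip, local_port))
        try:
            data = sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):
            return None
        return data.split(b"\r\n", 1)[0].decode(errors="replace")
```

For example, `probe("sip1.provider.example.com")` (a placeholder host) returns a line such as `SIP/2.0 200 OK` from a healthy server, or `None` when nothing answers.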

**Step 4: Check provider status page**

Most SIP providers publish a status page or offer a support channel. If the 503s began at a specific moment and affect every call while your active call count is well below the trunk limit, a provider outage is more likely than a capacity issue.

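**Step 5: Check the PBX for zombie sessions**
```bash
# In Asterisk, look for channels stuck in a non-Up state (e.g. Ring, Ringing)
# or with an implausibly long duration
asterisk -rx 'core show channels verbose'
# Hang up a stuck channel (copy the exact channel name from the output above;
# PJSIP/primary_trunk-00000001 is only an example)
asterisk -rx 'channel request hangup PJSIP/primary_trunk-00000001'
```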
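Spotting long-lived channels by eye gets tedious on a busy PBX. A minimal Python sketch that flags channels older than a cut-off, assuming the `!`-separated layout printed by `core show channels concise` on recent Asterisk versions, with the channel name in the first field and the duration in seconds in the twelfth; verify those positions against your version before relying on it. The one-hour cut-off and both function names are hypothetical, and a genuinely long call would also be flagged:

```python
import subprocess

MAX_SECONDS = 3600  # hypothetical cut-off: longer calls are treated as stale

def stale_channels(concise_output, max_seconds=MAX_SECONDS):
    """Return names of channels whose duration exceeds max_seconds.

    Assumes '!'-separated lines: field 0 = channel name,
    field 11 = duration in seconds.
    """
    stale = []
    for line in concise_output.splitlines():
        fields = line.split("!")
        if len(fields) < 12 or not fields[11].isdigit():
            continue  # skip blank or unexpected lines
        if int(fields[11]) > max_seconds:
            stale.append(fields[0])
    return stale

def hang_up_stale(max_seconds=MAX_SECONDS, dry_run=True):
    """List stale channels and, unless dry_run, request hangup for each one."""
    out = subprocess.run(["asterisk", "-rx", "core show channels concise"],
                         capture_output=True, text=True, check=True).stdout
    for name in stale_channels(out, max_seconds):
        cmd = ["asterisk", "-rx", f"channel request hangup {name}"]
        if dry_run:
            print(" ".join(cmd))  # review before running for real
        else:
            subprocess.run(cmd, check=True)
```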

Solution

**Fix 1: Increase trunk capacity with your SIP provider**

Contact your SIP provider to raise the concurrent channel limit; many providers can scale it on demand. As a short-term workaround, shorten ring and call timeouts so that channels are freed sooner.
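Before asking for more channels, it helps to know the peak you actually reach. A minimal sweep-line sketch in Python that computes the maximum number of simultaneous calls from `(start, end)` pairs, such as timestamps you might export from CDRs; the input format and the `peak_concurrency` name are assumptions:

```python
def peak_concurrency(calls):
    """Return the maximum number of calls active at the same instant.

    calls: iterable of (start, end) pairs with start <= end, in any
    comparable unit (here, seconds).
    """
    events = []
    for start, end in calls:
        events.append((start, 1))   # call begins
        events.append((end, -1))    # call ends
    # Tuples sort by time, then delta: at equal timestamps the -1 (end) comes
    # first, so a call ending exactly when another starts is not an overlap.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak

cdrs = [(0, 120), (30, 90), (60, 200), (120, 180)]
print(peak_concurrency(cdrs))  # 3
```

If the peak from busy hours regularly equals the contracted limit, the 503s are capacity-driven and the limit itself needs raising.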

**Fix 2: Configure SIP trunk failover to a secondary provider**
```ini
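; Asterisk pjsip.conf: define two trunk endpoints
; (the primary_auth/primary_aor and secondary_auth/secondary_aor sections must also exist)
[primary_trunk]
type=endpoint
outbound_auth=primary_auth
aors=primary_aor

[secondary_trunk]
type=endpoint
outbound_auth=secondary_auth
aors=secondary_aor

; extensions.conf: try primary, fall back to secondary when the primary rejects the call
; (Asterisk reports a SIP 503 from the trunk as DIALSTATUS=CONGESTION)
exten => _X.,1,Dial(PJSIP/${EXTEN}@primary_trunk)
same => n,GotoIf($["${DIALSTATUS}" = "CONGESTION" | "${DIALSTATUS}" = "CHANUNAVAIL"]?failover)
same => n,Hangup()   ; any other outcome: do not fall through to the failover leg
same => n(failover),Dial(PJSIP/${EXTEN}@secondary_trunk)
same => n,Hangup()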
```
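Not every failure response should send a call to the secondary trunk. A 503 or another server-side error says the first route cannot carry the call, whereas 486 Busy Here or a 6xx response describes the called party and would most likely repeat on any trunk. A minimal Python sketch of that decision for an application-layer dialer; the exact set of codes is a judgment call, not a standard list, and `should_failover` is a hypothetical name:

```python
def should_failover(status):
    """Return True if a final SIP response should send the call to the next trunk."""
    if status == 408 or 500 <= status <= 599:
        return True   # request timeout or server-side failure, including 503
    # 2xx: the call succeeded; 486 and 6xx: the callee is busy or declined,
    # which another trunk would not change; other 4xx (404, 484, ...) usually
    # mean a bad number or request rather than a failed route
    return False
```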

**Fix 3: Implement call queuing instead of hard rejection**
```ini
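; Asterisk queues.conf: hold calls in a queue when all agents/trunks are busy
; instead of rejecting them outright
[outbound-queue]
strategy=ringall
maxlen=20   ; at most 20 calls may wait; further calls are turned away
retry=5     ; seconds to wait before trying the members again
timeout=30  ; seconds to ring a member before moving on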
```
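The same idea works at the application layer: let a call wait a bounded time for a free trunk channel instead of sending an INVITE that will draw a 503. A minimal Python sketch using a semaphore sized to the trunk limit; `max_waiting` plays the role of `maxlen`, `wait_seconds` caps how long a call may wait, and the `TrunkGate` class is hypothetical:

```python
import threading

class TrunkGate:
    """Admit at most `channels` concurrent calls; let up to `max_waiting` wait."""

    def __init__(self, channels, max_waiting=20, wait_seconds=30.0):
        self._slots = threading.BoundedSemaphore(channels)
        self._lock = threading.Lock()
        self._waiting = 0
        self.max_waiting = max_waiting
        self.wait_seconds = wait_seconds

    def acquire(self):
        """Return True once a channel is free; False if the queue is full or the wait times out."""
        if self._slots.acquire(blocking=False):
            return True  # a channel was free immediately
        with self._lock:
            if self._waiting >= self.max_waiting:
                return False  # queue full: reject locally, no INVITE sent
            self._waiting += 1
        try:
            return self._slots.acquire(timeout=self.wait_seconds)
        finally:
            with self._lock:
                self._waiting -= 1

    def release(self):
        """Call when a call ends (BYE sent or received) to free its channel."""
        self._slots.release()
```

Setting `max_waiting=0` turns the gate into plain call admission control: calls beyond the limit are refused locally instead of being queued.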

**Fix 4: Honor Retry-After and implement exponential backoff**
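```python
import random
import time

def handle_503(response, retry_call, attempt=0, max_attempts=5):
    # response.headers and retry_call are placeholders for your SIP stack
    if attempt >= max_attempts:
        raise RuntimeError("giving up after repeated 503 responses")
    header = response.headers.get('Retry-After')  # e.g. "60 (maintenance);duration=3600"
    # Honor the leading seconds value; otherwise back off exponentially (1, 2, 4 s...) with jitter
    delay = int(header.split(';')[0].split('(')[0]) if header else 2 ** attempt + random.uniform(0, 1)
    time.sleep(delay)
    return retry_call(attempt + 1)
```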

Prevention

- **Monitor concurrent call counts** in real time and alert when usage exceeds 80% of the trunk capacity limit
- **Provision redundant SIP trunks** from separate providers so that 503 from one provider triggers automatic failover to the second
- **Implement call admission control (CAC)** in the PBX so it stops placing new outbound calls once the trunk limit is reached, instead of sending INVITEs that the provider will reject
- **Use TCP or TLS transport** instead of UDP for the SIP trunk: reliable delivery at the transport layer makes it less likely that a lost BYE leaves a ghost session counting against your channel limit
- **Test failover regularly** by simulating a 503 response in a staging environment to verify your dial plan handles it correctly
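For the staging test in the last item, a stub UDP server that answers every request with 503 is enough to exercise a dial plan's failover branch. A minimal Python sketch; point a test trunk at it (the address and port 5080 are placeholders). It only echoes the headers RFC 3261 requires in a response and skips everything else a real server does, such as honoring the Via address:

```python
import socket

# Headers a response must copy from the request, including compact forms
COPY_HEADERS = ("via", "v", "from", "f", "to", "t", "call-id", "i", "cseq")

def build_503(request, retry_after=60):
    """Build a 503 response that echoes the mandatory request headers."""
    lines = request.replace("\r\n", "\n").split("\n")
    out = ["SIP/2.0 503 Service Unavailable"]
    for line in lines[1:]:
        if not line:
            break  # end of the header section
        name = line.split(":", 1)[0].strip().lower()
        if name in COPY_HEADERS:
            if name in ("to", "t") and ";tag=" not in line:
                line += ";tag=fake503"  # a final response carries a To tag
            out.append(line)
    out += [f"Retry-After: {retry_after}", "Content-Length: 0", "", ""]
    return "\r\n".join(out)

def serve(bind_ip="127.0.0.1", port=5080, retry_after=60):
    """Answer every UDP SIP request with 503 until interrupted (staging only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, port))
        while True:
            data, addr = sock.recvfrom(65535)
            request = data.decode(errors="replace")
            if request.startswith("ACK "):
                continue  # the ACK for our 503 needs no response
            # Reply to the packet's source address rather than the Via header
            sock.sendto(build_503(request, retry_after).encode(), addr)
```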

503 Service Unavailable — SIP Trunk Overloaded

Related Status Codes

Related Terms