Error taxonomy
Every error response carries error.code, error.type, error.message, and (where applicable) error.param. Codes are stable; messages may evolve.
Top-level error types
| Type | HTTP | Meaning | Retry? |
|---|---|---|---|
| auth_error | 401, 403 | Invalid, revoked, or scope-insufficient API key | No; fix the key |
| validation_error | 400 | Malformed request | No; fix the request |
| rate_limit_error | 429 | Rate limit exceeded | Yes, after Retry-After |
| inference_error | 502 | Upstream inference provider failed | Yes, with backoff |
| region_unavailable | 503 | Pinned region is degraded | Conditionally; see region_unavailable below |
| policy_error | 403 | Request blocked by Article III screening | No; review use case |
| methodology_error | 500 | Receipt generation failed (the inference may have succeeded) | Yes |
| internal_error | 500 | Unspecified gateway failure | Yes, with backoff |
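As an illustration of how a client might act on the table above, here is a minimal sketch that maps error.type to a retry hint. The hint labels ("no", "backoff", and so on) and the function name are hypothetical, and treating unknown types as non-retryable is an assumption rather than documented gateway behavior.

```python
# Retry hints per error.type, transcribed from the table above.
# The label strings are illustrative, not part of the API.
RETRY_GUIDANCE = {
    "auth_error": "no",
    "validation_error": "no",
    "rate_limit_error": "after_retry_after",
    "inference_error": "backoff",
    "region_unavailable": "conditional",
    "policy_error": "no",
    "methodology_error": "yes",
    "internal_error": "backoff",
}


def retry_guidance(error_body: dict) -> str:
    """Return the retry hint for a parsed error response body."""
    error_type = error_body.get("error", {}).get("type")
    # Assumption: a type this client does not recognize is not retried.
    return RETRY_GUIDANCE.get(error_type, "no")
```

Keying the decision on error.type (and, where needed, error.code) rather than on error.message follows from the note that codes are stable while messages may change.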
Specific error codes
auth_error
Codes:
- invalid_api_key: key not found or malformed
- revoked_api_key: key was revoked; check the audit log
- expired_api_key: key is past its expiry (an Audit Plus / Enterprise feature; keys on the default Audit tier do not auto-expire)
- insufficient_scope: key is valid but lacks the scope for this endpoint
validation_error
The error.param field tells you which parameter is invalid:
{"error": {"type": "validation_error", "code": "invalid_model", "param": "model", "message": "..."}}
Codes:
- invalid_model: model is not supported on this region/account
- invalid_region: region code does not exist
- invalid_tier: requested tier is not supported on this route
- prompt_too_long: prompt exceeds the model's context window
- invalid_temperature / invalid_top_p / etc.: standard sampler validations
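A short sketch of reading error.code and error.param from a validation_error body, using the example payload shown above. The function name is hypothetical, and the code assumes the body is valid JSON with the documented error object.

```python
import json


def describe_validation_error(raw_body: str):
    """Return (code, param) for a validation_error body, otherwise None."""
    err = json.loads(raw_body).get("error", {})
    if err.get("type") != "validation_error":
        return None
    # error.param is only sent "where applicable", so it may be None.
    return err.get("code"), err.get("param")


sample = (
    '{"error": {"type": "validation_error", "code": "invalid_model", '
    '"param": "model", "message": "..."}}'
)
print(describe_validation_error(sample))  # ('invalid_model', 'model')
```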
rate_limit_error
Codes:
- requests_per_second_exceeded: wait for the interval given in the Retry-After header
- daily_budget_exceeded: the limit applies until 00:00 UTC the next day
- concurrent_streams_exceeded: close idle streams or upgrade
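The sketch below shows one way a client could turn these codes into a wait time. It assumes the daily budget resets at the next 00:00 UTC, that `now` is a timezone-aware datetime, and that a missing Retry-After falls back to one second; the function names and the fallback value are illustrative, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_next_utc_midnight(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def rate_limit_wait(
    code: str, retry_after: Optional[float], now: datetime
) -> Optional[float]:
    """Return how long to wait, or None when waiting alone will not help."""
    if code == "requests_per_second_exceeded":
        # Assumption: fall back to 1 second if Retry-After was not parsed.
        return retry_after if retry_after is not None else 1.0
    if code == "daily_budget_exceeded":
        return seconds_until_next_utc_midnight(now)
    # concurrent_streams_exceeded: close idle streams instead of sleeping.
    return None
```

For a daily budget the wait can be many hours, so a caller would usually surface the condition rather than sleep on it.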
policy_error
Codes:
- dual_use_screen_block: request blocked by Article III dual-use screening; if you believe this is in error, contact [email protected]
- acceptable_use_violation: request matches a violation pattern in our published Acceptable Use Policy
- customer_suspended: your account is suspended pending review
region_unavailable
Codes:
- region_degraded: the pinned region is in incident mode; consult status.vettedinference.com
- region_saturated: the pinned region is at capacity; retry with backoff or accept default routing
- region_decommissioned: the requested region is no longer offered (rare; advance notice is given)
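Because retrying region_unavailable is conditional, a client might branch on the code along these lines. The action labels and the `accept_default_routing` flag are hypothetical. Treating region_degraded as "check the status page" rather than an automatic retry is an assumption, since the text above only says to consult the status page.

```python
# Illustrative next-step labels for each region_unavailable code.
REGION_ACTIONS = {
    "region_degraded": "check_status",
    "region_saturated": "retry_with_backoff",
    "region_decommissioned": "stop",
}


def region_action(code: str, accept_default_routing: bool = False) -> str:
    """Suggest a next step for a region_unavailable error code."""
    if code == "region_saturated" and accept_default_routing:
        # The caller has opted out of region pinning for this request.
        return "use_default_routing"
    # Assumption: unknown codes are not retried.
    return REGION_ACTIONS.get(code, "stop")
```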
methodology_error
If receipt generation (the methodology step) fails but inference succeeded, the response carries the completion plus a degraded receipt with tier: "degraded" and an explanation. This is the rare-but-real case where you got an answer but we could not produce a confident estimate. Treat the response as a normal completion, and flag the degraded receipt to your audit pipeline for re-calculation.
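A minimal sketch of that handling follows. The only detail taken from the text is tier: "degraded". The top-level `receipt` field name, the list-based audit queue, and the function name are assumptions made for illustration.

```python
def handle_completion(response: dict, audit_queue: list) -> dict:
    """Pass the completion through; queue degraded receipts for re-calculation."""
    # Assumption: the receipt sits under a top-level "receipt" key.
    receipt = response.get("receipt") or {}
    if receipt.get("tier") == "degraded":
        # Flag the receipt for the audit pipeline; the completion is still usable.
        audit_queue.append(receipt)
    return response
```

The completion is returned unchanged in both cases, so downstream code does not need a separate path for degraded receipts.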
Idempotency
For chat completions and embeddings, set the Idempotency-Key header to a stable client-generated UUID. Identical requests sent with the same key within 24 hours return the cached response (and the same receipt). This is the recommended pattern for retry logic that is subject to financial controls.
curl https://api.vettedinference.com/v1/chat/completions \
-H "Authorization: Bearer $VETTED_API_KEY" \
-H "Idempotency-Key: 5e8c3a2f-4b9d-4e7a-b3f2-a1d5e9c7b8e2" \
-H "Content-Type: application/json" \
-d '...'
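The same headers can be built in Python. The point of this sketch is that the key is generated once per logical request and then reused on every retry, so that retries can be served from the cache. The helper names and the "example-key" value are hypothetical.

```python
import uuid


def new_idempotency_key() -> str:
    """Generate a client-side UUID to use as the Idempotency-Key."""
    return str(uuid.uuid4())


def build_headers(api_key: str, idempotency_key: str) -> dict:
    """Headers matching the curl example above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key,
        "Content-Type": "application/json",
    }


# Generate the key once, outside the retry loop, and reuse it on each attempt.
key = new_idempotency_key()
first_attempt = build_headers("example-key", key)
retry_attempt = build_headers("example-key", key)
```

Generating a fresh key inside the retry loop would make each attempt look like a new request, which defeats the purpose of the header.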
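Recommended retry strategy
import random
import time

# RateLimitError, InferenceError, and InternalError are assumed to be the
# exception classes your client library raises for the matching error types.

def call_with_retry(fn, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            # Prefer the server-provided Retry-After; otherwise back off exponentially.
            wait = e.retry_after or (2 ** attempt)
            time.sleep(wait)
        except (InferenceError, InternalError):
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter.
            time.sleep(2 ** attempt + random.random())
        # auth_error, validation_error, policy_error: not caught, so they propagate without retry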