Error Handling & Retries

Production integrations fail. Networks drop, services restart, rate limits are hit, and payloads occasionally have bad data. This guide covers every error code you'll encounter, when to retry vs. when to give up, and the retry infrastructure that makes your integration resilient.


Error response structure

All API errors use the same envelope:

{
  "success": false,
  "error": {
    "code": "CDP_ETL.VALIDATION.REQUEST_INVALID",
    "message": "records[2].email: invalid email format",
    "details": {
      "path": "records[2].email",
      "value": "not-an-email"
    }
  }
}

The code field is the machine-readable identifier; branch on it in your code. message is a human-readable description, and details varies by error type. Log all three.

Every response also includes an x-correlation-id header. Log it. If you need to contact support, this ID lets them trace the request end-to-end.
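A minimal sketch of pulling the envelope fields and the correlation ID out of a failed response, using only the standard library. ApiError and parse_api_error are illustrative names, not part of an Experiture SDK; with requests you would call parse_api_error(resp.status_code, resp.headers, resp.text). The fallback for non-JSON bodies is a defensive assumption, since an error produced by a proxy or load balancer in front of the API may not use the envelope.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class ApiError:
    """The fields worth logging from a failed API response."""

    status_code: int
    code: Optional[str]
    message: Optional[str]
    details: dict = field(default_factory=dict)
    correlation_id: Optional[str] = None


def parse_api_error(status_code: int, headers: Mapping[str, str], body: str) -> ApiError:
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    correlation_id = lowered.get("x-correlation-id")

    try:
        payload: Any = json.loads(body)
    except ValueError:  # Not JSON, e.g. an HTML error page from a proxy
        payload = None

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        # No usable envelope: keep a snippet of the raw body as the message.
        return ApiError(status_code, None, body[:200] or None, correlation_id=correlation_id)

    return ApiError(
        status_code=status_code,
        code=error.get("code"),
        message=error.get("message"),
        details=error.get("details") or {},
        correlation_id=correlation_id,
    )
```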


Error code reference

4xx — Client errors

| HTTP | Code | Meaning | Retryable? |
|---|---|---|---|
| 400 | CDP_ETL.VALIDATION.REQUEST_INVALID | Malformed JSON, a missing required field, or a field value that failed a type/format check | No: fix the data |
| 401 | CDP_ETL.AUTH.UNAUTHORIZED | Token missing, expired, or revoked | No: fix the token |
| 401 | CDP_ETL.AUTH.TOKEN_EXPIRED | Token has expired | No: refresh the token |
| 401 | CDP_ETL.AUTH.TOKEN_INVALID | Token is malformed or unrecognized | No: check the token |
| 403 | CDP_ETL.AUTH.FORBIDDEN | Token is valid but lacks the required scope | No: add the scope |
| 404 | CDP_ETL.NOT_FOUND | Resource (list, import job, audience) doesn't exist | No: verify the resource ID |
| 409 | CDP_ETL.* (conflict) | Two concurrent writes to the same key conflicted | Yes: retry with backoff |
| 413 | CDP_ETL.VALIDATION.REQUEST_INVALID | Request body exceeds 10 MB | No: reduce the payload |
| 422 | CDP_ETL.VALIDATION.REQUEST_SCHEMA | Field name doesn't exist in the object schema | No: fix the field names |
| 429 | CDP_ETL.* (rate limited) | Rate limit for this token exceeded | Yes: respect Retry-After |

5xx — Server errors

| HTTP | Code | Meaning | Retryable? |
|---|---|---|---|
| 500 | CDP_ETL.INTERNAL.UNHANDLED_EXCEPTION | Unexpected server error | Yes |
| 502 | CDP_ETL.* | Upstream service unavailable | Yes |
| 503 | CDP_ETL.* | Planned maintenance or overload | Yes |
| 504 | CDP_ETL.* | Request took longer than 30 s | Yes |

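Do not retry 4xx errors other than 429 and 409. For the rest, the request itself is at fault (bad data, wrong credentials, a missing scope, or a nonexistent resource), so resending it unchanged will fail the same way. Log the error and alert.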


Retry strategy

Use exponential backoff with jitter. The jitter prevents thundering-herd retries when many clients fail simultaneously.

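import logging
import os
import random
import time

import requests

logger = logging.getLogger(__name__)

API_KEY = os.environ["EXPERITURE_API_KEY"]
BASE_URL = "https://api.experiture.ai/public/v1"
RETRYABLE_STATUS = {409, 429, 500, 502, 503, 504}

def with_retry(
    fn,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
):
    """Call fn(), which must return a requests.Response, and return the parsed
    JSON body. Transient failures are retried with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            resp = fn()
        except (requests.ConnectionError, requests.Timeout):
            # No response at all (dropped connection, timeout): treat as transient.
            # Re-sending a write is safe only because it carries an Idempotency-Key.
            if last_attempt:
                raise
            resp = None

        if resp is not None:
            if resp.status_code < 400:
                return resp.json()
            if resp.status_code not in RETRYABLE_STATUS or last_attempt:
                resp.raise_for_status()  # Non-retryable, or attempts exhausted

        # Respect Retry-After (sent with 429s) when it asks for a longer wait
        retry_after = float(resp.headers.get("Retry-After", 0)) if resp is not None else 0.0
        # Cap first, then add jitter, so retries at the cap still spread out
        backoff = min(max_delay, base_delay * (2 ** attempt)) + random.uniform(0, 1)
        sleep_for = max(retry_after, backoff)

        logger.warning(
            "API call failed (%s, attempt %d/%d), retrying in %.1fs",
            resp.status_code if resp is not None else "network error",
            attempt + 1, max_attempts, sleep_for,
        )
        time.sleep(sleep_for)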

Usage:

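# record (the profile dict to write) and idempotency_key (derived from the
# source event; see the idempotency section below) must be defined first.
def do_upsert():
    return requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        json={"record": record, "matchKey": "email"},
        timeout=35,  # Client-side cap (assumed value); requests has no default timeout
    )

result = with_retry(do_upsert)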
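The jitter in with_retry adds at most one second to the exponential delay, so clients retrying the same attempt are spread across only about a one-second window. A common alternative, usually called "full jitter", draws the whole delay uniformly between zero and the capped exponential ceiling. A sketch of that variant; full_jitter_delay is a made-up name, and Retry-After still acts as a lower bound:

```python
import random
from typing import Optional


def full_jitter_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_after: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    Full jitter: pick uniformly in [0, capped exponential ceiling], so clients
    that failed together land at unrelated points in the window. A
    server-supplied Retry-After is treated as the minimum wait.
    """
    rng = rng or random.Random()
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return max(retry_after, rng.uniform(0, ceiling))
```

Passing a seeded random.Random makes the schedule reproducible in tests; in production, leave rng unset.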

Idempotency: the foundation of safe retries

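Without idempotency keys, a retried append can create a duplicate row: if the first attempt succeeded but its response was lost (a timeout or dropped connection), the retry writes the same data again. With idempotency keys, re-sending the same request returns the cached original response and triggers no further side effects.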
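To make that guarantee concrete, here is a toy in-memory model of the behaviour described above: the handler, and its side effect, runs once per key and body, and replays get the stored result. It is an illustration only, not how Experiture implements idempotency. IdempotencyCache is a made-up name, and the model ignores the 24-hour expiry, concurrency, and whatever the API does when a key is reused with a different body.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class IdempotencyCache:
    """Toy model: the first request for a (key, body) pair runs the handler;
    any replay of the same pair gets the stored result back unchanged."""

    def __init__(self) -> None:
        self._responses: Dict[Tuple[str, Hashable], Any] = {}

    def handle(self, key: str, body: Hashable, handler: Callable[[], Any]) -> Any:
        cache_key = (key, body)
        if cache_key not in self._responses:
            # The side effect (e.g. appending a row) happens only here.
            self._responses[cache_key] = handler()
        return self._responses[cache_key]
```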

Always pass an Idempotency-Key on writes. Generate a key tied to the logical event, not the HTTP attempt:

# CORRECT — same key on every retry of this event
key = event["event_id"]  # from Stripe, Shopify, Segment, etc.
 
# WRONG — generates a new key on each attempt, defeating idempotency
key = str(uuid.uuid4())  # called inside the retry loop

For events where you don't have a natural upstream ID, derive a stable key from the content:

import hashlib, json, uuid
 
def stable_key(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=True)
    return str(uuid.UUID(hashlib.md5(canonical.encode()).hexdigest()))

The same Idempotency-Key + body combination returns the cached response for 24 hours.
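If you would rather have keys that are well-formed RFC 4122 UUIDs and that avoid MD5 (which some FIPS-restricted Python builds may refuse to use), the standard library's uuid.uuid5 derives a deterministic UUID from a namespace and a name. A sketch; KEY_NAMESPACE, the source prefix, and stable_key_v5 are illustrative assumptions. Its keys differ from stable_key's, so pick one scheme and keep it.

```python
import json
import uuid

# Illustrative namespace: generate one once with uuid.uuid4() and hard-code it,
# so the same record maps to the same key across processes and deploys.
KEY_NAMESPACE = uuid.UUID("6f1c2a3e-9b4d-4c7e-8a21-3d5f0e9b7c10")


def stable_key_v5(record: dict, source: str = "profiles") -> str:
    """Deterministic version-5 UUID derived from the record's content."""
    # sort_keys makes the key independent of dict insertion order
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=True, separators=(",", ":"))
    # The source prefix keeps identical payloads from different feeds apart
    return str(uuid.uuid5(KEY_NAMESPACE, f"{source}:{canonical}"))
```

Either way, content-derived keys treat two genuinely separate events with identical content as the same request, so within the 24-hour window the second one gets the first one's cached response instead of being written. If that distinction matters, include a timestamp or sequence number from the source in the hashed content.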


Rate limit handling

Rate limit responses include a Retry-After header telling you exactly how long to wait:

HTTP/1.1 429 Too Many Requests
Retry-After: 2.5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745255432

Use Retry-After as the minimum wait, and fall back to your own backoff when the header is missing or asks for a shorter wait:

def handle_rate_limit(response: requests.Response, attempt: int) -> None:
    retry_after = float(response.headers.get("Retry-After", 0))
    backoff = min(60, (2 ** attempt) + random.random())
    sleep_for = max(retry_after, backoff)
    time.sleep(sleep_for)
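The API sends Retry-After as a number of seconds, so float() is enough for its own responses. HTTP also allows the header to carry an HTTP-date (RFC 9110), and float() raises on that form. If anything between you and the API could add or rewrite the header, a parser that accepts both forms and never raises is cheap insurance. A sketch; parse_retry_after is an illustrative name:

```python
import time
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[float] = None) -> float:
    """Seconds to wait according to a Retry-After header value.

    Accepts delay-seconds ("2.5") and HTTP-dates
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns 0.0 for a missing or
    unparseable value so a bad header never crashes the retry loop.
    """
    if not value:
        return 0.0
    value = value.strip()
    try:
        return max(0.0, float(value))
    except ValueError:
        pass  # Not a number; try the HTTP-date form next
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # Older Pythons raise TypeError here
        return 0.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always UTC
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)
```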

Also monitor X-RateLimit-Remaining on every response — don't wait for a 429 to discover you're near the limit:

remaining = int(response.headers.get("X-RateLimit-Remaining", 9999))
if remaining < 10:
    logger.warning("Rate limit nearly exhausted: %d remaining", remaining)
    time.sleep(0.5)  # Voluntary backpressure
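A fixed 0.5 s pause is a blunt instrument. If X-RateLimit-Reset is a Unix timestamp in seconds (the example value above looks like one, but that is an assumption this guide does not state), you can instead spread the remaining quota evenly over the rest of the window. A sketch; pacing_delay and its threshold parameter are made-up names:

```python
import time
from typing import Mapping, Optional


def pacing_delay(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    threshold: int = 10,
) -> float:
    """Seconds to pause so the remaining quota lasts until the window resets.

    ASSUMPTION: X-RateLimit-Reset is a Unix timestamp in seconds.
    Returns 0.0 when quota is comfortable or the headers are absent.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        reset_at = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0  # Headers missing or malformed: don't throttle
    if remaining >= threshold:
        return 0.0
    now = time.time() if now is None else now
    seconds_left = max(0.0, reset_at - now)
    # Spread the remaining requests evenly over what is left of the window;
    # with nothing remaining, this waits for the full reset.
    return seconds_left / max(remaining, 1)
```

Call time.sleep(pacing_delay(resp.headers)) after each response. With plenty of quota left it returns 0 and adds no latency.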

Handling CDP_ETL.VALIDATION.REQUEST_SCHEMA in production

422 CDP_ETL.VALIDATION.REQUEST_SCHEMA means you sent a field that doesn't exist in the object schema. This is a configuration error, not a data error. Handle it differently from other failures:

resp = requests.post(
    f"{BASE_URL}/records/profiles/upsert",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"record": record, "matchKey": "email"},
)
if resp.status_code == 422:
    error = resp.json().get("error", {})
    if error.get("code") == "CDP_ETL.VALIDATION.REQUEST_SCHEMA":
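        # Configuration error: don't retry. Alert the team and park the record in the DLQ.
        # (alerting and dlq stand in for your own alerting client and dead-letter queue.)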
        alerting.fire(
            title="CDP schema mismatch",
            message=f"Field not found: {error.get('message')}",
            severity="warning",
        )
        dlq.publish({"record": record, "error": error.get("message"), "type": "CDP_ETL.VALIDATION.REQUEST_SCHEMA"})
    else:
        resp.raise_for_status()
elif not resp.ok:
    resp.raise_for_status()

The schema mismatch DLQ lets you replay records after you add the missing field to the schema, without losing data.
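Once the missing field exists in the schema, replay is a loop over the parked messages. A minimal sketch, assuming you can read the DLQ back as an iterable of the dicts published above ({"record": ..., "error": ..., "type": ...}); how you consume and acknowledge messages depends on your queue. replay_schema_failures is a hypothetical name, and upsert is whatever write function you already use:

```python
from typing import Callable, Iterable, List

SCHEMA_ERROR = "CDP_ETL.VALIDATION.REQUEST_SCHEMA"


def replay_schema_failures(
    messages: Iterable[dict],
    upsert: Callable[[dict], object],
) -> List[dict]:
    """Re-send records parked after a schema mismatch.

    `upsert` must raise on failure. Returns every message that was not
    replayed successfully (other failure types, untouched, plus schema
    failures that still fail) so you can put them back on the queue.
    """
    leftover: List[dict] = []
    for msg in messages:
        if msg.get("type") != SCHEMA_ERROR:
            leftover.append(msg)  # Needs a different fix; leave it alone
            continue
        try:
            upsert(msg["record"])
        except Exception:  # Broad on purpose: any failure means "keep it parked"
            leftover.append(msg)
    return leftover
```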


Dead-letter queue pattern

Not every failure should crash your service. Route non-retryable errors to a DLQ for later investigation and replay:

from datetime import datetime, timezone
from enum import Enum

import requests

class FailureType(Enum):
    TRANSIENT = "transient"      # Retry
    SCHEMA    = "schema"         # Fix schema, then replay
    BAD_DATA  = "bad_data"       # Fix source data
    AUTH      = "auth"           # Fix credentials

def classify_error(status_code: int, error_code: str) -> FailureType:
    if status_code in RETRYABLE_STATUS:
        return FailureType.TRANSIENT
    if error_code in ("CDP_ETL.VALIDATION.REQUEST_SCHEMA",):
        return FailureType.SCHEMA
    if error_code in ("CDP_ETL.VALIDATION.REQUEST_INVALID",):
        return FailureType.BAD_DATA
    if status_code in (401, 403):
        return FailureType.AUTH
    return FailureType.BAD_DATA

def write_with_dlq(record: dict, idempotency_key: str):
    def do_upsert():
        return requests.post(
            f"{BASE_URL}/records/profiles/upsert",
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json",
                "Idempotency-Key": idempotency_key,
            },
            json={"record": record, "matchKey": "email"},
        )

    try:
        # with_retry returns the parsed body on success and raises
        # requests.HTTPError once it gives up (non-retryable or exhausted).
        return with_retry(do_upsert)
    except requests.HTTPError as exc:
        resp = exc.response
        try:
            error = resp.json().get("error", {})
        except ValueError:  # Body wasn't JSON (e.g. a proxy error page)
            error = {}
        failure_type = classify_error(resp.status_code, error.get("code", ""))
        if failure_type == FailureType.TRANSIENT:
            raise  # Retries exhausted: let the caller (or your queue) redeliver later
        dlq.publish({
            "record": record,
            "idempotency_key": idempotency_key,
            "status_code": resp.status_code,
            "error_code": error.get("code"),
            "error_message": error.get("message"),
            "failure_type": failure_type.value,
            "correlation_id": resp.headers.get("x-correlation-id"),
            "failed_at": datetime.now(timezone.utc).isoformat(),
        })
        return None
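The examples in this guide only assume dlq is an object with a publish(dict) method. For local development and tests, an append-only JSON-lines file is enough. A sketch; JsonlDeadLetterQueue is a made-up name, and in production you would back publish() with a durable queue or topic instead:

```python
import json
import threading
from pathlib import Path
from typing import Iterator


class JsonlDeadLetterQueue:
    """Append-only JSON-lines file exposing the publish() method used above."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()  # Serialise writes from multiple threads

    def publish(self, message: dict) -> None:
        # default=str turns datetimes and other non-JSON values into strings
        line = json.dumps(message, default=str)
        with self._lock, self._path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")

    def read_all(self) -> Iterator[dict]:
        """Yield parked messages in the order they were published."""
        if not self._path.exists():
            return
        with self._path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```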

Webhook integration: returning the right HTTP status

When your webhook handler catches CDP errors, what you return to the webhook provider controls whether it retries:

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()

@app.post("/webhooks/stripe")
async def handle_stripe(request: Request, stripe_signature: str = Header(None)):
    # ... verify signature, parse `event`, build `record` ...
    # requests blocks the event loop; under load, prefer an async client (e.g. httpx.AsyncClient).
    resp = requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Idempotency-Key": event["id"],  # Same ID on every Stripe redelivery
        },
        json={"record": record, "matchKey": "email"},
    )
    error = resp.json().get("error", {}) if not resp.ok else {}
    failure_type = classify_error(resp.status_code, error.get("code", "")) if not resp.ok else None

    if failure_type == FailureType.TRANSIENT:
        # Any non-2xx makes Stripe redeliver the event later
        raise HTTPException(status_code=503, detail="upstream_unavailable")

    if failure_type == FailureType.SCHEMA:
        # Configuration problem: a retry won't help, so log loudly and park the event
        logger.error("Schema mismatch on event %s: %s", event["id"], error.get("message"))
        dlq.publish({"event": event, "error_code": error.get("code"), "error": error.get("message")})
        return {"ok": True}  # 200 stops Stripe from retrying

    if failure_type == FailureType.BAD_DATA:
        # Bad data won't improve on retry either; return 200 but keep a copy to fix and replay
        logger.warning("Validation error on event %s: %s", event["id"], error.get("message"))
        dlq.publish({"event": event, "error_code": error.get("code"), "error": error.get("message")})
        return {"ok": True}

    if failure_type == FailureType.AUTH:
        # Your credentials are the problem, not Stripe's: alert, park the event, return 200
        alerting.fire(title="CDP auth error", message=error.get("message"), severity="critical")
        dlq.publish({"event": event, "error_code": error.get("code"), "error": error.get("message")})
        return {"ok": True}

    return {"ok": True}

Logging and observability

Always log enough context to debug a failure without retrying the original request:

import structlog
 
log = structlog.get_logger()
 
def write_profile(record: dict, event_id: str):
    resp = requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Idempotency-Key": event_id,
        },
        json={"record": record, "matchKey": "email"},
    )
    if resp.ok:
        result = resp.json()
        log.info("profile.written",
            email=record.get("email"),
            event_id=event_id,
            operation=result.get("data", {}).get("operation"),
        )
        return result
    else:
        error = resp.json().get("error", {})
        log.error("profile.write_failed",
            email=record.get("email"),
            event_id=event_id,
            status_code=resp.status_code,
            error_code=error.get("code"),
            error_message=error.get("message"),
            correlation_id=resp.headers.get("x-correlation-id"),  # Include this in support tickets
        )
        resp.raise_for_status()

Quick reference: what to do with each error

| Error | Action |
|---|---|
| 429 (rate limited) | Sleep for Retry-After, then retry with the same idempotency key |
| 409 (conflict) | Retry with exponential backoff; the idempotency key ensures the correct result |
| 5xx | Retry with exponential backoff |
| 422 CDP_ETL.VALIDATION.REQUEST_SCHEMA | Log, alert, and send to the DLQ; fix the schema and replay |
| 400 CDP_ETL.VALIDATION.REQUEST_INVALID | Log and send to the DLQ; fix the source data and replay |
| 401 CDP_ETL.AUTH.UNAUTHORIZED / CDP_ETL.AUTH.TOKEN_EXPIRED / CDP_ETL.AUTH.TOKEN_INVALID | Alert immediately; rotate or fix the credentials |
| 403 CDP_ETL.AUTH.FORBIDDEN | Add the required scope to the token; do not retry |
| 413 | Reduce the batch size; re-send |
| 404 CDP_ETL.NOT_FOUND | Verify the resource ID; do not retry |

See Also