Error Handling & Retries
Production integrations fail. Networks drop, services restart, rate limits are hit, and payloads occasionally have bad data. This guide covers every error code you'll encounter, when to retry vs. when to give up, and the retry infrastructure that makes your integration resilient.
Error response structure
All API errors use the same envelope:
```json
{
  "success": false,
  "error": {
    "code": "CDP_ETL.VALIDATION.REQUEST_INVALID",
    "message": "records[2].email: invalid email format",
    "details": {
      "path": "records[2].email",
      "value": "not-an-email"
    }
  }
}
```

The code field is the machine-readable identifier. message is human-readable. details varies by error type — always include both in your logs.
Every response also includes an x-correlation-id header. Log it. If you need to contact support, this ID lets them trace the request end-to-end.
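Taken together, a small helper can pull all three error fields plus the correlation ID into one log-ready dict. This is just a sketch (parse_error and its return keys are not part of any client library); it takes the parsed JSON body and the response headers:

```python
def parse_error(body: dict, headers: dict) -> dict:
    """Extract the fields worth logging from an error response.

    body is the parsed JSON envelope; headers are the response headers.
    """
    err = body.get("error", {})
    return {
        "code": err.get("code"),                              # machine-readable
        "message": err.get("message"),                        # human-readable
        "details": err.get("details", {}),                    # varies by error type
        "correlation_id": headers.get("x-correlation-id"),    # for support tickets
    }
```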
Error code reference
4xx — Client errors
| HTTP | Code | Meaning | Retryable? |
|---|---|---|---|
| 400 | CDP_ETL.VALIDATION.REQUEST_INVALID | Malformed JSON, missing required field, or field value failed type/format check | No — fix the data |
| 401 | CDP_ETL.AUTH.UNAUTHORIZED | Token missing, expired, or revoked | No — fix the token |
| 401 | CDP_ETL.AUTH.TOKEN_EXPIRED | Token has expired | No — refresh the token |
| 401 | CDP_ETL.AUTH.TOKEN_INVALID | Token is malformed or unrecognized | No — check the token |
| 403 | CDP_ETL.AUTH.FORBIDDEN | Token valid but lacks the required scope | No — add the scope |
| 404 | CDP_ETL.NOT_FOUND | Resource (list, import job, audience) doesn't exist | No |
| 409 | CDP_ETL.* (conflict) | Concurrent write to the same key raced | Yes — retry with backoff |
| 413 | CDP_ETL.VALIDATION.REQUEST_INVALID | Request body > 10 MB | No — reduce payload |
| 422 | CDP_ETL.VALIDATION.REQUEST_SCHEMA | Field name doesn't exist in the object schema | No — fix field names |
| 429 | CDP_ETL.* (rate limited) | Token quota exceeded | Yes — respect Retry-After |
5xx — Server errors
| HTTP | Code | Meaning | Retryable? |
|---|---|---|---|
| 500 | CDP_ETL.INTERNAL.UNHANDLED_EXCEPTION | Unexpected server error | Yes |
| 502 | CDP_ETL.* | Upstream service unavailable | Yes |
| 503 | CDP_ETL.* | Planned maintenance or overload | Yes |
| 504 | CDP_ETL.* | Request took > 30 s | Yes |
Do not retry 4xx errors except 429 and 409. The data is bad or the credentials are wrong — retrying won't fix it. Log the error and alert.
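The retryability column of the two tables collapses to a single predicate. A minimal sketch (the same RETRYABLE_STATUS set is reused by the retry helper in the next section):

```python
# Transient statuses from the tables above: conflict, rate limit, and all 5xx.
RETRYABLE_STATUS = {409, 429, 500, 502, 503, 504}

def is_retryable(status_code: int) -> bool:
    """True only for errors where a retry can plausibly succeed."""
    return status_code in RETRYABLE_STATUS
```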
Retry strategy
Use exponential backoff with jitter. The jitter prevents thundering-herd retries when many clients fail simultaneously.
```python
import logging
import os
import random
import time

import requests

logger = logging.getLogger(__name__)

API_KEY = os.environ["EXPERITURE_API_KEY"]
BASE_URL = "https://api.experiture.ai/public/v1"
RETRYABLE_STATUS = {409, 429, 500, 502, 503, 504}

def with_retry(
    fn,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
):
    for attempt in range(max_attempts):
        resp = fn()
        if resp.status_code < 400:
            return resp.json()
        if resp.status_code not in RETRYABLE_STATUS:
            resp.raise_for_status()  # Non-retryable — propagate immediately
        if attempt == max_attempts - 1:
            resp.raise_for_status()  # Exhausted attempts
        # Respect Retry-After for rate limits
        retry_after = float(resp.headers.get("Retry-After", 0))
        backoff = min(max_delay, base_delay * (2 ** attempt) + random.uniform(0, 1))
        sleep_for = max(retry_after, backoff)
        logger.warning(
            "API error %s (attempt %d/%d), retrying in %.1fs",
            resp.status_code, attempt + 1, max_attempts, sleep_for,
        )
        time.sleep(sleep_for)
```

Usage:
```python
def do_upsert():
    # record and idempotency_key come from your pipeline; see the next section
    return requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        json={"record": record, "matchKey": "email"},
    )

result = with_retry(do_upsert)
```

Idempotency: the foundation of safe retries
Without idempotency keys, a retry on an append operation creates a duplicate row. With them, re-sending the same request returns the cached original response — no side effects.
Always pass an Idempotency-Key on writes. Generate a key tied to the logical event, not the HTTP attempt:
```python
# CORRECT — same key on every retry of this event
key = event["event_id"]  # from Stripe, Shopify, Segment, etc.

# WRONG — generates a new key on each attempt, defeating idempotency
key = str(uuid.uuid4())  # called inside the retry loop
```

For events where you don't have a natural upstream ID, derive a stable key from the content:
```python
import hashlib, json, uuid

def stable_key(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=True)
    return str(uuid.UUID(hashlib.md5(canonical.encode()).hexdigest()))
```

The same Idempotency-Key + body combination returns the cached response for 24 hours.
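One property worth sanity-checking: because json.dumps sorts keys before hashing, the derived key must not depend on dict ordering. A quick self-contained check (repeating the helper so it runs on its own):

```python
import hashlib, json, uuid

def stable_key(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=True)
    return str(uuid.UUID(hashlib.md5(canonical.encode()).hexdigest()))

# Key order in the source dict must not change the derived key
a = stable_key({"email": "a@b.com", "plan": "pro"})
b = stable_key({"plan": "pro", "email": "a@b.com"})
assert a == b
```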
Rate limit handling
Rate limit responses include a Retry-After header telling you exactly how long to wait:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 2.5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745255432
```

Always check Retry-After before calculating your own backoff:
```python
def handle_rate_limit(response: requests.Response, attempt: int) -> None:
    retry_after = float(response.headers.get("Retry-After", 0))
    backoff = min(60, (2 ** attempt) + random.random())
    sleep_for = max(retry_after, backoff)
    time.sleep(sleep_for)
```

Also monitor X-RateLimit-Remaining on every response — don't wait for a 429 to discover you're near the limit:
```python
remaining = int(response.headers.get("X-RateLimit-Remaining", 9999))
if remaining < 10:
    logger.warning("Rate limit nearly exhausted: %d remaining", remaining)
    time.sleep(0.5)  # Voluntary backpressure
```

Handling CDP_ETL.VALIDATION.REQUEST_SCHEMA in production
422 CDP_ETL.VALIDATION.REQUEST_SCHEMA means you sent a field that doesn't exist in the object schema. This is a configuration error, not a data error. Handle it differently from other failures:
```python
resp = requests.post(
    f"{BASE_URL}/records/profiles/upsert",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"record": record, "matchKey": "email"},
)
if resp.status_code == 422:
    error = resp.json().get("error", {})
    if error.get("code") == "CDP_ETL.VALIDATION.REQUEST_SCHEMA":
        # Don't retry — alert the team and drop to DLQ
        # (alerting and dlq are your application's own modules)
        alerting.fire(
            title="CDP schema mismatch",
            message=f"Field not found: {error.get('message')}",
            severity="warning",
        )
        dlq.publish({
            "record": record,
            "error": error.get("message"),
            "type": "CDP_ETL.VALIDATION.REQUEST_SCHEMA",
        })
    else:
        resp.raise_for_status()
elif not resp.ok:
    resp.raise_for_status()
```

The schema mismatch DLQ lets you replay records after you add the missing field to the schema, without losing data.
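A replay job for that DLQ can be sketched as below. The dlq.consume / dlq.ack interface and the injected send_to_cdp callable are hypothetical stand-ins for whatever queue and client you actually use:

```python
def replay_schema_failures(dlq, send_to_cdp):
    """Re-send DLQ'd records once the missing field has been added to the schema.

    dlq and send_to_cdp are injected so this works with any queue/client.
    Returns (replayed, still_failing) counts.
    """
    replayed, still_failing = 0, 0
    for msg in dlq.consume():
        resp = send_to_cdp(msg["record"])
        if resp.ok:
            dlq.ack(msg)        # remove from the queue on success
            replayed += 1
        else:
            still_failing += 1  # leave on the queue for the next pass
    return replayed, still_failing
```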
Dead-letter queue pattern
Not every failure should crash your service. Route non-retryable errors to a DLQ for later investigation and replay:
```python
from datetime import datetime, timezone
from enum import Enum

class FailureType(Enum):
    TRANSIENT = "transient"  # Retry
    SCHEMA = "schema"        # Fix schema, then replay
    BAD_DATA = "bad_data"    # Fix source data
    AUTH = "auth"            # Fix credentials

def classify_error(status_code: int, error_code: str) -> FailureType:
    if status_code in RETRYABLE_STATUS:
        return FailureType.TRANSIENT
    if error_code in ("CDP_ETL.VALIDATION.REQUEST_SCHEMA",):
        return FailureType.SCHEMA
    if error_code in ("CDP_ETL.VALIDATION.REQUEST_INVALID",):
        return FailureType.BAD_DATA
    if status_code in (401, 403):
        return FailureType.AUTH
    return FailureType.BAD_DATA

def write_with_dlq(record: dict, idempotency_key: str):
    def do_upsert():
        return requests.post(
            f"{BASE_URL}/records/profiles/upsert",
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json",
                "Idempotency-Key": idempotency_key,
            },
            json={"record": record, "matchKey": "email"},
        )

    try:
        return with_retry(do_upsert)
    except requests.HTTPError as exc:
        # with_retry raises on non-retryable errors and on exhausted retries;
        # raise_for_status() attaches the response to the exception
        resp = exc.response
        error = resp.json().get("error", {})
        failure_type = classify_error(resp.status_code, error.get("code", ""))
        if failure_type == FailureType.TRANSIENT:
            raise  # Retries exhausted; let the caller decide
        dlq.publish({
            "record": record,
            "idempotency_key": idempotency_key,
            "error_code": error.get("code"),
            "error_message": error.get("message"),
            "failure_type": failure_type.value,
            "failed_at": datetime.now(timezone.utc).isoformat(),
        })
```

Webhook integration: returning the right HTTP status
When your webhook handler catches CDP errors, what you return to the webhook provider controls whether it retries:
```python
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()

@app.post("/webhooks/stripe")
async def handle_stripe(request: Request, stripe_signature: str = Header(None)):
    # ... verify signature, parse event, build record ...
    resp = requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"record": record, "matchKey": "email"},
    )
    error = resp.json().get("error", {}) if not resp.ok else {}
    failure_type = classify_error(resp.status_code, error.get("code", "")) if not resp.ok else None
    if failure_type == FailureType.TRANSIENT:
        # Tell Stripe to retry
        raise HTTPException(status_code=503, detail="upstream_unavailable")
    if failure_type == FailureType.SCHEMA:
        # Bad data — don't retry, but log loudly
        logger.error("Schema mismatch on event %s: %s", event["id"], error.get("message"))
        dlq.publish({"event": event, "error": error.get("message")})
        return {"ok": True}  # Return 200 to avoid infinite retry
    if failure_type == FailureType.BAD_DATA:
        # Also return 200 — bad data won't improve on retry
        logger.warning("Validation error on event %s: %s", event["id"], error.get("message"))
        return {"ok": True}
    if failure_type == FailureType.AUTH:
        # This is your problem, not Stripe's — alert, return 200
        alerting.fire("CDP auth error", severity="critical")
        return {"ok": True}
    return {"ok": True}
```

Logging and observability
Always log enough context to debug a failure without retrying the original request:
```python
import structlog

log = structlog.get_logger()

def write_profile(record: dict, event_id: str):
    resp = requests.post(
        f"{BASE_URL}/records/profiles/upsert",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Idempotency-Key": event_id,
        },
        json={"record": record, "matchKey": "email"},
    )
    if resp.ok:
        result = resp.json()
        log.info("profile.written",
            email=record.get("email"),
            event_id=event_id,
            operation=result.get("data", {}).get("operation"),
        )
        return result
    else:
        error = resp.json().get("error", {})
        log.error("profile.write_failed",
            email=record.get("email"),
            event_id=event_id,
            status_code=resp.status_code,
            error_code=error.get("code"),
            error_message=error.get("message"),
            correlation_id=resp.headers.get("x-correlation-id"),  # Include this in support tickets
        )
        resp.raise_for_status()
```

Quick reference: what to do with each error
| Error | Action |
|---|---|
| 429 (rate limited) | Sleep for Retry-After, then retry with same idempotency key |
| 409 (conflict) | Retry with exponential backoff; idempotency ensures correct result |
| 5xx | Retry with exponential backoff |
| 422 CDP_ETL.VALIDATION.REQUEST_SCHEMA | Log + alert + DLQ; fix schema and replay |
| 400 CDP_ETL.VALIDATION.REQUEST_INVALID | Log + DLQ; fix source data and replay |
| 401 CDP_ETL.AUTH.UNAUTHORIZED / CDP_ETL.AUTH.TOKEN_EXPIRED | Alert immediately; rotate/fix credentials |
| 403 CDP_ETL.AUTH.FORBIDDEN | Add required scope to token; do not retry |
| 413 | Reduce batch size; re-send |
| 404 CDP_ETL.NOT_FOUND | Verify resource ID; do not retry |
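If you want one place in code to consult instead of the table, it collapses to a small dispatcher. The action names here are illustrative labels, not part of the API:

```python
# Hypothetical action labels encoding the quick-reference table above
ACTIONS = {
    429: "retry_after_header",
    409: "retry_backoff",
    413: "reduce_batch",
    404: "verify_resource",
    403: "fix_scope",
    401: "fix_credentials",
}

def action_for(status_code: int, error_code: str = "") -> str:
    if status_code >= 500:
        return "retry_backoff"
    if error_code == "CDP_ETL.VALIDATION.REQUEST_SCHEMA":
        return "dlq_and_fix_schema"
    if error_code == "CDP_ETL.VALIDATION.REQUEST_INVALID":
        return "dlq_and_fix_data"
    return ACTIONS.get(status_code, "dlq_and_fix_data")
```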
See Also
- Real-time Record Writes — idempotency and retry in the single-write context
- Batch Record Writes — per-record error handling in batch responses
- Webhook Handler Integration — HTTP status codes for webhook providers
- Rate Limits API reference