POST /api/v2/moderation/check
Runs the content-moderation engine against the supplied text without blocking. Returns structured findings (category, severity, suggestion) so an agent can validate text before submitting it to campaign creation or another blocking surface. Mirrors the engine that powers brief screening on `POST /campaigns`.
Request
Parameters
| Field | Type | Required | Notes |
|---|---|---|---|
| `text` | string | Yes | Text to evaluate (1–10,000 chars). Exceeding 10,000 chars returns a 400 `VALIDATION_ERROR`. |
| `direction` | enum | No | `input` runs brief/jailbreak/policy patterns; `output` runs LLM-output (refusal/identity/PII) patterns. Default: `input`. |
| `surface` | string | No | Surface label for metric attribution (1–100 chars). Default: `moderation.check`. |
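A minimal sketch of building a request body that respects the limits in the table above. The helper name `build_check_body` is hypothetical, a JSON body is assumed, and the client-side checks only mirror the documented constraints; the server may count characters differently than Python's `len`.

```python
from typing import Dict

MAX_TEXT_CHARS = 10_000    # documented upper bound for `text`
MAX_SURFACE_CHARS = 100    # documented upper bound for `surface`
DIRECTIONS = ("input", "output")


def build_check_body(text: str, direction: str = "input",
                     surface: str = "moderation.check") -> Dict[str, str]:
    """Return a request body, raising ValueError for values the endpoint
    is documented to reject with 400 VALIDATION_ERROR."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be 1-10,000 characters")
    if direction not in DIRECTIONS:
        raise ValueError("direction must be 'input' or 'output'")
    if not 1 <= len(surface) <= MAX_SURFACE_CHARS:
        raise ValueError("surface must be 1-100 characters")
    return {"text": text, "direction": direction, "surface": surface}
```

Validating locally avoids a round trip for requests that would fail anyway, though the server remains the authority on what is accepted.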
Response
| Field | Type | Notes |
|---|---|---|
| `passed` | boolean | True when no moderation patterns matched. |
| `wouldBlock` | boolean | True when at least one finding would trigger a 422 on a blocking surface. |
| `findings[]` | array | All matching findings; empty when `passed` is true. Both blocking and non-blocking findings are reported. |
| `findings[].category` | string | Moderation category (e.g. `jailbreak_attempt`, `pii_leak`). |
| `findings[].severity` | enum | `low`, `medium`, `high`, `critical`. `high` and `critical` block at default thresholds. |
| `findings[].suggestion` | string | Remediation guidance for the agent to self-correct. |
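As a rough illustration of how an agent might read the response, the sketch below picks out findings at or above a severity floor. The function name `blocking_findings` and the `sample` payload are invented for this example; only the field names and severity levels come from the table above. `wouldBlock` remains the authoritative signal, since thresholds on a given surface may differ from the default.

```python
from typing import Any, Dict, List

# Severity levels in ascending order, as listed for findings[].severity.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def blocking_findings(response: Dict[str, Any],
                      threshold: str = "high") -> List[Dict[str, Any]]:
    """Return findings whose severity is at or above `threshold`."""
    floor = SEVERITY_ORDER.index(threshold)
    return [f for f in response.get("findings", [])
            if SEVERITY_ORDER.index(f["severity"]) >= floor]


# Made-up response for illustration only; not real engine output.
sample = {
    "passed": False,
    "wouldBlock": True,
    "findings": [
        {"category": "pii_leak", "severity": "medium",
         "suggestion": "Remove the email address."},
        {"category": "jailbreak_attempt", "severity": "critical",
         "suggestion": "Drop the override instruction."},
    ],
}

for finding in blocking_findings(sample):
    print(finding["category"], "->", finding["suggestion"])
```

An agent could apply each returned `suggestion`, re-run the check, and submit to the blocking surface only once `wouldBlock` is false.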
Errors
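| Status | Code | Cause |
|---|---|---|
| 400 | `VALIDATION_ERROR` | `text` is empty or longer than 10,000 chars, or `direction`/`surface` is out of range. |
| 401 | `UNAUTHORIZED` | Missing or invalid API key. |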
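The two documented failures could be handled along these lines. This is a sketch under several assumptions that this page does not confirm: the base URL is a placeholder, the `Authorization: Bearer` scheme is a guess, and `ModerationCheckError`, `hint_for_status`, and `post_check` are hypothetical names. The error body is not parsed because its shape is defined in the shared error contract, not here.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


class ModerationCheckError(Exception):
    """Raised when the endpoint answers with an HTTP error status."""

    def __init__(self, status: int, hint: str):
        super().__init__(f"{status}: {hint}")
        self.status = status
        self.hint = hint


def hint_for_status(status: int) -> str:
    """Map the documented status codes to a short remediation hint."""
    if status == 400:
        return "VALIDATION_ERROR: check text length, direction, and surface"
    if status == 401:
        return "UNAUTHORIZED: API key missing or invalid"
    return "unexpected status"


def post_check(body: dict, api_key: str) -> dict:
    """POST the body and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + "/api/v2/moderation/check",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise ModerationCheckError(err.code, hint_for_status(err.code)) from err
```

A 400 generally calls for fixing the request before retrying, while a 401 points at credentials, so retrying the same call unchanged is unlikely to help in either case.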
Related
| Page | Description |
|---|---|
| Activity tasks | All activity operations |
| List activity | Configuration audit feed |
| Errors | Shared error contract |