🧠 All Things AI
Intermediate

Guardrails for Bedrock

Guardrails for Bedrock is a configurable safety and compliance layer that sits between your application and the model. You configure what to block or redact; Bedrock evaluates every request and response against your rules before anything reaches the user.

What Guardrails Do

A guardrail is a set of policies applied to both inputs (what the user sends) and outputs (what the model generates). When a guardrail triggers, Bedrock returns a configurable blocked message instead of the model's response — the model may never even see the flagged content.

Policy Types

Content Filters

Block harmful content across six categories:

  • Hate speech
  • Insults
  • Sexual content
  • Violence
  • Misconduct (criminal activity, self-harm)
  • Prompt attacks (jailbreaks, system prompt leakage)

Each category has Low / Medium / High threshold settings.
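The six categories map onto the `contentPolicyConfig` block of the CreateGuardrail API. A minimal sketch, assuming the standard Bedrock schema (the strength choices here are illustrative, not recommendations):

```python
# Sketch of a content-filter policy for CreateGuardrail.
# Category names and strength values follow the Bedrock API's
# contentPolicyConfig schema; the chosen strengths are examples.
content_policy = {
    "filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        # Prompt attacks are only evaluated on input, so outputStrength is NONE.
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}

# Passed to the control-plane client when creating the guardrail, e.g.:
# bedrock = boto3.client("bedrock")
# bedrock.create_guardrail(
#     name="my-guardrail",                                  # placeholder name
#     blockedInputMessaging="Sorry, I can't help with that.",
#     blockedOutputsMessaging="Sorry, I can't help with that.",
#     contentPolicyConfig=content_policy,
# )
```

Note that input and output strengths are set independently, so you can screen user prompts more strictly than model responses (or vice versa).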

Denied Topics

Define custom off-limits topics in plain English.

Example: "Investment advice or specific stock recommendations" — Bedrock uses an LLM classifier to detect and block these topics. No regex or keyword lists needed.
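A denied topic is expressed as a name, a natural-language definition, and optional example phrases. A sketch following the `topicPolicyConfig` schema (the topic itself is the example from above):

```python
# Sketch of a denied-topics policy (topicPolicyConfig schema from the
# Bedrock CreateGuardrail API); the topic content is an example.
topic_policy = {
    "topicsConfig": [
        {
            "name": "Investment advice",
            "definition": "Investment advice or specific stock recommendations, "
                          "including buy/sell/hold guidance on securities.",
            "examples": ["Should I buy NVDA stock right now?"],
            "type": "DENY",
        }
    ]
}
```

The `definition` is what the classifier actually reasons over, so write it the way you would explain the boundary to a colleague, not as a keyword list.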

PII Redaction

Detect and redact/block personally identifiable information.

  • Names, email addresses, phone numbers, physical addresses
  • SSN, passport, driver's license numbers
  • Credit card numbers, bank account numbers
  • IP addresses, AWS access keys

Can be applied to input, output, or both. For each entity type, choose redact (mask the match with a placeholder) or block the whole message.
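Per-entity actions live in the `sensitiveInformationPolicyConfig` block. A sketch, assuming the Bedrock API's entity-type names (the regex pattern is a made-up internal identifier):

```python
# Sketch of a PII policy (sensitiveInformationPolicyConfig schema).
# ANONYMIZE redacts the match in place; BLOCK rejects the whole message.
pii_policy = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "PHONE", "action": "ANONYMIZE"},
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
    ],
    # Custom patterns beyond the built-in entity types go in regexesConfig;
    # this ticket-ID pattern is a hypothetical example.
    "regexesConfig": [
        {"name": "ticket-id", "pattern": r"TKT-\d{6}", "action": "ANONYMIZE"}
    ],
}
```

Mixing actions like this is common: low-risk identifiers get masked so the conversation can continue, while high-risk ones (SSN, card numbers) block outright.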

Word Filters

Block exact words or phrases.

Includes a built-in profanity list you can enable in one click, plus custom word lists for brand, competitor, or domain-specific terms.
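Both the managed profanity list and custom terms are configured together in `wordPolicyConfig`. A sketch (the custom words are placeholders):

```python
# Sketch of a word-filter policy (wordPolicyConfig schema); the custom
# terms below are placeholders for brand or competitor names.
word_policy = {
    "wordsConfig": [
        {"text": "CompetitorName"},
        {"text": "internal-codename"},
    ],
    # The one-click managed profanity list:
    "managedWordListsConfig": [{"type": "PROFANITY"}],
}
```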

Grounding Check

The grounding check detects hallucinations by verifying that the model's output is factually supported by the retrieved context (the source documents or knowledge base chunks). It compares the output against the context and returns a grounding score. If the score falls below your threshold, Bedrock blocks the response.

This is particularly valuable for RAG applications where accuracy is critical — legal document Q&A, medical information, financial data. The relevance check (a separate policy) also verifies that the model's response is relevant to the user's query.
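Both checks are configured together under `contextualGroundingPolicyConfig`. A sketch, with illustrative thresholds (tune them against your own evaluation set; responses scoring below a threshold are blocked):

```python
# Sketch of a contextual grounding policy (contextualGroundingPolicyConfig
# schema). Threshold values are illustrative, not recommendations.
grounding_policy = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},   # is the output supported by the source?
        {"type": "RELEVANCE", "threshold": 0.75},   # does the output answer the query?
    ]
}
```

For the grounding check to have anything to verify, your Converse request must tag the retrieved documents as grounding source content; Bedrock supports this via qualified `guardContent` blocks in the message.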

Automated Reasoning

Automated reasoning checks (2025) apply formal logic verification to factual claims. You define logical rules and constraints; Bedrock verifies that model outputs don't violate them before returning. Example use cases: insurance policy compliance checking, mathematical claim verification, regulatory rule adherence. This goes beyond keyword filtering or LLM-based classification — it uses symbolic reasoning.

Applying Guardrails

You can apply a guardrail to a Converse or InvokeModel call by passing the guardrail ID and version:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-sonnet-4-5",
    messages=[{"role": "user", "content": [{"text": user_input}]}],
    guardrailConfig={
        "guardrailIdentifier": "GUARDRAIL_ID",
        "guardrailVersion": "DRAFT",  # or a numeric version
        "trace": "enabled",  # returns assessment details
    },
)

# Check whether the guardrail triggered
if response.get("stopReason") == "guardrail_intervened":
    print("Guardrail blocked this response")
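Guardrails can also run standalone, without a model call, via the ApplyGuardrail operation — useful for screening text from any source, including models outside Bedrock. A sketch of the request shape (guardrail ID and text are placeholders):

```python
# Sketch: checking arbitrary text with the standalone ApplyGuardrail API.
# The request shape follows the bedrock-runtime apply_guardrail operation;
# the guardrail ID and sample text are placeholders.
request = {
    "guardrailIdentifier": "GUARDRAIL_ID",
    "guardrailVersion": "DRAFT",
    "source": "INPUT",  # or "OUTPUT" to evaluate a model response
    "content": [{"text": {"text": "My SSN is 123-45-6789"}}],
}

# result = bedrock_runtime.apply_guardrail(**request)
# result["action"] is "GUARDRAIL_INTERVENED" or "NONE";
# result["assessments"] details which policies fired.
```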

Checklist: Do You Understand This?

  • What are the six content filter categories in Bedrock Guardrails?
  • How do denied topics differ from word filters?
  • What does the grounding check do and when should you enable it?
  • How do you apply a guardrail to a Converse API call?