🧠 All Things AI
Intermediate

Bedrock APIs

Bedrock exposes two main inference APIs — Converse and InvokeModel — plus a separate control-plane API for managing resources. Understanding which to use and which boto3 client handles each is the first practical thing to get right.

Two boto3 Clients

The most common confusion: there are two different boto3 clients for Bedrock, and they do different things.

bedrock

Control plane. Manages Bedrock resources.

  • List available foundation models
  • Create/manage provisioned throughput
  • Create fine-tuning jobs
  • Create/update Guardrails
import boto3

client = boto3.client("bedrock", region_name="us-east-1")
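As a quick sketch of the control plane in action (assuming AWS credentials are configured; `text_model_ids` and `list_text_models` are hypothetical helper names, not part of boto3), listing the text-capable models might look like:

```python
def text_model_ids(summaries):
    """Filter list_foundation_models summaries down to IDs of text-output models."""
    return [
        m["modelId"]
        for m in summaries
        if "TEXT" in m.get("outputModalities", [])
    ]

def list_text_models(region="us-east-1"):
    """Call the control-plane client (requires AWS credentials)."""
    import boto3
    bedrock = boto3.client("bedrock", region_name=region)
    resp = bedrock.list_foundation_models()
    return text_model_ids(resp["modelSummaries"])
```

Keeping the filtering logic separate from the API call makes it easy to unit-test without hitting AWS.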

bedrock-runtime

Data plane. Runs inference.

  • Converse / ConverseStream
  • InvokeModel / InvokeModelWithResponseStream
  • Invoke agents and knowledge bases
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

Converse API

The Converse API (launched 2024) is the recommended API for most use cases. It provides a unified chat interface that works identically across all models that support it. You structure messages as a list of role + content objects — the same shape regardless of whether you're calling Claude, Nova, Llama, or Mistral.

response = client.converse(
    modelId="anthropic.claude-sonnet-4-5",
    messages=[
        {"role": "user", "content": [{"text": "Explain RAG in one paragraph"}]}
    ],
    system=[{"text": "You are a concise technical writer."}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)
print(response["output"]["message"]["content"][0]["text"])

For streaming responses, use ConverseStream. The event stream returns chunks with contentBlockDelta events as tokens arrive.
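A minimal streaming sketch (the `collect_stream_text` helper and `stream_reply` wrapper are illustrative names of my own, not part of boto3):

```python
def collect_stream_text(events):
    """Join the text fragments carried by contentBlockDelta events."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)

def stream_reply(model_id, prompt, region="us-east-1"):
    """Stream a single-turn reply via ConverseStream (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # resp["stream"] is an iterable of events (messageStart,
    # contentBlockDelta, messageStop, metadata, ...)
    return collect_stream_text(resp["stream"])
```

In a real chat UI you would print each fragment as it arrives rather than joining them at the end.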

Converse also supports tool use (function calling) with the same unified schema across providers. Define tools in the toolConfig parameter.
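As a hedged sketch of that schema (the `get_weather` tool and the `ask_with_tools` wrapper are hypothetical; each tool is a `toolSpec` with a JSON Schema under `inputSchema.json`):

```python
# A hypothetical tool definition in the Converse toolConfig shape.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            }},
        }
    }]
}

def ask_with_tools(model_id, prompt, region="us-east-1"):
    """Send a message with tools attached (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    return client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        toolConfig=tool_config,
    )
```

When the model decides to call a tool, the response's stop reason is `tool_use` and the requested tool name and input appear in the output message content; your code runs the tool and sends the result back in a follow-up turn.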

InvokeModel API

InvokeModel sends a raw JSON body to the model and gets a raw JSON response. The request and response schema is model-specific — Claude uses a different format than Nova or Llama. You must handle serialisation and response parsing yourself.

import json

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Explain RAG"}],
})
response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5",
    body=body,
    contentType="application/json",
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])

Use Converse when:

  • You want model-agnostic code
  • You need tool use (function calling)
  • You're building a chat or multi-turn app
  • You want the simplest API surface

Use InvokeModel when:

  • The model isn't yet supported by Converse
  • You need model-specific parameters not exposed in Converse
  • You're calling image generation models (Nova Canvas, SDXL)
  • You're embedding text (Titan Embed, Cohere Embed)

Embedding API

Embedding models don't fit the chat format, so they use InvokeModel. Amazon Titan Embed Text v2 and Cohere Embed v3 are the main embedding models on Bedrock. Titan Embed v2 outputs 256, 512, or 1024 dimensions (configurable), supports normalisation, and integrates directly with Bedrock Knowledge Bases.
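A sketch of embedding with Titan Embed Text v2 (the helper names are mine; the `inputText`/`dimensions`/`normalize` body fields follow Titan v2's model-specific schema, so verify them against the current model documentation):

```python
import json

def titan_v2_body(text, dimensions=512, normalize=True):
    """Build the model-specific request body for amazon.titan-embed-text-v2:0."""
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize,
    })

def embed(text, region="us-east-1"):
    """Embed one string via InvokeModel (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=titan_v2_body(text),
        contentType="application/json",
    )
    return json.loads(resp["body"].read())["embedding"]
```

Note there is no `messages` list anywhere: the request is a flat JSON body, which is exactly why embeddings go through InvokeModel rather than Converse.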

Checklist: Do You Understand This?

  • What is the difference between the bedrock and bedrock-runtime boto3 clients?
  • When would you use Converse vs InvokeModel?
  • Can you write a minimal Converse API call that sends a user message and prints the response?
  • Why must embedding requests use InvokeModel rather than Converse?