Content Creation

Beyond text, OpenAI provides a suite of tools for generating and processing visual and audio content: photorealistic images with accurate text rendering, real-time voice conversations, and full HD video generation with Sora.

In This Section

Image Generation

GPT-4o native image generation and the evolution from DALL-E: accurate text rendering, faces, and iterative refinement.

Audio & Voice

Advanced Voice Mode, Whisper speech-to-text (STT), new transcription models, text-to-speech (TTS) options, and the Realtime API for developers.

Sora — Video Generation

Text-to-video and image-to-video with Sora 2 — 1080p HD, up to 25 seconds, physically accurate motion.