Content Creation
Beyond text, OpenAI provides a suite of tools for generating and processing visual and audio content: photorealistic images with accurate text rendering, real-time voice conversations, and full HD video generation with Sora.
In This Section
Image Generation
GPT-4o native image generation and the evolution from DALL-E — accurate text, faces, and iterative refinement.
Audio & Voice
Advanced Voice Mode, Whisper speech-to-text (STT), new transcription models, text-to-speech (TTS) options, and the Realtime API for developers.
Sora — Video Generation
Text-to-video and image-to-video with Sora 2 — 1080p HD, up to 25 seconds, physically accurate motion.