Standout features
Cartesia aims to build the fastest real-time voice AI: its Sonic TTS, built on State Space Models, hits sub-100ms latency for natural, expressive voice agents in 40+ languages.
Worldwide search interest, indexed 0–100 · Google Trends.
Cartesia is a voice-AI lab built for speed: its Sonic TTS, built on State Space Models, delivers sub-100ms latency for real-time voice agents, and its Ink model adds streaming transcription.
- San Francisco; spun out of the Stanford AI Lab (2023).
- Built on State Space Models — fast + efficient.
- Sonic TTS: sub-100ms, 40+ languages, expressive.
- Used by Quora, Yelp, DoorDash, ServiceNow; also available on Together AI.
Cartesia is latency-first.
- Sonic TTS — lifelike, low-latency speech.
- State Space Model architecture (efficient, fast).
- 40+ languages + voice cloning.
- Ink: streaming speech-to-text with turn detection.
Pricing is freemium and usage-based.
Cartesia fits real-time builders.
- Voice agents in support, healthcare, banking.
- Developers needing the lowest latency.
- High-volume live conversation systems.
It is a weaker fit for:
- Creators wanting a big ready-made voice catalog.
- Pure offline / batch narration needs.
No tool is perfect — the trade-offs to weigh:
- Developer-first — not a polished creator app.
- Smaller voice library than the leaders.
- Newer brand vs incumbents.
- Usage pricing needs monitoring at scale.
- ✓ Sub-100ms real-time latency
- ✓ Efficient State Space Model tech
- ✓ 40+ languages + cloning
- ✓ Trusted by major apps
- ✓ Ink streaming STT
- ✕ Developer-first, not a creator app
- ✕ Smaller voice library
- ✕ Newer brand
- ✕ Usage pricing to monitor
Teams building real-time voice agents praise Cartesia for sub-100ms latency and natural pacing, and often say they switched from incumbents for the speed and support. The main gripes are that it is developer-first rather than a polished creator app and that its voice library is smaller. Sentiment is positive among latency-sensitive builders.
Cartesia is a San Francisco voice-AI lab spun out of the Stanford AI Lab.
Company figures are drawn from public disclosures and reputable trackers (gathered Jun 2026). User and revenue numbers are estimates and move fast.