Standout features
Fish Audio is an open-source TTS project. Its Fish Speech S2 model tops independent benchmarks and offers zero-shot voice cloning from a few seconds of audio, free-form natural-language emotion control, and support for 80+ languages.
Worldwide search interest, indexed 0–100 · Google Trends.
Fish Audio is an open-source TTS project that punches above its weight: Fish Speech S2 tops independent leaderboards while shipping open weights and a cheap hosted tier.
- Chinese AI audio startup; 26,000+ GitHub stars.
- S2 Pro (Mar 2026): #1 EmergentTTS-Eval (81.88%), tops Audio Turing Test.
- Zero-shot cloning from 3–10s of reference audio; 80+ languages, cross-lingual.
- Free-form inline emotion control — natural-language prosody, no fixed tag set.
Fish Audio is open + benchmark-leading.
- Open-weight S2 Pro — weights, training + inference code.
- Zero-shot voice cloning, cross-lingual transfer.
- Multi-speaker, multi-turn dialogue in one pass.
- Hosted API + self-host; latency under roughly 100–150ms.
Free + cheap hosted tiers (or self-host).
Fish Audio fits developers + tinkerers.
- Devs wanting SOTA quality without per-minute fees.
- Self-hosting for privacy / cost control.
- Multilingual cloning + dialogue generation.
It is a weaker fit for:
- Non-technical creators wanting a polished app.
- Teams needing enterprise SLAs + support.
No tool is perfect — the trade-offs to weigh:
- Self-host needs a GPU + setup.
- No hand-holding / enterprise SLAs.
- Younger ecosystem vs incumbents.
- Cloning raises the usual consent concerns.
- ✓ Open weights, SOTA benchmarks
- ✓ Zero-shot cloning (3–10s)
- ✓ Free-form emotion control
- ✓ 80+ languages
- ✓ Free self-host or cheap API
- ✕ Self-host needs a GPU
- ✕ No enterprise SLA / support
- ✕ Younger ecosystem
- ✕ Cloning consent concerns
Developers are buzzing about Fish Audio: S2 Pro tops benchmarks against ElevenLabs and OpenAI while shipping open weights and cheap hosting, and its zero-shot cloning is strong. The main gripes are that self-hosting needs a GPU and that there is no enterprise hand-holding. Sentiment is very positive among technical users.
Fish Audio is an open-source text-to-speech project from a Chinese AI audio startup.
Company figures are drawn from public disclosures and reputable trackers (gathered Jun 2026). User and revenue numbers are estimates and move fast.