Larry Voice Catalog

Model: gemini-3.1-flash-tts-preview
Date: 2026-04-16 · Phrase: "Hi Igor. This is Larry, testing the Gemini voice catalog at regular pacing. Let me walk you through a longer sample so you can hear how this voice handles prosody, pauses, and sentence variety…"
Audio: 24 kHz mono · Pricing: ~$0.12 per 30 s utterance · Rate limit: 10 rpm (preview)
Details: How was this generated? → · Soprano iteration loop →
Repo: github.com/idvorkin-ai-tools/larry-voice-samples

ℹ️ These samples are pre-generated, not live. Each MP3 was produced once via the Gemini 3.1 Flash TTS preview endpoint and committed to this repo. The playback-speed slider uses the browser's HTMLMediaElement.playbackRate; it doesn't call the API, which keeps the page free and instant.
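The endpoint does not hand back an MP3 directly. The sketch below assumes the response carries raw 16-bit little-endian PCM at the 24 kHz mono format listed above, which is how Gemini TTS output is documented. It shows how such bytes could be wrapped in a WAV container with the standard-library wave module, and how clip length follows from the byte count. The helper names are hypothetical, and the WAV-to-MP3 step (for example with ffmpeg) is left out.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # from the catalog header: 24 kHz
CHANNELS = 1              # mono
SAMPLE_WIDTH_BYTES = 2    # assumption: 16-bit PCM samples


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Hypothetical helper: wrap raw PCM samples in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()


def pcm_duration_s(pcm: bytes) -> float:
    """Clip length in seconds: bytes / (rate * bytes per sample * channels)."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

At this format one second of audio is 48,000 bytes of PCM, so a 30 s utterance is roughly 1.44 MB before MP3 compression.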

Custom voice presets

Gemini base voice (Charon unless noted) + natural-language style directive.
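A preset is just a prebuilt voice name plus directive prose. As a minimal sketch, here is the JSON body for a REST `generateContent` call that requests audio from a chosen voice. The body shape (`responseModalities`, `speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName`) follows Google's published TTS examples. How the directive is joined to the phrase (`"<directive>: <phrase>"`) is an assumption, since this page does not show the exact prompt format, and `build_tts_request` is a hypothetical helper.

```python
from typing import Optional

MODEL = "gemini-3.1-flash-tts-preview"  # model name from the catalog header
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_tts_request(phrase: str, voice: str = "Charon",
                      style: Optional[str] = None) -> dict:
    """Hypothetical helper: request body for one TTS clip.

    Assumption: the style directive is prepended to the phrase as plain
    prose; with no directive, the phrase is sent as-is (the baseline preset).
    """
    text = f"{style}: {phrase}" if style else phrase
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
```

Under this sketch, switching from the Charon presets to the Soprano preset is only a different `voice` argument ("Enceladus") and a different `style` string.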

Playback speed: 1.00× (default). Changes apply live: audio that is already playing continues at the new rate.

Larry (plain Charon)

No style directive — baseline

Freud preset

Thoughtful Viennese baritone — clinical, measured

Soprano preset v2 · 6/10

Enceladus base · tournament-tuned prompt · Raspy North-Jersey, quiet menace. How we got this voice →

Catalog sweep: 17 working Gemini voices

Each voice read the same phrase. The speed slider above applies to all of them.

Charon ⭐

Larry default · baritone storyteller

Fenrir

Excitable baritone · widest dynamic

Orus

"Lively male"

Puck

Upbeat, playful male

Enceladus

Male-coded, neutral

Iapetus

Neutral-to-cool

Algenib

Neutral

Algieba

Neutral

Umbriel

Neutral

Autonoe

Neutral

Zephyr

Airy

Aoede

"Warm"

Leda

Neutral-warm

Callirrhoe

Fastest generation

Despina

Smooth

Erinome

Neutral

Kore

"Firm" · reads feminine
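Sweeping the whole catalog with one phrase has to respect the preview rate limit of 10 rpm, that is, one request start every 6 s. The sketch below spaces request starts accordingly. `synthesize(voice, phrase)` is a hypothetical stand-in for the real API call, and the clock and sleep functions are injectable only so the pacing can be checked without waiting.

```python
import time
from typing import Callable, Dict, Iterable, Optional

RATE_LIMIT_RPM = 10                      # preview rate limit from the header
MIN_INTERVAL_S = 60.0 / RATE_LIMIT_RPM   # 6 s between request starts


def sweep_voices(
    voices: Iterable[str],
    phrase: str,
    synthesize: Callable[[str, str], bytes],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, bytes]:
    """Run the same phrase through every voice, starting requests >= 6 s apart.

    `synthesize` is hypothetical: it should return the audio bytes for one clip.
    """
    results: Dict[str, bytes] = {}
    last_start: Optional[float] = None
    for voice in voices:
        if last_start is not None:
            # Wait out whatever remains of the 6 s slot since the last start.
            wait = MIN_INTERVAL_S - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[voice] = synthesize(voice, phrase)
    return results
```

At this pacing, a sweep over all 30 documented names takes about three minutes, however fast each individual call returns.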

Not available on preview endpoint: 13 voices

These names appear in Google's documented catalog but fail silently on gemini-3.1-flash-tts-preview as of 2026-04-16: Achernar, Achird, Alnilam, Gacrux, Laomedeia, Pulcherrima, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Vindemiatrix, Zubenelgenubi.
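Because these failures are silent, a sweep needs its own check to sort voices into "working" and "not available". The sketch below assumes a silent failure shows up either as no audio bytes at all or as 16-bit PCM whose samples never rise above a small threshold. This page does not say which form the failures take, so both cases are treated as failures. The helper names are hypothetical.

```python
import array
import sys
from typing import Dict, List, Tuple


def is_silent_failure(pcm: bytes, threshold: int = 0) -> bool:
    """Hypothetical check: empty audio, or no sample louder than `threshold`."""
    if len(pcm) < 2:
        return True
    samples = array.array("h")                        # signed 16-bit samples
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a stray odd byte
    if sys.byteorder == "big":
        samples.byteswap()                            # assumption: PCM is little-endian
    return max(abs(s) for s in samples) <= threshold


def partition_voices(results: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Split sweep results into (working, failed) voice names."""
    working: List[str] = []
    failed: List[str] = []
    for voice, pcm in results.items():
        (failed if is_silent_failure(pcm) else working).append(voice)
    return working, failed
```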

Stats: latency & cost

Catalog reference

Cost per minute of output audio is fixed: $0.24 = 24,000 audio tokens × $10 per 1M tokens. Wall-clock time scales roughly linearly with audio length; voice choice shifts it by only about ±10%.

Audio length	Wall clock (typical)	Cost
5 s utterance	~5 s (range 4.2–6.8)	$0.02
30 s utterance	~14 s (range 11–18)	$0.12
1 min utterance	~60 s	$0.24
10 min long-form	~10 min	$2.40

Text input at ~150 chars/min costs about $0.00002 (negligible). Preview-tier pricing; it may change at GA.
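The Cost column follows directly from the per-minute rate stated above. The sketch below reproduces it from audio length alone. It uses only this page's figures (24,000 audio tokens per minute, $10 per 1M tokens) and leaves out text input, which the page treats as negligible.

```python
AUDIO_TOKENS_PER_MIN = 24_000      # figure quoted on this page
USD_PER_M_AUDIO_TOKENS = 10.0      # $10 per 1M output audio tokens (preview pricing)


def audio_cost_usd(audio_seconds: float) -> float:
    """Output-audio cost for a clip of the given length."""
    tokens = audio_seconds / 60.0 * AUDIO_TOKENS_PER_MIN
    return tokens * USD_PER_M_AUDIO_TOKENS / 1_000_000


if __name__ == "__main__":
    # Rebuild the Cost column of the table above.
    for label, secs in [("5 s", 5), ("30 s", 30), ("1 min", 60), ("10 min", 600)]:
        print(f"{label:>6}  ${audio_cost_usd(secs):.2f}")
```

Since cost depends only on seconds of output, anything that lengthens the audio (slower pacing, added pauses) raises the price proportionally.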

Custom preset latency (measured)

Style directives add no measurable API-side overhead: Gemini treats the directive prose as instructions, not as text to speak. Styled output is typically 15–25% longer because the model adds persona pauses and beats. Same phrase, different prompts:

Preset	Wall clock	Audio length	Cost
Plain Charon	4.7 s	6.4 s	$0.026
+ Soprano style	4.7 s	7.8 s	$0.031
+ Freud style (different phrase)	~5 s	17.3 s	~$0.07

Takeaway: style presets cost ~20% more per utterance. The extra pays for longer, persona-paced audio, not processing overhead.
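The ~20% premium can be checked against the Plain Charon and Soprano rows, which share a phrase. Because this page prices output at a flat $0.24 per minute ($0.004 per second), the premium is just the extra audio length times that rate. The sketch below makes that arithmetic explicit. The Freud row is excluded because it used a different phrase, and the function names are hypothetical.

```python
from typing import Tuple

USD_PER_AUDIO_SECOND = 0.24 / 60   # $0.24 per minute of output audio


def clip_cost_usd(audio_s: float) -> float:
    """Cost of one clip at the flat per-second output rate."""
    return audio_s * USD_PER_AUDIO_SECOND


def style_premium(plain_s: float, styled_s: float) -> Tuple[float, float]:
    """Return (fractional length increase, extra cost in USD) for a styled clip."""
    extra_s = styled_s - plain_s
    return extra_s / plain_s, clip_cost_usd(extra_s)


if __name__ == "__main__":
    frac, extra = style_premium(6.4, 7.8)   # Plain Charon vs + Soprano style
    print(f"+{frac:.1%} audio, +${extra:.4f} per utterance")
```

For these two rows the styled clip is about 21.9% longer, which falls inside the 15–25% range quoted above.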

Generated 2026-04-16 · regenerated 2026-04-15 at ~30 s per clip