Supertonic 3: Lightning-Fast, On-Device, Multilingual TTS.
A 99M-parameter open-weight text-to-speech model running locally on CPU via ONNX Runtime. No GPU. No cloud. No API.
- 31 Languages
- 99M Params
- CPU Only
- ONNX Runtime
- OpenRAIL-M
Open weights. Runs in your browser or fully offline on your machine.
Bring your own voice and hear it speak 31 languages.
Currently running Supertonic 2; v3 is rolling out soon.
Speaks 31 languages
One 99M-parameter model. No per-language fine-tuning. No GPU.
Highlighted languages have audio samples below.
Hear it next to the giants.
Same input text, same reference voice prompt, three systems. Supertonic 3 is ours — 99M params on CPU. OmniVoice and Chatterbox Multilingual are 5–8× larger and run on a GPU.
Want to hear your own voice?
Supertonic supports zero-shot voice cloning in Voice Builder — record or upload a short reference and synthesize across 31 languages.
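A hypothetical sketch of that flow with the Python package. The `reference_audio` argument is an assumption, not a documented parameter; only `voice_name` appears in the quickstart below, so check the supertonic-py docs or Voice Builder for the supported cloning entry point.

# Hypothetical sketch: zero-shot cloning from a short reference recording.
# `reference_audio` is an assumed argument name, not a documented one.
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(reference_audio="my_voice.wav")  # assumed API
wav, _ = tts.synthesize("Hello in my own voice.", voice_style=style, lang="en")
tts.save_audio(wav, "cloned.wav")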
GPU-class speed without a GPU.
RTF (real-time factor) measures how long synthesis takes per second of audio — lower is faster. ×RT is the inverse. Supertonic 3 reaches parity with an 800M-parameter GPU baseline while running on a 16-thread CPU.
N = 30 · same machine, same text, same reference voices

| Model | Hardware | Params | N | Synth (s) | Audio (s) | RTF ↓ | ×RT ↑ |
|---|---|---|---|---|---|---|---|
| Supertonic 3 | CPU (16 threads) | 99M | 30 | 57.99 | 289.92 | 0.200 | 5.00× |
| OmniVoice | RTX 3090 | 800M | 30 | 53.90 | 275.17 | 0.196 | 5.11× |
| Chatterbox Multilingual | RTX 3090 | 500M | 30 | 199.70 | 252.68 | 0.790 | 1.27× |
Synthesis throughput (×RT, higher is better)
Seconds of speech produced per second of wall-clock time, across the same 30 inputs.
Methodology
- N = 30 samples (the same set published in ./samples/ on this page).
- Mean audio duration ≈ 9.66 s per sample.
- Single machine. Identical text, identical reference voice prompts across all three systems.
- Supertonic 3 timed on CPU with 16 threads via ONNX Runtime. Baselines timed on a single RTX 3090.
- CPU model: (to be filled in).
- RTF = synthesis time ÷ audio duration. ×RT = 1 ÷ RTF.
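As a sanity check, the table can be reproduced with a short timing loop. This is a sketch that assumes the `supertonic` quickstart API shown in the next section, with `duration` taken to be the synthesized audio length in seconds and `samples` standing in for the 30 published text/language pairs.

# Sketch: measuring RTF and ×RT over a batch of inputs.
# Assumes tts.synthesize returns (wav, duration) with duration in seconds,
# as in the quickstart below; `samples` is a placeholder for the real set.
import time
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
samples = [("A gentle breeze moved through the open window.", "en")]

synth_s, audio_s = 0.0, 0.0
for text, lang in samples:
    t0 = time.perf_counter()
    wav, duration = tts.synthesize(text, voice_style=style, lang=lang)
    synth_s += time.perf_counter() - t0
    audio_s += duration

rtf = synth_s / audio_s  # e.g. 57.99 / 289.92 ≈ 0.200 for Supertonic 3
print(f"RTF = {rtf:.3f}, xRT = {1 / rtf:.2f}")  # 0.200 → 5.00×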
Drop it into your stack.
Officially supported runtimes. Each tab links to working examples in the upstream repo.
# pip install supertonic
from supertonic import TTS
tts = TTS(auto_download=True)
# 1) Default: synthesize English with voice "M1"
style = tts.get_voice_style(voice_name="M1")
wav, duration = tts.synthesize(
    "A gentle breeze moved through the open window.",
    voice_style=style,
    lang="en",
)
tts.save_audio(wav, "output.wav")
# 2) Swap the voice → "M2"
style = tts.get_voice_style(voice_name="M2")
# 3) Swap the language → Japanese
wav, _ = tts.synthesize("こんにちは、世界。", voice_style=style, lang="ja")
tts.save_audio(wav, "output_ja.wav")
Full reference and example scripts: supertonic-py docs.
// npm install @supertone/supertonic
import { TTS } from "@supertone/supertonic";
const tts = await TTS.load({ autoDownload: true });
const style = await tts.getVoiceStyle("M1");
const { wav } = await tts.synthesize("Hello from Node.", { style, lang: "en" });
See the node/ folder in the upstream repo.
// runs in browsers via onnxruntime-web
import { TTS } from "@supertone/supertonic-web";
const tts = await TTS.load();
const { wav } = await tts.synthesize("Hello from the browser.", { lang: "en" });
See the web/ folder in the upstream repo.
// Swift Package Manager: github.com/supertone-inc/supertonic-swift
import Supertonic
let tts = try Supertonic.TTS(autoDownload: true)
let wav = try tts.synthesize("Hello from iOS.", lang: "en")
See the ios/ folder in the upstream repo.
// Gradle: implementation("ai.supertone:supertonic-android:3.+")
val tts = Supertonic.TTS(context, autoDownload = true)
val wav = tts.synthesize("Hello from Android.", lang = "en")
See the android/ folder in the upstream repo.
// CMake: find_package(Supertonic CONFIG REQUIRED)
#include <supertonic/tts.hpp>
auto tts = supertonic::TTS::create({ .auto_download = true });
auto wav = tts->synthesize("Hello from C++.", { .lang = "en" });
See the cpp/ folder in the upstream repo.
Open weights. Permissive code. Read the fine print.
Model weights: OpenRAIL-M
The trained Supertonic 3 model is released under the OpenRAIL-M license. Weights are open and usable commercially, with use-based restrictions (no harm, no impersonation without consent) and an attribution requirement.
Note: OpenRAIL-M is not equivalent to MIT — it imposes downstream use restrictions. Read the full license text before deploying.
Sample code: MIT
The Python package, runtime bindings, and example code in the upstream repo are MIT-licensed. Use, modify, and redistribute freely with attribution.
Standard MIT terms: no warranty, attribution required, no restrictions on commercial use.