Sammy Like
A tiny speech synth in the spirit of 1982's SAM · three formants, phase-reset, 4-bit · by gizmo64k
01 · speak

The mouth

Pitch 64
Speed 72
Volume 80

Voice library

Pick a voice, then hit speak.

What should Sammy say?

Type English and Sammy will sound it out.
- phonemes
> Converted phonemes

Make him talk

Enter to speak · Shift+Enter for newline.
Phonemes mode. Two-letter codes (IY, AE, SH…) or single letters (S, T, L…). Append a digit 1-8 after a vowel to stress it.
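
The parsing rule above can be sketched as a greedy tokenizer: try a two-letter code first, fall back to a single letter, then pick up a stress digit if the code was a vowel. The code lists are copied from the phoneme reference on this page. The name `tokenizePhonemes` and the token shape are assumptions for illustration, not Sammy's actual parser, which may accept more codes than the reference lists.

```javascript
// Codes copied from the phoneme reference on this page.
const VOWELS = ['IY', 'IH', 'EH', 'AE', 'AA', 'AH', 'AO', 'UH', 'UW', 'ER',
  'AX', 'EY', 'AY', 'OY', 'AW', 'OW'];
const CONSONANTS = ['L', 'R', 'W', 'Y', 'M', 'N', 'NG', 'B', 'D', 'G', 'P', 'T',
  'K', 'V', 'Z', 'DH', 'J', 'F', 'S', 'SH', 'TH', 'CH', 'H'];
const PAUSES = ['.', ',', ' '];

const VOWEL_SET = new Set(VOWELS);
const CODE_SET = new Set([...VOWELS, ...CONSONANTS, ...PAUSES]);

// Hypothetical helper: split a phoneme string into tokens.
function tokenizePhonemes(text) {
  const src = text.toUpperCase();
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const two = src.slice(i, i + 2);
    const one = src[i];
    let code = null;
    if (two.length === 2 && CODE_SET.has(two)) code = two; // two-letter code wins
    else if (CODE_SET.has(one)) code = one;                // else single letter
    if (code === null) {
      // Not in the reference table: keep it, but flag it.
      tokens.push({ code: one, stress: 0, known: false });
      i += 1;
      continue;
    }
    i += code.length;
    let stress = 0;
    // A digit 1-8 directly after a vowel is its stress marker.
    if (VOWEL_SET.has(code) && /^[1-8]$/.test(src[i] ?? '')) {
      stress = Number(src[i]);
      i += 1;
    }
    tokens.push({ code, stress, known: true });
  }
  return tokens;
}
```

For example, `tokenizePhonemes("SAE4M")` yields S, then AE with stress 4, then M. Greedy matching means an N followed by a G always reads as NG, which is the kind of corner a real parser has to decide on.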

English mode. Type normal English. Unknown words get a crude guess - shown in hot pink - add them to the dictionary below to fix them permanently.
02 · dictionary

Words Sammy knows

Turn on English mode above to use this. Then type a sentence and any unknown words will show in hot pink - add proper phoneme spellings here and Sammy learns them. Your edits save automatically in your browser.
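
A minimal sketch of that lookup, assuming the dictionary is a plain word-to-phonemes map. The names `englishToPhonemes`, `crudeGuess` and the letter table are hypothetical, and the fallback here is much dumber than whatever Sammy really does. The two dictionary entries in the usage note are lifted from the example strings further down this page.

```javascript
// Hypothetical letter table for the fallback guess; not Sammy's real rules.
const LETTER_GUESS = {
  a: 'AE', e: 'EH', i: 'IH', o: 'AA', u: 'AH',
  b: 'B', d: 'D', f: 'F', g: 'G', h: 'H', j: 'J', k: 'K', l: 'L', m: 'M',
  n: 'N', p: 'P', r: 'R', s: 'S', t: 'T', v: 'V', w: 'W', y: 'Y', z: 'Z',
  c: 'K', q: 'K', x: 'KS',
};

// Spell a word letter by letter; characters without an entry are dropped.
function crudeGuess(word) {
  return [...word].map((ch) => LETTER_GUESS[ch] ?? '').join('');
}

// Look each word up in the dictionary (a Map); guess when it is missing.
// `known: false` is what a UI would paint in hot pink.
function englishToPhonemes(sentence, dictionary, guess = crudeGuess) {
  const words = sentence.toLowerCase().match(/[a-z']+/g) ?? [];
  return words.map((word) => {
    const known = dictionary.has(word);
    return { word, known, phonemes: known ? dictionary.get(word) : guess(word) };
  });
}
```

With `new Map([['hi', 'HAY'], ['there', 'DHEHR']])`, the sentence "Hi there, blorp!" comes back as two known words and one guess flagged with `known: false`.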
03 · how it talks

Meet Sammy.
A voice from
three sines.

No neural network. No recorded samples. No formant filters. Sammy synthesizes speech from scratch, sample by sample, using one of the oldest tricks in computer audio: drive three oscillators at the frequencies the mouth would resonate at, and reset them all every time the vocal cords would snap shut. That reset is what makes him sound like a little robot instead of a pure tone.

01 formants

Your vocal tract has resonances - standing waves at specific frequencies. Linguists call these formants. Move your tongue, the resonances move with it, and that's how one sound becomes another.

Sammy fakes them directly. Two sine waves plus one rectangle, tuned to F1, F2 and F3. No filters, no mouth - just the frequencies.
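
As a rough sketch of that idea (not Sammy's actual code), here are two sines and one rectangle wave summed at fixed formant frequencies. The function name `renderFormants`, the 22050 Hz sample rate and the divide-by-three mix are assumptions. No phase reset yet, so this is the "pure synthesizer" version.

```javascript
const SAMPLE_RATE = 22050; // assumed; the page does not state Sammy's rate

// Hypothetical helper: sum three free-running oscillators at F1, F2, F3.
// amps holds one amplitude (0..1) per formant.
function renderFormants(f1, f2, f3, amps, numSamples, sampleRate = SAMPLE_RATE) {
  const out = new Float32Array(numSamples);
  for (let n = 0; n < numSamples; n++) {
    const t = n / sampleRate;
    const s1 = Math.sin(2 * Math.PI * f1 * t);
    const s2 = Math.sin(2 * Math.PI * f2 * t);
    // Rectangle wave: +1 for the first half of each cycle, -1 for the second.
    const cyclePos = (f3 * t) % 1;
    const s3 = cyclePos < 0.5 ? 1 : -1;
    // Divide by 3 so the sum stays inside -1..1 when every amp is <= 1.
    out[n] = (amps[0] * s1 + amps[1] * s2 + amps[2] * s3) / 3;
  }
  return out;
}

// Illustrative formant values, roughly an "AE"-like vowel.
const tone = renderFormants(660, 1720, 2410, [1, 0.6, 0.3], 2205);
```

Changing the three frequencies is the whole "tongue": a different triple is a different vowel.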

02 the buzz

A plain sum of three oscillators sounds nothing like a voice. It sounds like a synthesizer.

The trick: every time the glottal pulse would fire (around a hundred times a second for a male voice), all three phases are reset to zero. You get a comb of harmonics instead of three isolated tones. That comb is what your ears parse as speech.
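
A minimal sketch of the phase reset, assuming a fixed pitch in Hz and a 22050 Hz sample rate (`renderVoiced` and its parameters are hypothetical names). The only difference from a plain oscillator sum is the one line that zeroes all three phases at the start of every glottal period.

```javascript
// Hypothetical helper: three formant oscillators, hard-synced to the pitch.
function renderVoiced(formants, amps, pitchHz, numSamples, sampleRate = 22050) {
  const out = new Float32Array(numSamples);
  const period = Math.round(sampleRate / pitchHz); // samples per glottal cycle
  const phase = [0, 0, 0];                         // each in cycles, 0..1
  for (let n = 0; n < numSamples; n++) {
    // The trick: every glottal pulse resets all three phases to zero.
    if (n % period === 0) phase.fill(0);
    const s1 = Math.sin(2 * Math.PI * phase[0]);
    const s2 = Math.sin(2 * Math.PI * phase[1]);
    const s3 = phase[2] < 0.5 ? 1 : -1;            // rectangle wave
    out[n] = (amps[0] * s1 + amps[1] * s2 + amps[2] * s3) / 3;
    for (let k = 0; k < 3; k++) {
      phase[k] = (phase[k] + formants[k] / sampleRate) % 1;
    }
  }
  return out;
}
```

Because every period starts from the same state, the output repeats exactly once per glottal cycle. A waveform that repeats at the pitch rate only has energy at multiples of the pitch, which is the comb of harmonics described above.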

03 hiss

Consonants like S, F, SH don't use the oscillators at all. They're just colored noise - a cheap bandpass around where the hiss should sit spectrally. Two running averages, that's the whole filter.

Voiced fricatives (V, Z) do both at once: the oscillators keep buzzing and the noise mixes in on top.
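
One way to read "two running averages" is a fast average minus a slow average: the fast one drops the very top of the spectrum, subtracting the slow one removes the bottom, and what is left is a band. That reading is an assumption, as are the coefficients, the seeded noise generator and the names `bandpass`, `renderHiss` and `mixVoicedFricative`.

```javascript
// Small seeded noise source so results are repeatable (an assumption;
// Math.random() would do for actual playback).
function makeNoise(seed = 1) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0; // 32-bit LCG step
    return state / 0x80000000 - 1;                          // uniform in [-1, 1)
  };
}

// Crude bandpass: difference of two exponential running averages.
function bandpass(input, fastCoef = 0.6, slowCoef = 0.1) {
  const out = new Float32Array(input.length);
  let fast = 0;
  let slow = 0;
  for (let n = 0; n < input.length; n++) {
    fast += fastCoef * (input[n] - fast); // follows the signal quickly
    slow += slowCoef * (input[n] - slow); // follows only the slow drift
    out[n] = fast - slow;                 // keep what lies in between
  }
  return out;
}

// Unvoiced fricative: filtered noise only.
function renderHiss(numSamples, seed = 1) {
  const noise = makeNoise(seed);
  const raw = Float32Array.from({ length: numSamples }, () => noise());
  return bandpass(raw);
}

// Voiced fricative: oscillator output with the hiss mixed in on top.
function mixVoicedFricative(voiced, hiss, hissLevel = 0.5) {
  return voiced.map((s, n) => s + hissLevel * (hiss[n] ?? 0));
}
```

Moving the two coefficients moves the band, which is how an S could be made to sit higher than an SH.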

04 grit

Finally the output is quantized to 4 bits. 16 amplitude levels, no more. That's the staircase you see in the waveform and the grit you hear - the D/A converter of a 1980s home computer, modeled back in.

Without it Sammy sounds like a clinical vocoder. With it, he sounds like home.
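
The quantizer is the shortest part. A sketch, assuming samples arrive as floats in -1..1 (`quantize4bit` is a hypothetical name): snap each one to the nearest of 16 evenly spaced levels.

```javascript
// Hypothetical helper: reduce float samples to 16 amplitude levels.
function quantize4bit(samples) {
  const out = new Float32Array(samples.length);
  for (let n = 0; n < samples.length; n++) {
    const x = Math.max(-1, Math.min(1, samples[n])); // clamp to -1..1
    const level = Math.round(((x + 1) / 2) * 15);    // integer 0..15
    out[n] = (level / 15) * 2 - 1;                   // back to -1..1
  }
  return out;
}
```

Feed it a smooth ramp and you get exactly 16 distinct steps out.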

05 ten ms slices

Each phoneme is a set of parameters - three formants, three amplitudes, a pitch. The engine expands each phoneme into 10-millisecond frames, then linearly interpolates across the two frames on either side of each boundary so one sound ramps into the next.

Short, crude, mechanical. Smoother interpolation would sound more human. That's exactly why we don't do it.
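
A sketch of the frame step, under some assumptions: each phoneme carries a frame count and a flat parameter object, and the blend window is two frames before and two after each boundary, ramping between the nearest untouched frames. The names `expandToFrames`, `smoothBoundaries` and the parameter keys are hypothetical.

```javascript
const PARAM_KEYS = ['f1', 'f2', 'f3', 'a1', 'a2', 'a3', 'pitch'];

// Repeat each phoneme's parameters once per 10 ms frame and remember
// the frame index at which each new phoneme starts.
function expandToFrames(phonemes) {
  const frames = [];
  const boundaries = [];
  for (const p of phonemes) {
    if (frames.length > 0) boundaries.push(frames.length);
    for (let i = 0; i < p.frames; i++) frames.push({ ...p.params });
  }
  return { frames, boundaries };
}

// Linearly ramp every parameter across `blend` frames on each side
// of every boundary.
function smoothBoundaries(frames, boundaries, blend = 2) {
  const out = frames.map((f) => ({ ...f }));
  const last = frames.length - 1;
  for (const b of boundaries) {
    const start = Math.max(0, b - blend);
    const end = Math.min(last, b + blend - 1);
    const from = frames[Math.max(0, start - 1)]; // last steady frame before
    const to = frames[Math.min(last, end + 1)];  // first steady frame after
    const steps = end - start + 2;
    for (let i = start; i <= end; i++) {
      const t = (i - start + 1) / steps;         // strictly between 0 and 1
      for (const k of PARAM_KEYS) {
        out[i][k] = from[k] + (to[k] - from[k]) * t;
      }
    }
  }
  return out;
}
```

With two five-frame phonemes whose F1 is 100 and 500, the four frames around the boundary come out as 180, 260, 340, 420. Phonemes shorter than the blend window get smeared into their neighbours, which is crude, and in keeping with the rest.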

06 no english

The original SAM shipped with a 600-rule English-to-phoneme converter. Sammy doesn't. The engine only speaks phonemes, so you write HE4LOW instead of "hello"; English mode above is just a dictionary lookup with a crude guess for unknown words. Less convenient, way more fun.

For a chatbot, you phonemize the reply table once. For everything else, you get to spell things how they sound, which is its own small pleasure.

Things to try

  • Stack stress digits. Put 4 after one vowel and 1 after another - the first syllable jumps in pitch and the second stays low. That's how SAM did emphasis. Try IH4NTER EH1STIHNX.
  • Crank the pitch to 180. Higher pitch numbers mean a lower voice, so Sammy becomes something between a frog and a vacuum cleaner. Lower the speed at the same time for maximum doom.
  • Drop the pitch to 25. Chipmunk territory. Pair with the Elf preset for something genuinely cartoon-like.
  • Feed him gibberish. BLARGH SNURK DRAGOB is more fun than actual words because Sammy commits equally to both.
  • Write a phrase phonetically. Say "all your base are belong to us" as AO4L YOHR BEY4S AHR BIHLAONX TUW AH4S. The more idiosyncratic your spelling, the more character he gets.
  • Download the WAV and drop it into anything. The file is 8-bit-ish, tiny, and sounds like it came off a floppy disk.
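
On that last point: an 8-bit mono WAV is just a 44-byte header followed by one unsigned byte per sample. The sketch below writes one by hand. The name `encodeWav8` and the 22050 Hz rate are assumptions, and Sammy's own export may differ in the details.

```javascript
// Hypothetical helper: pack float samples (-1..1) into an 8-bit mono PCM WAV.
function encodeWav8(samples, sampleRate = 22050) {
  const dataLen = samples.length;
  const pad = dataLen & 1; // RIFF chunks are padded to an even length
  const bytes = new Uint8Array(44 + dataLen + pad);
  const view = new DataView(bytes.buffer);
  const tag = (offset, text) => {
    for (let i = 0; i < 4; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  tag(0, 'RIFF');
  view.setUint32(4, 36 + dataLen + pad, true); // file size minus 8
  tag(8, 'WAVE');
  tag(12, 'fmt ');
  view.setUint32(16, 16, true);         // fmt chunk size
  view.setUint16(20, 1, true);          // format 1 = PCM
  view.setUint16(22, 1, true);          // channels: mono
  view.setUint32(24, sampleRate, true); // sample rate
  view.setUint32(28, sampleRate, true); // byte rate (1 byte per sample)
  view.setUint16(32, 1, true);          // block align
  view.setUint16(34, 8, true);          // bits per sample
  tag(36, 'data');
  view.setUint32(40, dataLen, true);
  for (let n = 0; n < dataLen; n++) {
    const x = Math.max(-1, Math.min(1, samples[n]));
    bytes[44 + n] = Math.round(((x + 1) / 2) * 255); // 8-bit WAV is unsigned
  }
  return bytes;
}
```

In a browser, `new Blob([encodeWav8(samples)], { type: 'audio/wav' })` turns the bytes into something downloadable.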
04 · phoneme reference
IY         f(ee)t
IH         p(i)n
EH         b(e)d
AE         S(a)m
AA         p(o)t
AH         b(u)t
AO         t(al)k
UH         b(oo)k
UW         l(oo)t
ER         b(ir)d
AX         (a)bout
EY         m(a)de
AY         h(igh)
OY         b(oy)
AW         h(ow)
OW         sl(ow)
L R W Y    liquids / glides
M N NG     nasals
B D G      voiced stops
P T K      stops
V Z DH J   voiced fricatives
F S SH TH  fricatives
CH H       misc
1-8        stress digit after vowel
. , ␣      pauses
05 · use it

Drop it anywhere

// 500 lines, one file, zero dependencies
import { SammyLike } from './sammy.js';

const sammy = new SammyLike();

// speak returns a Promise that resolves when playback ends
await sammy.speak("HAY DHEHR.");
await sammy.speak("AY AEM TAY4NIY.", { pitch: 40 });

// render to WAV (for caching or pre-baking chatbot replies)
const blob = await sammy.renderToWav("DAW4NLOWD MIY4.");
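
Pre-baking a chatbot's replies could look like the sketch below. `prebakeReplies` and the reply table are hypothetical. The renderer is passed in as a function so the helper does not depend on anything beyond the `renderToWav` call shown above.

```javascript
// Hypothetical helper: render every reply once and keep the results by key.
// `render` is any async function from a phoneme string to audio data,
// e.g. (p) => sammy.renderToWav(p).
async function prebakeReplies(replyTable, render) {
  const cache = new Map();
  for (const [key, phonemes] of Object.entries(replyTable)) {
    cache.set(key, await render(phonemes)); // one render per reply, in order
  }
  return cache;
}

// Illustrative table, reusing the phoneme strings from the example above.
const REPLIES = {
  greet: 'HAY DHEHR.',
  tiny: 'AY AEM TAY4NIY.',
};
```

Calling `await prebakeReplies(REPLIES, (p) => sammy.renderToWav(p))` once at startup gives a Map of blobs that can be played back without running the synth again.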

A toy by gizmo64k.
Pure HTML + CSS + JavaScript. No frameworks, no dependencies, no CDN, no training data. Three oscillators, a glottal counter, and a bit of colored noise. The whole thing is a single file.
