Vox per gradus. voice through layers · the audio stack from Mathews to Mann
MATHEWS · 1957 · CHOWNING · 1973 · MANN · 2014
A voice in synthesizer terminology is one playable instance of a sound — a note assigned to one of the engine's polyphonic slots. The three layers of the modern audio stack each contribute to that voice: a controller chooses the pitch and dynamics; a synth engine assigns the voice and shapes its envelope; a per-sample kernel writes the audio buffer. Each layer was once a separate machine. The browser is now all three.
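A minimal sketch of that assignment as seen from the engine layer, assuming Tone.js v14 as the engine; the click handler and the eight-slot limit are illustrative, not drawn from any specimen's source:

    import * as Tone from "tone";

    // The engine layer: a PolySynth is a pool of playable voice slots.
    const poly = new Tone.PolySynth(Tone.Synth).toDestination();
    poly.maxPolyphony = 8; // eight slots; a ninth simultaneous note is dropped

    // A note-on asks the engine to assign one free slot to this pitch.
    document.addEventListener("click", async () => {
      await Tone.start(); // browsers unlock the audio context only after a user gesture
      poly.triggerAttackRelease("C4", "8n"); // one voice: a pitch, an envelope, a release
    });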
In 1957, at Bell Telephone Laboratories, Max Mathews ran a program called Music I on an IBM 704 mainframe. The program produced a seventeen-second monophonic melody, computed sample by sample, written to tape, and played back through a digital-to-analog converter into a single speaker. It was the first time a general-purpose computer had synthesized sound under software control. The following year, Mathews shipped Music II; by 1968 he had reached Music V, and in the decades that followed the entire field, from Stanford's CCRMA to IRCAM in Paris, built on its abstractions.
The abstractions that survived are still here. The unit generator — a software module that produces or transforms an audio signal, connected to other unit generators in a graph. The orchestra-and-score separation — instruments are defined once, played many times. The oscillator, envelope, filter, mixer as the four atoms of any synthesizer. Mathews wrote the grammar. Sixty-nine years later, every browser-tab synth on the open web is, at the architectural layer, a Music V program with better fonts.
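A sketch of how those four atoms compose today, written directly against the Web Audio API; the node parameters are illustrative:

    const ctx = new AudioContext();

    // Orchestra: each node is a unit generator; connect() draws the edges of the graph.
    const osc = new OscillatorNode(ctx, { type: "sawtooth", frequency: 220 });      // oscillator
    const filter = new BiquadFilterNode(ctx, { type: "lowpass", frequency: 1200 }); // filter
    const env = new GainNode(ctx, { gain: 0 });                                     // envelope, as a gain ramp

    osc.connect(filter).connect(env).connect(ctx.destination); // the destination acts as the mixer

    // Score: one note. The instrument above is defined once, played many times.
    const t = ctx.currentTime;
    env.gain.setValueAtTime(0, t);
    env.gain.linearRampToValueAtTime(0.8, t + 0.02);       // attack
    env.gain.exponentialRampToValueAtTime(0.001, t + 1.0); // decay
    osc.start(t);
    osc.stop(t + 1.0);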
This plate concerns the layers that arrived along the way and the lineage they share. The voice is what each layer assigns. The controller layer (WebMIDI, hardware keyboards) plays a voice. The synth-engine layer (Tone.js, Music V, Csound) assigns a voice — picks one of the polyphonic slots and routes a note through it. The kernel layer (AudioWorklet, the unit generator) writes the voice's actual samples. Vox per gradus. One voice, three layers, sixty-nine years.
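The controller layer in code, sketched with Web MIDI's real entry point, navigator.requestMIDIAccess; the routing into a Tone.js PolySynth is an illustrative assumption:

    import * as Tone from "tone";

    const poly = new Tone.PolySynth(Tone.Synth).toDestination();

    // Controller layer: hardware note-on/off events play and release voices.
    const midi = await navigator.requestMIDIAccess(); // top level of an ES module
    for (const input of midi.inputs.values()) {
      input.onmidimessage = (msg) => {
        const [status, note, velocity] = msg.data!;
        const name = Tone.Frequency(note, "midi").toNote(); // 60 -> "C4"
        if ((status & 0xf0) === 0x90 && velocity > 0) {
          poly.triggerAttack(name, Tone.now(), velocity / 127); // note-on: play a voice
        } else if ((status & 0xf0) === 0x80 || (status & 0xf0) === 0x90) {
          poly.triggerRelease(name); // note-off: free the slot
        }
      };
    }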
Three lives across three eras of computer music — Bell Labs, Stanford, the open web. Each one shifted the layer of the stack their generation worked at. The voice they each shaped sounds different; the discipline is the same.
FATHER OF COMPUTER MUSIC · UNIT GENERATOR
Mathews was an electrical engineer at Bell Labs working on signal-processing research when he ran Music I on the IBM 704 in 1957. The seventeen-second melody is sometimes called the founding moment of computer music — a generation of composers, including Pierre Boulez, John Chowning, James Tenney, and Laurie Spiegel, built on his programs and his abstractions. His invention of the unit generator — the modular, composable building block of every later synthesizer — is the abstraction this entire plate is built on. He continued working on computer-music systems for fifty-four years; he died in 2011, the year Tone.js's earliest precursor was being prototyped on a different coast.
FM SYNTHESIS · CCRMA
In 1967, while working with Mathews's Music IV at Stanford, John Chowning discovered that an oscillator's frequency could be modulated by another oscillator at audio rate to produce complex spectra from very simple operators. He published the result in 1973. Stanford patented it. Yamaha licensed the patent and in 1983 shipped the DX7 — the first commercially successful synthesizer that owed nothing to analog circuitry. Chowning's FM is the bell timbre in specimen 047 · the étude; it is the timbre of every glass-and-mallet sound in 1980s pop production. One algorithm; five decades later, a browser PolySynth can run Tone.FMSynth as its voice.
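Chowning's result in one line: y(t) = A sin(2πf_c t + I sin(2πf_m t)), where f_c is the carrier frequency, f_m the modulator frequency, and the index I sets how rich the spectrum grows. A minimal sketch of that wiring in plain Web Audio, with illustrative frequencies; Tone.FMSynth packages the same patch:

    const ctx = new AudioContext();
    const fc = 440, fm = 440, index = 3; // carrier Hz, modulator Hz, modulation index

    const carrier = new OscillatorNode(ctx, { frequency: fc });
    const modulator = new OscillatorNode(ctx, { frequency: fm });
    const deviation = new GainNode(ctx, { gain: index * fm }); // peak deviation = I * f_m, in Hz

    modulator.connect(deviation).connect(carrier.frequency); // audio-rate frequency modulation
    carrier.connect(ctx.destination);

    modulator.start();
    carrier.start();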
TONE.JS · WEB AUDIO
Yotam Mann released the first version of Tone.js in 2014, the year after the Web Audio API became available in both Chrome and Firefox. The library wraps the Web Audio API in the orchestra-and-score abstractions Mathews wrote nearly sixty years earlier — Tone.Synth, Tone.Sequence, Tone.Transport, Tone.PolySynth — and assumes the modern Web Audio graph as its substrate. It is now the de facto framework for browser-based music. Mann later joined Google's Magenta team, working on AI-augmented composition; Tone.js continues under community maintenance. Specimens 004 · The Pulse, 046 · The Score, and 047 · The Étude all sit on Tone.js. The voice that runs in your browser tab now is Mathews's voice — through Mann's wrapper, on Adenot's kernel, dispatched by Wilson's controller protocol.
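The mapping onto Mathews's split is direct. A minimal sketch in Tone.js v14 idioms, with illustrative notes and subdivision: the Synth is the orchestra, the Sequence is the score, the Transport is the clock.

    import * as Tone from "tone";

    // Orchestra: define the instrument once.
    const synth = new Tone.Synth({ oscillator: { type: "triangle" } }).toDestination();

    // Score: events scheduled against the Transport clock, played many times.
    const seq = new Tone.Sequence(
      (time, note) => synth.triggerAttackRelease(note, "8n", time),
      ["C4", "E4", "G4", "B4"],
      "4n",
    );

    seq.start(0);
    Tone.Transport.start(); // call after Tone.start() inside a user gesture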
Three practitioners, three machines, three layers of the same stack. Mathews wrote the grammar. Chowning wrote a timbre. Mann wrote the library that puts both in your browser. The voice that arrives at your speakers in 2026 carries every one of them. Vox per gradus.
What changed between 1957 and 2026 was not the abstraction. The abstraction is sixty-nine years old. What changed is what the abstraction is made of. Mathews's Music I needed a mainframe and a roomful of operators. Csound on a UNIX workstation in 1986 needed a graduate student and a pile of compile flags. Tone.js in a browser tab in 2026 needs a click. The substrate caught up to the grammar.
The shift happened in three layers. Hardware reached the point where the per-sample kernel — the inner loop of the unit generator — could run inside a browser without dropping samples; AudioWorklet shipped that capability in Chrome in 2018. Standards reached the point where the Web Audio API was stable in every major browser by 2017 and became a full W3C Recommendation in 2021. Distribution reached the point where Tone.js could ship as one CDN script and run on the same laptop the listener was already using. The voice that took an IBM 704 and a roomful of vacuum tubes in 1957 now runs in a tab.
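What that kernel looks like in practice: a minimal AudioWorkletProcessor that writes one sine voice sample by sample. The processor name and the fixed 440 Hz pitch are illustrative; sampleRate is a global the worklet scope provides.

    // kernel.js: the per-sample inner loop, running on the audio thread.
    class SineKernel extends AudioWorkletProcessor {
      phase = 0;
      process(inputs, outputs) {
        const out = outputs[0][0]; // first output, first channel
        const inc = (2 * Math.PI * 440) / sampleRate;
        for (let i = 0; i < out.length; i++) { // 128 samples per render quantum
          out[i] = 0.2 * Math.sin(this.phase);
          this.phase += inc;
        }
        return true; // keep the node alive
      }
    }
    registerProcessor("sine-kernel", SineKernel);

    // Main thread: load the kernel and patch it into the graph.
    // const ctx = new AudioContext();
    // await ctx.audioWorklet.addModule("kernel.js");
    // new AudioWorkletNode(ctx, "sine-kernel").connect(ctx.destination);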
The cleanest illustration is the étude in specimen 047. It would have taken Mathews several days of mainframe time to render twenty-eight seconds of fifteen-voice polyphony in 1957. It runs in real time, in your browser, with reverb, three timbres, stereo panning, and live spectrum visualization, with a single click. Same grammar. New substrate. The voice arrived.
Plate IX argued that the right answer is often older than the conditions in which it can be used. This plate argues one instance of that claim.
The right abstractions for computer music — the orchestra and the score, the unit generator, the oscillator and the envelope — were written by Max Mathews at Bell Labs between 1957 and 1968. For sixty years, the substrate was insufficient. Mainframes were too slow, then workstations were too rare, then audio drivers were too quirky. The composers who used Mathews's grammar — Chowning, Boulez, Spiegel, Tenney — worked at institutions with budgets the rest of the field could not match. The voice ran, but only in special rooms.
Then, in the fifteen years between 2011 and 2026, three things happened. The W3C ratified a Web Audio API. Yotam Mann wrapped it in Mathews's grammar. AudioWorklet shipped the per-sample kernel as a browser primitive. The voice, which had been running in special rooms for sixty years, arrived in every browser tab. The grammar was correct on the first try. The substrate finally caught up.