Voice input will replace typing on a keyboard.

葬AI·June 18, 2025

Articulate and eloquent, with exceptionally graceful prose.

"Speak and It Writes"

Here's a question: how long have humans been typing on keyboards?

The answer: 150 years.

In 1871, an American named Christopher Sholes built the world's first practical typewriter.

Against the backdrop of the Second Industrial Revolution, cast iron smelting, stamping processes, and new technologies like ink ribbons and rubber collectively shaped the physical form of the typewriter. Meanwhile, railroads and telegraphs extended commercial networks across the United States. Demand for transmitting business information — contracts, invoices, quotes — exploded.

Handwriting was too slow, at a normal pace of only 20-30 words per minute. The market desperately needed a more efficient information input tool, and so the first commercially successful typewriter was born.

To this day, our primary method of information input remains a direct descendant of that typewriter: the keyboard. Even the QWERTY layout is a 150-year-old design.

But is keyboard input still suitable for our era?

I don't think so. Typing is too slow for information input.

Voice input combined with AI is more efficient and will replace keyboard interaction.

More importantly, spoken expression naturally aligns better with human thinking patterns.

Before typing out a sentence, the brain must first formulate the complete thought, then input it character by character. This is a process of "crystallizing" one's thoughts into written language. But this isn't how we normally speak or think.

Most people can't write an essay of a few thousand words, yet almost everyone can chat and communicate fluently. If someone can express themselves clearly out loud, they should in theory be able to write well. So where's the problem?

The problem lies in that "crystallization" process: it is difficult.

Whether by hand or at a keyboard, writing involves a complex loop: the brain issues commands and the hands carry them out.

Speaking, by contrast, is largely subconscious. Take a bowl of shui pen yangrou (a lamb soup dish from Shaanxi): most people couldn't write a thousand-word article about eating it.

But if you ask me whether shui pen yangrou is good, I can answer immediately: it's delicious, the lamb broth is fresh, the meat is tender, the chili oil is sour and spicy, and the crescent-shaped flatbread has that freshly baked wheat-flour aroma. These words come naturally to my lips, no deliberate thinking required.

[Image caption: Shui pen yangrou really is delicious.]

The same holds for complex questions. Ask me what human nature is, and I can blurt out "human nature is eating, drinking, shitting, and pissing" or "human nature is existence." These one-liners require no thought either.

These off-the-cuff sentences are scattered and unstructured, nothing like the organized logic of formal writing.

Voice input plus AI solves this problem perfectly. We can speak stream-of-consciousness, then let AI handle the structuring and logical flow.

Spoken language is more natural, more primal, closer to our authentic state of thinking. Voice input dramatically reduces the cognitive burden of "crystallizing" thoughts in our minds.

In recent years, one trend has become unmistakably clear: people worldwide are no longer enamored with technically complex, elaborately packaged high-production content, and increasingly prefer podcasts, short videos, and social media.

We are undergoing a societal shift in expression from written to spoken language.

Why?

Because much polished, high-production content is saturated with clichés.

Films, feature articles, and serious literature have accumulated too many of their creators' formulas and unspoken rules. A classic film may be 80% formulaic in its narrative structure and cinematography.

Short videos and short dramas, meanwhile, are full of unexpected sparks. What film director could conceive a plot like "the domineering CEO falls for me, a menopausal woman"? They couldn't. Yet this kind of short video is more vivid, more unvarnished, closer to raw human imagination.

Podcasts work the same way. Celebrities can posture as profound in written interviews, but spoken language forces accessibility. Especially on podcasts, where conversations stretch past an hour, listeners can easily tell whether a guest is being genuine.

On the consumption side, people increasingly favor concise, direct content. As output shifts toward spoken expression, demand grows on the input side for faster input methods, accelerating voice's replacement of the keyboard. People no longer need to labor over writing; they can simply speak.

This actually resembles a leader with a secretarial staff.

A mayor preparing to speak at tomorrow's environmental mobilization meeting obviously won't stay up late typing his speech word for word. He'll orally convey his core points to his secretary — maybe five or six minutes of voice input. The secretary then functions like a large language model, drawing on their interaction history (context), discerning the leader's stylistic preferences, checking the latest documents and reports (web search), then staying up to draft it. The leader offers revision notes.

This is no different from how we use AI. Before AI existed, leaders were already using humans as AI.
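The secretary analogy maps fairly directly onto how a dictation-to-draft request might be packaged for a language model. Below is a minimal Python sketch, assuming a generic model that accepts a single text prompt. `DraftRequest` and `build_prompt` are hypothetical names, the prompt wording is illustrative, and the "web search" step is stood in for by a list of pre-fetched snippets rather than a real search call. No model is actually called.

```python
from dataclasses import dataclass, field


@dataclass
class DraftRequest:
    """What the 'secretary' gathers before drafting. All field names are hypothetical."""
    transcript: str  # raw speech-to-text output of the dictation
    history: list[str] = field(default_factory=list)  # earlier exchanges (context)
    style_notes: list[str] = field(default_factory=list)  # observed stylistic preferences
    documents: list[str] = field(default_factory=list)  # pre-fetched snippets, standing in for web search


def build_prompt(req: DraftRequest) -> str:
    """Assemble one text prompt asking a model to turn loose dictation into a structured draft."""
    sections = ["Turn the dictation below into a clear, well-organized speech draft."]
    if req.style_notes:
        sections.append("Style preferences:\n" + "\n".join(f"- {note}" for note in req.style_notes))
    if req.history:
        sections.append("Earlier exchanges:\n" + "\n".join(req.history))
    if req.documents:
        sections.append("Reference material:\n" + "\n\n".join(req.documents))
    # The raw transcript goes last, after the context it should be read against.
    sections.append("Dictation (raw transcript):\n" + req.transcript.strip())
    return "\n\n".join(sections)


if __name__ == "__main__":
    request = DraftRequest(
        transcript="um, environmental mobilization, three points: rivers, waste sorting, green commuting",
        style_notes=["short sentences", "open with a local example"],
        documents=["(summary of the latest environmental report would go here)"],
    )
    print(build_prompt(request))
```

Keeping each ingredient (context, style, references, raw transcript) in its own field makes it easy to see which part of the "secretary's" knowledge the model is drawing on; the leader's revision notes would simply become another round of dictation.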

Voice input better conforms to natural human thinking and oral expression habits, bypassing the cognitive barrier of "crystallizing" thoughts into written text, transforming "can speak but can't write" into "can speak, therefore can write."

So I believe voice input is superior to keyboard interaction.

Technically, speech recognition is already mature, as is LLM structuring of voice transcripts.

In efficiency, voice far surpasses keyboards. For Chinese text, handwriting runs at roughly 25-35 characters per minute, keyboard input at 60-90, and normal speech at 200-250, so speaking is roughly two to four times as fast as typing.
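To put these rates in concrete terms, here is a small Python calculation of how long a hypothetical 1,000-character piece would take at each quoted speed. The 1,000-character target and the helper names are illustrative assumptions; the rates are simply the article's own ranges.

```python
# Input speeds quoted in the article, in characters per minute: (low, high).
RATES: dict[str, tuple[int, int]] = {
    "handwriting": (25, 35),
    "keyboard": (60, 90),
    "speech": (200, 250),
}


def minutes_for(chars: int, rate: tuple[int, int]) -> tuple[float, float]:
    """Return (fastest, slowest) minutes needed to produce `chars` characters at `rate`."""
    low, high = rate
    return chars / high, chars / low


if __name__ == "__main__":
    target = 1000  # hypothetical piece length in characters
    for method, rate in RATES.items():
        fast, slow = minutes_for(target, rate)
        print(f"{method:<12} {fast:5.1f} to {slow:5.1f} minutes")
```

This works out to roughly 29 to 40 minutes by hand, 11 to 17 minutes at the keyboard, and 4 to 5 minutes by voice. It counts only raw input time; reviewing an AI-structured draft would add time the sketch does not model.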

In terms of demand, people need more direct, unadorned expression, and content consumption is shifting toward oral, authentic forms.

All of this points to a clear future: voice input will replace keyboard interaction.

This is a transformation in information interaction driven by technological progress.

Just as stamping processes, rubber, and ink ribbons gave rise to the typewriter, speech recognition and LLMs, technologies themselves built through keyboard-based programming, have produced a new way of interacting with information: voice input, structured by AI.

(Images in this article generated by ChatGPT o3, with writing assistance from Grimo and Gemini 2.5 Pro.)