Data-Driven Thoughts on "The Foolish Old Man Moves AGI" | 5Y View
The more human effort you put in, the more intelligence you get out.

Author

Steven Shi, Vice President at 5Y Capital
My hypothesis draws inspiration from friends in the industry and from Andrej Karpathy, who recently noted on Twitter: "People have too inflated a sense of what it means to 'ask an AI' about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of 'asking an AI', think of it more as 'asking the average data labeler' on the internet."
However much interpretability research or theoretical work tries to reveal so-called emergent intelligence, AI (that is, large language models) contains no magic at its core. Scaling laws, RL, and every other dazzling technical buzzword are largely decoration layered on top of the underlying motivation. Strip away the makeup, and two plain first principles remain:
1. Obtaining better, more comprehensive data
2. Modeling the data from (1) more efficiently
The next computing paradigm to transform productivity may not be AI with "human-level intelligence" but intelligence that "replaces humans in completing work." Suppose we define AGI as the ability to do 90% of white-collar jobs. We could certainly invest enormous time in studying and replicating the technical paths of OpenAI or Anthropic, trying to divine the genius of Ilya and John Schulman, and pushing the boundaries of (2). But the simpler path to that transformation may not lie in (2) at all. The most straightforward approach is to pursue (1) unwaveringly: collect, from every conceivable source, the broadest possible long-tail data that captures and generalizes the working habits of people in every industry, and use it as ground truth for both pretraining and post-training. Over a 3–5 year cycle, exhaustively fold all of it into the language model, then pray that intelligence capable of replacing all basic logical human work emerges naturally.
For top-tier LLM teams, the cost of this high-quality vertical-data-driven path to AGI is entirely acceptable:
- Goal: A three-year timeline, covering 400 vertical domains
- Each vertical domain: a small team of 3–5 competent computer science undergraduates, spending roughly one quarter (three months) of effort on it
- Build out the domain's data engineering pipeline, prompt tuning, requirement alignment, etc.
- This level of data investment is basically sufficient to produce a domain-specific SOTA model
- Established workflows can continue accelerating in later stages
- Roughly 30 such undergraduate teams (in a sense, private tutors for AI) are enough to cover 400 vertical domains in about three years
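The staffing estimate above can be checked with a few lines of arithmetic. The sketch below uses only the figures from the list (400 domains, 30 teams, 3–5 people per team, three years) and assumes that "one quarter" means each team spends about one calendar quarter on each domain. That reading is an interpretation, not something the list states outright, and the function names are illustrative.

```python
import math

# Figures taken from the list above.
DOMAINS = 400
TEAMS = 30
YEARS = 3
TEAM_SIZE = (3, 5)  # people per team, low and high end

# Assumption: each team needs about one calendar quarter per domain.
QUARTERS_PER_DOMAIN = 1
QUARTERS_PER_YEAR = 4


def domains_covered(teams: int, years: int) -> int:
    """Domains finished if every team works through domains back to back."""
    return teams * years * QUARTERS_PER_YEAR // QUARTERS_PER_DOMAIN


def years_needed(domains: int, teams: int) -> float:
    """Years until the last domain is done, counted in whole quarters."""
    quarters = math.ceil(domains * QUARTERS_PER_DOMAIN / teams)
    return quarters / QUARTERS_PER_YEAR


covered = domains_covered(TEAMS, YEARS)
needed = years_needed(DOMAINS, TEAMS)
headcount = (TEAMS * TEAM_SIZE[0], TEAMS * TEAM_SIZE[1])

print(f"Domains covered in {YEARS} years: {covered}")
print(f"Years needed for {DOMAINS} domains: {needed}")
print(f"Total headcount: {headcount[0]} to {headcount[1]} people")
```

Under these assumptions, 30 teams finish 360 domains in exactly three years and all 400 in three and a half, with a total headcount of 90 to 150 people. That is consistent with "about three years," especially if established workflows speed up later domains.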
Brute-forcing data at scale can reduce reliance on singular geniuses like Ilya. In 2024 we have, in fact, the world's most extravagant and diverse pool of idle intellectual resources. By relying on these resources, the difficult but straight path may turn out to be the "fool's shortcut" to AGI: 10x the volume of high-quality, cross-domain, expert-labeled data, meticulous and grounded data engineering, and 5–10% of the compute investment of frontier models, yielding highly practical intelligent systems.
Data is all you need. If any team is exploring this possible "fool's shortcut to AGI" (vertical-domain data generation or labeling, whether in code, operator use, or other domains, and whether through algorithmic or management-model innovation), feel free to reach me at: stevenshi@5ycap.com
Below is Andrej Karpathy's original content, translated by AI for reference
Author: Andrej Karpathy
Founder of Eureka Labs. Previously, Director of AI at Tesla, founding team member at OpenAI, CS231n/PhD at Stanford.
People have too inflated a sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet.
Few caveats apply because e.g. in many domains (e.g. code, math, creative writing) the companies hire skilled data labelers (so think of it as asking them instead), and this is not 100% true when reinforcement learning is involved, though I have an earlier rant on how RLHF is just barely RL, and "actual RL" is still too early and/or constrained to domains that offer easy reward functions (math etc.).
But roughly speaking (and today), you're not asking some magical AI. You're asking a human data labeler. Whose average essence was lossily distilled into statistical token tumblers that are LLMs. This can still be super useful of course. Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR you're not asking an AI, you're asking some mashup spirit of its average data labeler.
Example when you ask eg "top 10 sights in Amsterdam" or something, some hired data labeler probably saw a similar question at some point, researched it for 20 minutes using Google and Trip Advisor or something, came up with some list of 10, which literally then becomes the correct answer, training the AI to give that answer for that question. If the exact place in question is not in the finetuning training set, the neural net imputes a list of statistically similar vibes based on its knowledge gained from the pretraining stage (language modeling of internet documents).
Q&A
Q: Can RLHF (reinforcement learning from human feedback) create superhuman outcomes?
AK: RLHF is still RL from Human feedback, so I wouldn't say that exactly. RLHF moves the performance to "discriminative human" grade, up from SFT which is at "generative human" grade. But this is not so much "in principle" but more "in practice", because discrimination is easier for an average person than generation (e.g. label which of these 5 poems about X is best vs. write a poem about X). Separately you also get a separate boost from the wisdom of crowds effect, i.e. your LLM performance is not at human level, but at ensemble of human level.
So with RLHF in principle the best you can hope for is to reach a performance where a panel of e.g. the top 10 human experts on some topic, with enough time given, will pick your answer over any other. So in some sense this counts as superhuman.
To go proper superhuman in the way people think about it by default I think, you want to go to RL instead of RLHF, in the style of my earlier post on RLHF is just barely RL:
https://x.com/karpathy/status/1821277264996352246
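The "wisdom of crowds" boost mentioned in the answer above can be illustrated with a toy majority-vote model. This is only a sketch under strong assumptions that are not in the original post: each labeler independently picks the better of two answers with the same probability `p`, and the panel decides by strict majority. Real labelers are correlated, so the actual gain is likely smaller. The function name `majority_accuracy` is illustrative.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, picks the correct option."""
    if n % 2 == 0:
        raise ValueError("use an odd panel size to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# A single labeler who is right 60% of the time, versus small panels.
for panel_size in (1, 3, 11):
    print(panel_size, round(majority_accuracy(0.6, panel_size), 3))
```

With `p = 0.6`, a single labeler is right 60% of the time and a panel of three about 64.8%, and the figure keeps rising as the panel grows. This is the sense in which a model trained on aggregated preferences can sit at "ensemble of human" level rather than individual human level.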
Q: It doesn't interpolate, does it? If I ask "What color is a Gropy?", and we had 100 labellers say it's blue and 100 labellers say it's yellow, it's going to randomly say blue or yellow — but never "It's a debated question, some say blue, some say yellow". Right?
AK: Excellent question and yes exactly, it responds with blue or yellow with 50% probability. Saying "It's a debated question, some say blue, some say yellow" is just a sequence of tokens that would be super unlikely, it doesn't match the statistics of the training data at all.
Q: It says "it's a debated question" on almost everything that's a debated question. Try it.
AK: The human labelers are instructed in their training documentation to say stuff like that to keep things neutral.
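The exchange above can be made concrete with a toy model of pure imitation. The sketch below treats each whole answer as a single outcome and fits the empirical distribution of labeler answers, which is what maximum-likelihood imitation converges to in this simplified setting. A real LLM works token by token and generalizes across prompts, so this is only an illustration. The names `fit` and `sample` are hypothetical.

```python
import random
from collections import Counter


def fit(labels):
    """Imitation in its simplest form: the empirical answer distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def sample(dist, rng):
    """Draw one answer in proportion to its probability."""
    answers = list(dist)
    weights = [dist[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


HEDGED = "It's a debated question, some say blue, some say yellow"

# Case 1: labelers disagree, 100 say blue and 100 say yellow.
split_labels = ["blue"] * 100 + ["yellow"] * 100
model = fit(split_labels)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = Counter(sample(model, rng) for _ in range(1000))
print(model)            # each answer has probability 0.5
print(HEDGED in draws)  # False: the hedged answer was never in the data

# Case 2: labelers are instructed to write the neutral answer.
instructed_labels = [HEDGED] * 200
instructed_model = fit(instructed_labels)
print(instructed_model)  # the hedged answer now has probability 1.0
```

In the first case the model can only answer "blue" or "yellow", because the hedged sentence has zero probability under the training data. In the second case the hedged sentence is the training data, which matches the point that neutral answers come from labeler instructions rather than from the model interpolating between opinions.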
Q: I feel like the instant access to "skilled data labelers" in many domains is such a profound and useful function that we lacked prior to the LLM. We shouldn't take this new found accessibility feature for granted.
AK: 100% great way to put it.



5Y Capital seeks out, supports, and inspires lonely entrepreneurs, offering everything from moral support to help with all operational matters. We believe that if the world begins to believe in the "crazy" you that others see, the world will become a different place.
BEIJING · SHANGHAI · SHENZHEN · HONG KONG
