Gradients Are a Powerful Mental Model | The Counting Horse

五源资本五源资本·September 6, 2022

How to build smooth gradients is crucial for an antifragile system.

The Counting Horse

We don't know whether

we are a horse that can count

or a mathematician,

though most of the time it's the former.

A column by 5Y Capital partner Zhang Fei,

updated occasionally,

on cognition and investing.

Zhang Fei

5Y Capital Partner

Gradient Is a Powerful Mental Tool

Gradient is a fascinating and powerful concept that I've come to understand more deeply over the past year. In mathematics, a gradient is a vector that indicates the direction of the steepest directional derivative of a function at a given point. Its magnitude represents how steep the slope is.

In practice, this concept carries significant weight in modern technologies — artificial intelligence, life sciences, and complex social systems, to name a few.

Tennis offers a great example. A tennis ball moves and spins through three-dimensional space. A trained professional can hit it at 200 kilometers per hour with 3,000 revolutions per minute. That's an extraordinary skill. Our brains don't run mathematical formulas to calculate the ball's precise position — they actually use neural networks to simulate how the ball moves.

A neural network is a multi-layered vector network, with each layer employing gradient descent to reduce computational complexity. With thousands of neural network layers, the brain can build a model that matches the ball's movement patterns perfectly. Gradient is how the brain decodes high-dimensional, complex problems — especially those in the physical world. Riding a bicycle is another example: the brain uses gradients and neural networks to simultaneously control balance and motion.

AI works similarly to the human brain. In AI, gradient is a numerical computation that tells us how to adjust the network's parameters to minimize output error. Gradient descent is the most commonly used algorithm for training neural networks.

Take Go as an example. The number of possible board configurations is 10 to the power of 172, far beyond current computational capacity. DeepMind's AlphaGo used gradients and neural networks to dramatically reduce this complexity and defeat the world champion.

Gradient also plays a critical role in the origin of life. In his book The Vital Question, Nick Lane tells the story of Nobel laureate Peter Mitchell, who won the 1978 Nobel Prize in Chemistry for his chemiosmotic theory. Peter discovered that the source of all cellular energy is proton gradients driving ATP (adenosine triphosphate) synthesis. Protons diffuse from areas of high concentration to areas of low concentration to produce ATP. The proton gradient powers respiration — it's as fundamental to life science as the genetic code. Chemiosmotic theory is among the most counterintuitive discoveries in biology; it took a long time to be accepted, and science now understands the precise atomic-level mechanisms of how proton gradients operate.

In social systems, how do we find the optimal distribution of wealth and power among individuals to create a more efficient, dynamic society? Social scientists invented the "Gini coefficient" as a metric to monitor income distribution fairness. But the Gini coefficient doesn't tell us about the health of different social strata. I believe "gradient" could be a powerful tool for assessing the antifragility of different social classes. This approach applies equally to social networks and other ecosystems. In nature, different gradients produce very different ecologies. We can compare three common water systems: waterfalls, rivers, and crater lakes. These three systems represent gradients from high to low.

It's easy to see that rivers (gentle gradient descent) have far better ecology than waterfalls (massive gradient descent) and crater lakes (zero gradient, depth aside). Nature is showing us an optimization solution refined over hundreds of millions of years of evolution, yet most people overlook the significance of this phenomenon.

Social networks have long struggled to avoid two phenomena:

  1. Supernodes dominating and controlling the network, leading to UGC (user-generated content) collapse;
  2. The network becoming too flat (with no large nodes at all), resulting in noise overload;

In both cases, gradients don't function well. How to build smooth gradients is crucial for an antifragile system.

In the current Web3 frenzy, many founders and investors claim that Web3 will build a fully autonomous, distributed system. I may be biased here. But broadly speaking, I believe a completely gradient-free distributed system violates the laws of physics, and will therefore be inefficient and unstable. As Moxie Marlinspike, founder of Signal, noted in his essay "My First Impressions of Web3," the Web3 ecosystem is already dominated by a few supernodes. What's likely to happen is that revolutionaries, armed with new concepts and tools, end up creating a new world with even worse gradients. This revolutionary paradigm has failed repeatedly throughout history in various utopian experiments. Now, let's wait and see how Web3 evolves going forward.


Leave a Comment, Win a Gift

Share your thoughts, perspectives, and opinions in the comments. We'll select one featured comment to receive a 5Y Capital commemorative T-shirt. (Comments close September 13. Please reply with your shipping information within 24 hours of receiving our message.)

5Y Capital seeks out, supports, and inspires lonely entrepreneurs, providing everything from spiritual backing to full operational support. We believe that if the "crazy" you — in others' eyes — starts to be believed in, the world will become a different place.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG

WWW.5YCAP.COM