Nobel Laureate Geoffrey Hinton: My Fifty-Year Deep Learning Career and Research Philosophy

ZhenFund · October 10, 2024

Almost all progress in the field of neural networks stems from intuition that strikes in a flash.

Z Talk is ZhenFund's column for sharing ideas and perspectives.

The Royal Swedish Academy of Sciences announced that the 2024 Nobel Prize in Physics has been awarded to American scientist John J. Hopfield and Canadian scientist Geoffrey E. Hinton "for foundational discoveries and inventions that enable machine learning with artificial neural networks." Hinton thus became the only scientist in history to have won both the Turing Award and the Nobel Prize in Physics.

After the announcement, many scholars joked that "the Nobel Prize in Physics is stealing the Turing Award's lunch." Hinton himself said frankly in a phone interview: "I had no idea this would happen. I'm currently staying in a cheap motel in California with terrible internet and phone service. I was supposed to get an MRI scan today, but I'll have to cancel it!"

In a 2022 episode of The Robot Brains Podcast hosted by UC Berkeley professor Pieter Abbeel, Hinton candidly shared his academic career, the future of deep learning, his research philosophy, and the inside story of the DNN-research auction.

The following is his account, originally compiled by the OneFlow community and edited by ZhenFund.

Geoffrey Hinton

He never took a formal computer science course. As an undergraduate at Cambridge, he studied physiology and physics, briefly switched to philosophy, and ultimately graduated with a bachelor's degree in psychology. Burned out, he left to become a carpenter; after that detour, he returned to academia at the University of Edinburgh and earned a PhD in the then-obscure field of artificial intelligence. His weak math skills often left him in despair during his research, and even after becoming a professor he regularly turned to his own graduate students for help with neuroscience and computational science concepts he didn't understand.

His academic path looked anything but smooth, yet nearly half a century later he would be hailed as the "Godfather of Deep Learning." In 2018, Geoffrey Hinton received the Turing Award, the highest honor in computer science, for his foundational contributions to deep learning.

In October 2024, with the announcement of the Nobel Prize in Physics, Hinton became the only scientist in human history to have won both the Turing Award and the Nobel Prize in Physics.

Hinton was born into British scientific aristocracy, yet his life has been extraordinarily rich and full of twists. His father, Howard Everest Hinton, was a British entomologist; his mother, Margaret, was a teacher. Both were communists.

His uncle was the renowned economist Colin Clark, who coined the term "Gross National Product." His great-great-grandfather was the celebrated logician George Boole, whose invention of Boolean algebra laid the foundation for modern computer science.

Growing up in this weighty scientific lineage, Hinton developed independent thinking and resilience early, along with the burden of upholding the family's honor. His mother gave him two choices: "Either become an academic, or be a failure." Settling was not an option, and despite various detours at university, he completed his studies.

In 1973, he began a PhD in artificial intelligence at the University of Edinburgh under Christopher Longuet-Higgins. At the time, almost no one believed in neural networks, and his advisor urged him to abandon the approach. The skepticism around him did not shake his belief. In the years that followed, he co-developed the Boltzmann machine and helped popularize backpropagation, though he would have to wait decades more, until deep learning's explosive breakthrough, for his research to become widely known.

Hardship continued after his doctorate. He and his second wife, Rosalind Zalin (a molecular biologist), moved to the United States, where he took a faculty position at Carnegie Mellon University. Dissatisfied with the Reagan administration, and with most U.S. AI research funded by the Department of Defense, they relocated to Canada in 1987. Hinton began teaching in the University of Toronto's Department of Computer Science and later led the Canadian Institute for Advanced Research (CIFAR) program on learning in machines and brains.

In 1994, Rosalind Zalin died of ovarian cancer, leaving Hinton to raise their two young adopted children alone; his son also has attention deficit hyperactivity disorder and other learning disabilities. He later married his third wife, Jackie (an art historian), but tragedy struck again when Jackie also died of cancer.

Hinton himself has a severe lower-back condition that keeps him from sitting normally, so he spends most of his day standing. He avoids flying, since takeoff and landing require remaining seated, and this has limited his ability to travel for academic talks.

From left to right: Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton

After roughly four decades of technical perseverance and personal hardship, dawn finally broke in 2012. Together with his students Alex Krizhevsky and Ilya Sutskever, he developed AlexNet, which shook the industry, reshaped computer vision, and launched a new golden age of deep learning.

Also at the end of 2012, he and these two students founded the three-person company DNN-research, which they auctioned and sold to Google for $44 million. Hinton went from academic to Google Vice President and Engineering Fellow. In May 2023, he left Google "so that I could speak freely about the risks of AI."

Weathered by countless storms, the "Godfather of Deep Learning," now approaching 77, remains on the front lines of AI research. He does not shy away from skepticism from other scholars, and openly acknowledges judgments and predictions that failed to materialize. Regardless, he continues to believe that a decade into deep learning's rise, this technology will continue to unleash its power, while he himself searches for the next breakthrough.

01 The 1970s: A "Lone Warrior" Studying Neural Networks

Hinton at age 8

What influenced me most deeply was the education I received in childhood. My family had no religious beliefs. My father was a communist, but considering the better science education at private schools, he insisted on sending me to an expensive Christian private school at age seven. Except for me, all the children there believed in God.

As soon as I got home, my family would tell me religion was all nonsense. Perhaps because I had a strong sense of myself, I didn't believe in it either. I came to see that believing in God was wrong, and I got into the habit of questioning others. Indeed, years later, my classmates also came to realize that their original beliefs were wrong and that God might not actually exist.

It may sound ironic for me to tell you now that faith is important, but in scientific research we do need faith, so that even when others say you're wrong, you can stay on the right path.

My educational background was quite varied. In my first year at Cambridge, I was the only student studying both physics and physiology, which gave me some grounding in science and engineering for my later research career.

However, I wasn't good at math, so I had to give up physics. But I was also curious about the meaning of life, so I switched to philosophy. After achieving some results there, I began studying psychology.

My final year at Cambridge was difficult and unhappy, so as soon as I finished my exams, I left to become a carpenter. In truth, I liked carpentry more than anything else I could have done.

In high school, I would come home after classes and do woodworking; that was my happiest time. So I became a carpenter, but after about six months I found that it paid too little to live on, even though the craft demands far more than meets the eye. Renovation work was much easier and paid better, so alongside carpentry I took on renovation jobs. Unless you're a master carpenter, carpentry pays less than renovation.

Then one day I met a truly outstanding carpenter and realized I wasn't suited to the trade. A coal company had asked him to make a door for a damp, dark basement. Because of those conditions, he arranged the boards in opposing directions so that their moisture-induced expansion would counteract each other rather than warp the door, a method I had never considered. He could also cut a piece of wood perfectly square using only a handsaw. As he explained to me, if you want to cut wood square, you have to line up the saw bench and the wood with the room.

At that moment, I felt so far behind him in skill that I thought, perhaps I should go back to school and research artificial intelligence instead.

Later, I went to the University of Edinburgh to do a PhD in neural networks under the renowned Professor Christopher Longuet-Higgins. While still a student, he had worked out the structure of the boranes and nearly won the Nobel Prize for it, which is truly remarkable. To this day I don't fully understand his research, only that it was related to quantum mechanics and rested on the fact that, for some systems, the identity operation is a rotation not through 360 degrees but through 720 degrees.

He had once been very interested in the relationship between neural networks and holograms, but by the time I arrived at Edinburgh he had lost interest in neural networks. Mainly, he had read a paper by Terry Winograd (the American computer scientist) and become convinced that neural networks had no future and that one should pursue symbolic AI instead. That paper had a big impact on him.

In fact, he didn't agree with my research direction and wanted me to work on something more likely to win recognition, but he was a decent person: he let me stick to my own direction and never stopped me from studying neural networks.

In the early 1970s, everyone around me questioned why I persisted when Marvin Minsky and Seymour Papert (AI pioneers) said neural networks had no future. Honestly, I felt very lonely.

Marvin Minsky and Seymour Papert

In 1973, I gave my first talk to a group, on how to do true recursion with neural networks. In my first project, I found that if you want a neural network to draw a shape by dividing it into parts, with each part drawn by similar neural hardware, then the part of the network that stores the whole shape needs to remember the shape's overall position, orientation, and size.

If the neural network drawing the shape is interrupted and you want another neural network to carry on drawing it, you need somewhere to store the shape and the progress so far, so that the drawing can resume. The difficulty is getting a neural network to do this. Simply copying neurons obviously won't work, so I wanted to design a system that uses fast weights, which adapt in real time, to record the progress. Then, by restoring the relevant state, the task could continue.

So I built a neural network that implements true recursion by reusing, for the recursive call, the same neurons and weights used for the top-level call. But I'm not good at giving talks, so I felt that perhaps no one understood mine.

They said: you can do recursion in Lisp, so why do it in neural networks? What they didn't see was that unless neural networks can do things like recursion, there is a whole host of problems they can't solve.
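Hinton does not spell out his 1973 model here, so the following is only a loose modern illustration of the idea that fast weights can hold the state of an interrupted computation while the same neurons are reused for a recursive call. It assumes a Hebbian outer-product rule of the kind used in later fast-weight work; the network size, the orthonormal "which call am I in" keys, and the function names `save_state` and `restore_state` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64  # hypothetical number of shared "neurons"

# Two orthonormal context keys, one per recursion level (an assumption that
# keeps retrieval exact; random, non-orthogonal keys would add crosstalk).
keys = np.linalg.qr(rng.standard_normal((n, 2)))[0].T


def save_state(fast_w, key, state):
    """Hebbian outer-product update that associates `key` with `state`."""
    return fast_w + np.outer(state, key)


def restore_state(fast_w, key):
    """Read a stored state back out of the fast weights by presenting its key."""
    return fast_w @ key


fast_w = np.zeros((n, n))  # fast weights start empty

# Level 0 (top-level call): the shared neurons hold the whole shape's state.
state_level0 = rng.standard_normal(n)
fast_w = save_state(fast_w, keys[0], state_level0)

# Level 1 (recursive call on a part): the SAME neurons now hold the part's state.
state_level1 = rng.standard_normal(n)
fast_w = save_state(fast_w, keys[1], state_level1)

# The innermost call overwrites the shared neurons with unrelated activity.
neurons = rng.standard_normal(n)

# Returning from each call: restore that level's state from the fast weights.
neurons = restore_state(fast_w, keys[1])
print(np.allclose(neurons, state_level1))  # True
neurons = restore_state(fast_w, keys[0])
print(np.allclose(neurons, state_level0))  # True
```

The point of the sketch is that no neurons are copied: the interrupted state lives temporarily in the weights and is reconstructed on the same hardware when the recursive call returns.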

Not everyone opposed neural networks, though. Going further back to the 1950s, researchers like von Neumann and Turing believed in neural networks. They were both very interested in how the brain works, especially Turing, who strongly believed in training neural networks through reinforcement. This also gave me confidence in my research direction.

Unfortunately, they both died young. Had they lived a few more years, they were brilliant enough to have shaped the whole field. Britain might have made breakthroughs in this area long ago, and artificial intelligence today might look quite different.

02 From Pure Academic to Google Employee

The main reason I went to work at Google was that my son had disabilities, and I needed to earn money for him.

In 2012, I thought I could make a lot of money teaching on Coursera, so I launched a neural networks course. The early Coursera software wasn't very good, and I wasn't very good with software myself, so I often felt frustrated.

Initially, I had reached an agreement with the University of Toronto that if these courses made money, the university would share a portion with the instructors. Although they didn't specify the exact split, someone said it would be fifty-fifty, and I happily accepted.

During the recording process, I had asked the university to help me record videos, but they countered, "Do you know how expensive it is to produce videos?" Of course I knew, because I had been making videos myself. The university still provided no support. However, after I launched the course (by which point I was already committed), the provost unilaterally decided—without consulting me or anyone else—that the university would take all the money, and I would receive nothing. This completely violated our original agreement.

They told me to keep recording, saying it was part of my teaching duties, but it actually wasn't part of my teaching responsibilities at all—it was a course based on lectures I had previously given. Therefore, I never used Coursera again in my subsequent teaching work. That incident made me very angry, and I even began considering whether to pursue a different career.

Right around then, many companies suddenly approached us, offering substantial funding or backing to start a company, which showed how interested they were in our research.

Since the government had already given us a research grant, we didn't want to earn extra money and preferred to focus on our research. But having been cheated out of my earnings by the university, I started thinking about making more money, so we later auctioned off our newly founded company, DNN-research.

The auction took place during NIPS (the Neural Information Processing Systems conference) in December 2012, which was held at a casino by Lake Tahoe. In the basement, lights flashed as shirtless gamblers cheered in smoke-filled rooms: "You won 25,000, it's all yours"... Meanwhile, upstairs, a company was being auctioned off.

It was like being in a movie, just like what you'd see on social media, and truly amazing. We auctioned the company because we had absolutely no idea what we were worth, so I consulted an intellectual property lawyer. He said there were two options: hire a professional negotiator to deal with the big companies, which might get unpleasant, or hold an auction.

As far as I know, no small company like ours had ever been auctioned this way. I chose to run the auction through Gmail, because I had worked at Google that summer and knew they wouldn't casually read users' email. I still believe that today. Microsoft, however, was unhappy with this decision.

The auction process was as follows: participating companies had to send their bids to us via Gmail, and we would then forward them to other participants along with the Gmail timestamp. The starting price was $500,000. When someone bid $1 million, we were thrilled to see the bidding continue to rise, and simultaneously realized our value was far higher than we had imagined. When the bidding reached a certain level (which we considered astronomical at the time), we became more inclined to work at Google, so we stopped the auction.
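The email auction described above works like an open ascending (English) auction run by hand. A minimal sketch, assuming only the rules stated here: a $500,000 starting price, each accepted bid forwarded with its timestamp to the other participants, and the sellers free to stop at any point. The class names, the bidders "A", "B", and "C", the clock origin, and every amount other than $500,000 and $1 million are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Bid:
    bidder: str
    amount: int          # US dollars
    timestamp: datetime  # stands in for the email's timestamp


@dataclass
class EmailAuction:
    start_price: int
    participants: list[str]
    history: list[Bid] = field(default_factory=list)
    # Each forwarded message is (recipient, bid), so everyone sees the high bid.
    outbox: list[tuple[str, Bid]] = field(default_factory=list)

    def minimum_next_bid(self) -> int:
        # The first bid must meet the starting price; later bids must beat the high bid.
        return self.start_price if not self.history else self.history[-1].amount + 1

    def submit(self, bid: Bid) -> bool:
        if bid.amount < self.minimum_next_bid():
            return False  # rejected: not a new high bid
        self.history.append(bid)
        # Forward the accepted bid, with its timestamp, to every other participant.
        for p in self.participants:
            if p != bid.bidder:
                self.outbox.append((p, bid))
        return True

    def close(self) -> Bid | None:
        # The sellers may stop at any time; the current high bid stands.
        return self.history[-1] if self.history else None


t = datetime(2000, 1, 1)  # arbitrary clock origin for the example
auction = EmailAuction(start_price=500_000, participants=["A", "B", "C"])
auction.submit(Bid("A", 500_000, t))
auction.submit(Bid("B", 1_000_000, t + timedelta(minutes=30)))
print(auction.submit(Bid("C", 900_000, t + timedelta(minutes=40))))  # False
auction.submit(Bid("C", 1_500_000, t + timedelta(hours=1)))
winner = auction.close()
print(winner.bidder, winner.amount)  # C 1500000
print(len(auction.outbox))  # 6
```

The sketch only tracks bids; it does not model the sellers' own preferences, such as where they would rather work, which in Hinton's account is what ended the real auction.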

Compared to other companies, people generally prefer working at Google, and so do I. My main reason for liking the company is that the Google Brain team is excellent. I'm more focused on researching how to build large-scale learning systems and how the brain works. Google Brain not only has the abundant resources needed to research large systems, but also offers opportunities to exchange ideas with many outstanding talents.

I'm the straightforward type, and Jeff Dean is a smart person—pleasant to work with. He wanted me to do basic research, to try proposing new algorithms, which is exactly what I love to do. I'm not good at managing large teams; by comparison, I'd rather improve speech recognition accuracy by one percentage point. Bringing a new transformation to this field is what I've always wanted to do.

03 The Next Big Thing in Deep Learning

The success of deep learning rests on running stochastic gradient descent on large networks with massive amounts of data and computing power. On that foundation, other ideas, such as dropout and much of today's research, have been able to take root and flourish, but none of it would be possible without powerful computing, massive data, and stochastic gradient descent.
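For readers who have not seen it spelled out, here is a minimal sketch of minibatch stochastic gradient descent, the workhorse named above, on a synthetic linear-regression problem. The data, learning rate, batch size, and epoch count are arbitrary assumptions; a real deep network would replace the linear model with many layers whose gradients come from backpropagation, and dropout is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption for illustration): y = X @ w_true + noise.
n_samples, n_features = 1_000, 5
X = rng.standard_normal((n_samples, n_features))
w_true = rng.standard_normal(n_features)
y = X @ w_true + 0.01 * rng.standard_normal(n_samples)


def mse(w):
    """Mean squared error of weights w on the full dataset."""
    return np.mean((X @ w - y) ** 2)


w = np.zeros(n_features)
learning_rate, batch_size, epochs = 0.05, 32, 20

for epoch in range(epochs):
    order = rng.permutation(n_samples)  # reshuffle the data every epoch
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the minibatch MSE with respect to w: a noisy estimate of
        # the full-data gradient, which is what makes the method "stochastic".
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= learning_rate * grad

print(f"MSE at start: {mse(np.zeros(n_features)):.4f}, after SGD: {mse(w):.6f}")
print(np.allclose(w, w_true, atol=1e-2))  # True
```

Each update looks at only 32 examples, so it is cheap; scaling this same loop to billions of parameters and examples is, in Hinton's telling, what made the rest of deep learning possible.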

People often say deep learning has hit a wall, but in fact it keeps advancing. I'd like the skeptics to write down what deep learning can't do today; in five years, we will show that it can do those things.

Of course, these tasks must be strictly defined. For example, Hector Levesque (a professor in the Department of Computer Science at the University of Toronto) is a classic symbolic AI researcher, and an excellent one. Hector set out a clear test: Winograd schemas. One example is the pair "The trophy didn't fit into the suitcase because it was too small" and "The trophy didn't fit into the suitcase because it was too large."

To translate these two sentences into French, you have to understand that in the first, "it" refers to the suitcase, while in the second, "it" refers to the trophy, because the two nouns have different genders in French. Early neural machine translation got this right only at chance level: when translating these sentences into French, it couldn't reliably pick the correct gender. But this has kept improving. At least Hector gave a very clear definition of what neural networks should be able to do. Today's systems aren't perfect, but they do much better than chance. I hope skeptics will pose more challenges like this.

I believe that this highly successful paradigm of deep learning will continue to flourish: adjusting large numbers of real-valued parameters based on gradients of some objective function. But we likely won't use backpropagation to obtain gradients, and the objective functions may be more local and distributed.

My personal guess is that the next big thing in AI will be learning algorithms for spiking neural networks. A spiking neuron makes both a discrete decision (whether to spike) and a continuous one (when to spike), which allows interesting computations based on spike timing, something that is actually quite difficult to do in non-spiking neural networks. Not having researched learning algorithms for spiking neural networks in depth earlier is one of the great regrets of my research career.
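The two decisions mentioned above can be seen in the leaky integrate-and-fire neuron, a standard textbook spiking model. It is used here purely as an illustration; it is not a learning algorithm, and the interview does not name a specific model. The time constant, threshold, step size, and input values are assumptions.

```python
import math


def first_spike_time(current, tau=20.0, threshold=1.0, dt=0.01, t_max=100.0):
    """Euler-simulate a leaky integrate-and-fire neuron with constant input.

    Membrane equation: dV/dt = (-V + current) / tau, with V starting at 0
    (membrane resistance folded into `current`; arbitrary units, time in ms).
    Returns the time of the first spike, or None if V never reaches threshold.
    """
    v, t = 0.0, 0.0
    while t < t_max:
        v += dt * (-v + current) / tau
        t += dt
        if v >= threshold:
            return t
    return None


def analytic_spike_time(current, tau=20.0, threshold=1.0):
    """Closed form for the same model: V(t) = current * (1 - exp(-t / tau))."""
    if current <= threshold:
        return None  # V only approaches `current`, so it never crosses threshold
    return tau * math.log(current / (current - threshold))


for current in (0.8, 1.2, 2.0, 4.0):
    print(current, first_spike_time(current), analytic_spike_time(current))
```

Weak input never reaches threshold (the discrete "whether"), while stronger input spikes sooner (the continuous "when"), so information can be carried in spike timing. The all-or-nothing threshold is also discontinuous in the input, which is one reason ordinary gradient methods do not carry over directly to spiking networks.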

I'm not planning to research AGI, and I try to avoid defining what AGI is, because the AGI vision hides all sorts of problems, and general artificial intelligence cannot be achieved merely by increasing the number of parameters or neural connections.

AGI envisions an intelligent robot similar to humans, as smart as humans. I don't think intelligence will necessarily develop this way; instead, I hope it develops more symbiotically. I think perhaps we will design intelligent computers, but they won't have consciousness like humans. If their purpose is to kill other people, then they might need to have consciousness, but hopefully we won't develop in that direction.

04 Trust Your Intuition, Follow Your Curiosity

Everyone thinks differently, and we don't necessarily understand our own thought processes. I like to act on intuition, and I tend to use analogies in my research. I believe the basic way humans reason is by making analogies using the right features in large vectors, and this is how I do research myself.

I often run repeated experiments on the computer to see what works and what doesn't. Understanding the underlying mathematical logic of things and doing basic research is indeed important, and doing some proofs is also necessary, but these aren't things I want to do.

Here's a small test: suppose there are two talks at a NIPS conference. One presents a clever, elegant new proof of a known result. The other presents a powerful new learning algorithm, but nobody yet knows why it works.

If you had to choose one of these two talks, which would you attend? The first might be more readily accepted; people seem more curious about new ways to prove known things. But I would attend the second, because in the field of neural networks, almost all progress comes from intuitions that flash into people's minds, not from mathematical derivation or conventional reasoning.

So should you trust your intuition? I have a simple rule: either you have good intuition or you don't. If you don't, it doesn't matter what you do; but if you do, you should trust it and do what you think is right.

Of course, sharp intuition comes from your understanding of the world and tremendous hard work. When you accumulate extensive experience in one thing, intuition develops.

I have mild bipolar disorder, so I generally oscillate between two states: when I'm not very self-critical, I'm very creative, and when I'm extremely self-critical, I become mildly depressed. But I think this is more productive than staying in a single state. In the excited phase, you simply ignore the obvious problems and believe something interesting and exciting is waiting for you to discover, and you keep moving forward. When you feel overwhelmed by problems, you must persist, sort out your thoughts, and carefully consider whether your ideas are good or bad.

Because of this emotional alternation, I often tell people that I've figured out how the brain works, only to be disappointed some time later when I realize my previous conclusion was wrong. But this is how things should develop, just as in those two lines by William Blake: "Joy and woe are woven fine, a clothing for the soul divine."

I think the essence of research work is the same—if you don't feel excited by success and depressed by failure, you're not a true researcher in the meaningful sense.

Throughout my research career, although I sometimes felt completely unable to grasp certain algorithms, I never truly felt lost or hopeless. In my view, whatever the final outcome, there's always something worth doing. Excellent researchers always have many things they want to do; their only complaint is not having enough time.

When teaching at the University of Toronto, I found that the computer science undergraduates were excellent, and many cognitive science undergraduates minoring in computer science also did quite well. The latter weren't especially strong technically, but they still did research very well. They loved computer science, were intensely curious about how human cognition works, and never ran out of interest.

Scientists like Blake Richards (assistant professor at the Montreal Neurological Institute) know exactly what problems they want to solve, and then just head in that direction. Nowadays, many scientists don't know what they actually want to do.

Looking back, I think young people should find directions they're interested in, rather than simply learning techniques. Driven by your own interests, you will actively acquire the necessary knowledge to find the answers you want—this is more important than blindly learning techniques.

Thinking about it now, I should have studied more mathematics when I was young; it would have made linear algebra much easier.

Mathematics often made me despair and made certain papers hard to read. Understanding all the notation in particular was an enormous challenge, so I didn't read many papers. For neuroscience questions, I would generally consult Terry Sejnowski (professor of computational neuroscience). For computer science questions, I would ask graduate students to explain things to me. When I needed mathematics to check whether some research idea was feasible, I could always find a way.

The idea of making the world a better place through research is nice, but what I enjoy more is exploring the limits of human creativity. I really want to understand how the brain works, and I believe we need some new ideas, such as understanding how the brain operates through learning algorithms for spiking neural networks.

I think the best research work should be done by a large group of graduate students, provided with abundant resources. Scientific research needs youthful energy, continuous motivation, and intense interest in research.

You must be driven by curiosity to do the best basic research. Only then will you have the motivation to ignore the obvious obstacles and envision what results you might achieve. For ordinary research, creativity isn't the most important thing.

If you can figure out what a large group of smart people are researching, and then do something different, that's always a good idea. If you've already made some progress in a certain area, you don't need other new ideas—just dig deeper into existing research and you can succeed. But if you want to research some new ideas, such as building large-scale hardware, that's also excellent, though the path ahead may have some twists and turns.
