Talking Ethics, Morality, and Lies with AI | FreeS Fund
"Honesty" is the most precious gift that new technology has given us.


About AI: What Should You Know That You've Never Thought to Ask?
For years now, we've debated whether artificial intelligence will save or destroy the world: Will self-driving cars protect our lives? Will robots leave us unemployed and unemployable? The confusion swirling around the AI boom has yielded few definitive answers.
In this piece, Yonatan Zunger, a prominent writer on the publishing platform Medium, draws on more than a decade at Google to clarify several practical questions about AI, touching on technology, ethics, morality, and deception:
- Are AI and machine learning the same thing?
- How many judgments does AI need to make before it can tell salmon from tuna?
- Why are "picking up a soda bottle" and "walking across a room" high-difficulty moves for AI?
- Why do we need to be honest in the face of AI?
- We seek rationalizations for our own behavior — does AI?
This is a long explainer that balances depth with genuine curiosity. We hope it sparks something for you. Here's a spoiler: The greatest challenge of artificial intelligence isn't solving technical problems — it's that AI forces us to articulate our goals with brutal clarity when we program it. And sometimes, we don't want to be honest with ourselves.


Four Questions About AI You Should Know But Probably Never Thought to Ask
By Yonatan Zunger
Source: Medium
Translated and compiled by Leiphone (leiphone-sz), Yanran, and FreeS Fund

374 Questions Later,
AI Finally Told Salmon from Tuna
The term "artificial intelligence" is an intimidating one. In practice it tends to mean "anything computers can't do yet": playing chess, simulating conversation, recognizing images. Each time AI achieves a breakthrough, the boundary shifts. The term also sits uncomfortably close to how humans define themselves, to the traits we believe separate us from other species. So people sometimes use "machine learning" as a substitute.
So what exactly is artificial intelligence, or machine learning?
Strictly speaking, machine learning is a branch of "predictive statistics": building a system that takes information about what happened in the past and uses it to construct models that predict what might happen in the future under related conditions. It can be as simple as "when you turn the wheel left, the car goes left," or as complex as trying to understand a person's entire life and tastes.
This diagram shows how an AI system operates:

▲ How an AI system works.
The system works by forming perceptions of features from sensors that perceive the world, then building a model that tells us how the world works and what consequences our actions will have.
In some AI systems, "features" are simply raw perceptions — like the colors a camera sees. The system has no preconceptions about which features matter and which don't, but this makes the AI model harder to build. Computing systems capable of processing information at this scale have only emerged in the last decade. In other AI systems, "features" are whatever the model designer believes will be useful.
Next, the "model" typically presents us with many possible outcomes and its understanding of each. If you want the AI to make a decision, you need to give it rules. For example, tell it to "pick the person most likely to succeed" or "pick the person least likely to lead to catastrophic failure."
To take a simple example, imagine the mechanical governor on an old steam engine. Its rule is: if the sensor reads pressure above a set threshold, open a valve; otherwise, close it.
This rule is simple because it references only one input and makes one decision. But if you need to rely on thousands or even millions of pieces of information to decide something more complex, you'll find that setting rules is anything but simple.
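As a minimal sketch, the governor's rule fits in a single function. The threshold value and the pressure readings below are invented for illustration and do not describe any real engine.

```python
def governor(pressure: float, threshold: float = 100.0) -> str:
    """One input, one decision: open the valve only above the threshold.

    The default threshold of 100.0 is a hypothetical value in arbitrary units.
    """
    return "open" if pressure > threshold else "closed"


# Hypothetical sensor readings.
readings = [80.0, 99.9, 100.0, 120.5]
states = [governor(p) for p in readings]
print(states)
```

Everything the rule needs is visible in one line of logic, which is exactly what stops being true once a decision depends on thousands of inputs.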
How do you control a car (which depends on vision, hearing, and more)? Or determine which webpage offers the best answer about koala husbandry (which depends on whether you're casually curious or a professional vertebrate enthusiast, and whether the site was built by passionate hobbyists or just wants to sell you cheap koala aphrodisiacs)? These require millions or even tens of millions of pieces of information to decide.
AI models are designed specifically to handle complex information. Inside any AI model, a series of rules synthesizes all the features, with hundreds or thousands of individual "knobs" guiding the model's decisions and telling it how to weigh the importance of each feature in different situations.
For instance, there's an AI model called a "decision tree" that looks like a giant tree of Yes/No questions. If its task is to distinguish salmon from tuna, its first question might be, "Is the left half of the image darker than the right half?" And its last question would be something like, "Based on the answers to the previous 374 questions, is the average color in this square orange or red?"

▲ Decision tree diagram.
The "knobs" determine the order in which the model asks questions, and where the boundary falls between "yes" and "no" for each question. You can't simply stumble upon the right combination of questions that reliably distinguishes salmon from tuna — there are too many possibilities. So initially, the AI model runs in "training mode," adjusting its "knobs" through example after example, self-correcting after each mistake. The more examples it sees, the better it gets at finding the signal amid the noise.
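To make "training mode" concrete, here is a rough sketch of the smallest possible tree: one yes/no question with a single knob, the threshold. The "redness" feature and all sample values are invented. The training step is a brute-force search over candidate thresholds, a simplification chosen for brevity and not necessarily how a production system adjusts its knobs.

```python
# Toy training data: (average redness of the fish in [0, 1], label).
# All numbers are invented for illustration.
examples = [
    (0.82, "tuna"), (0.75, "tuna"), (0.91, "tuna"), (0.68, "tuna"),
    (0.41, "salmon"), (0.55, "salmon"), (0.37, "salmon"), (0.60, "salmon"),
]


def predict(redness: float, threshold: float) -> str:
    """One yes/no question: is the image redder than the threshold?"""
    return "tuna" if redness > threshold else "salmon"


def count_errors(threshold: float) -> int:
    """Number of training examples this threshold gets wrong."""
    return sum(predict(x, threshold) != label for x, label in examples)


def train() -> float:
    """Try a threshold midway between each pair of neighboring values
    and keep the one with the fewest training errors."""
    values = sorted(x for x, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=count_errors)


threshold = train()
print(f"learned threshold: {threshold:.2f}, errors: {count_errors(threshold)}")
```

A real tree repeats this search for every question it asks, which is why the number of possible combinations grows too large to find by hand.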
Compared to humans, AI's advantage isn't decision speed. AI typically takes a few milliseconds to make a decision; humans take roughly the same. Its real advantage is that it never gets bored or distracted. It can make millions or billions of decisions in succession on different fragments of data. This means it can be applied to problems humans are bad at solving — like driving cars.
Humans are terrible at driving. In 2015 alone, 35,000 people died in car crashes in the United States. The vast majority were caused by driver distraction or error. Driving demands enormous focus and rapid reaction times, sustained for hours on end. It turns out we frequently fail at this.
When people talk about using AI in a project, they usually mean breaking the project down into the flowchart above, then building the right AI model. This process begins with gathering the examples needed to train the model (usually the hardest part), then choosing the model's basic shape (the "neural network," "decision tree," or other fundamental models suited to different problems) and training it; after that comes the most important part — figuring out what's going wrong and adjusting for it.
To illustrate, look at the six images below and find the key difference between the first three and the last three:

▲ The features AI focuses on may not be what you actually care about.
If you guessed "the first three have carpets," you're right! If you guessed "the first three are photos of gray cats, the last three are photos of white cats," you're also right. But if you used these images to train your gray cat detector, that AI model would perform terribly in the real world, because what it actually learned was: "Gray cats are cat-shaped things on top of carpets."
When your model learns features from the training data that aren't what you actually care about, you get "overfitting." Most of the time, people building machine learning systems are worrying about this problem.
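The carpet problem can be reproduced with a toy learner that simply picks whichever single feature best matches the labels in its training set. The feature names and every data point here are invented, and the "learner" is deliberately naive.

```python
# Each example: (features, is_gray_cat). All data is invented.
training_set = [
    ({"on_carpet": 1, "fur_looks_gray": 1}, 1),
    ({"on_carpet": 1, "fur_looks_gray": 1}, 1),
    ({"on_carpet": 1, "fur_looks_gray": 0}, 1),  # gray cat in odd lighting
    ({"on_carpet": 0, "fur_looks_gray": 0}, 0),
    ({"on_carpet": 0, "fur_looks_gray": 0}, 0),
    ({"on_carpet": 0, "fur_looks_gray": 0}, 0),
]

# In the real world, carpets no longer line up with fur color.
real_world = [
    ({"on_carpet": 0, "fur_looks_gray": 1}, 1),
    ({"on_carpet": 0, "fur_looks_gray": 1}, 1),
    ({"on_carpet": 1, "fur_looks_gray": 0}, 0),
    ({"on_carpet": 1, "fur_looks_gray": 0}, 0),
]


def accuracy(feature, data):
    """Fraction of examples where the feature value equals the label."""
    return sum(x[feature] == y for x, y in data) / len(data)


def fit(data):
    """Pick the single feature that best predicts the label on this data."""
    features = data[0][0].keys()
    return max(features, key=lambda f: accuracy(f, data))


chosen = fit(training_set)
print("feature the model relies on:", chosen)
print("training accuracy:", accuracy(chosen, training_set))
print("real-world accuracy:", accuracy(chosen, real_world))
```

The learner is not malfunctioning: in the training photos, "is on a carpet" really was the most reliable signal available.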

What Makes a Novel "Good"?
AI Can't Answer That Either
Next, let's talk about whether AI is actually useful or useless.
If your goal is clear and the means to achieve it are also clear, AI isn't needed. For example, if the goal is "tighten all lug nuts on a wheel to 100 foot-pounds," you just need a wrench that can tighten and measure torque, stopping when it hits 100 foot-pounds. If someone tried to sell you an AI wrench, you'd ask: why do I need this? This sets a lower bound on where AI is worth using.
Another example: achieving human-level "motion planning" is extremely difficult for machines. Our brains devote twice as much focused attention to this task as to almost anything else.
Pick up an object near you right now — say, an empty soda can — and observe how your arm works.
Here's what I see: the arm rotates quickly at the elbow, moving the hand horizontally from the keyboard to a position a few inches vertically from the can, then stops abruptly. Next, it moves forward at a much slower — though still quite fast — speed, opening the palm slightly wider than the can's diameter. Once the thumb appears opposite the other fingers, the palm closes and stops immediately upon meeting resistance. Then the arm begins to lift, tensing from the shoulder (keeping the elbow fixed) so the hand grips the can firmly without crushing it.
Other tasks in this same category include facial recognition — most of the brain's visual function isn't general-purpose vision but specialized face recognition. These things feel easy because a large chunk of our brain is dedicated to recognizing faces. Without it, we'd see people the way we see armadillos. That's essentially where computers are now.

▲ "Motion planning" is hard for AI.
So what problems can AI help you solve?
My answer: Problems where the goal is clear but the means to achieve it are not.
Specifically, several conditions must be met:
- The number of external stimuli is limited enough for the model to learn them;
- The number of elements that must be controlled is limited — we don't need to consider too many;
- The number of decisions to be made is large, making it hard to simply write down rules;
- It's easy to connect an action to an observable result, so we can readily figure out what works and what doesn't.
Take a racing game, for instance. At first, the consequences of your actions are obvious: when you should turn, you turn; if you hit the wall, the game ends. But as you get better, you start realizing, "Crap, I missed an important upgrade — I'm going to be screwed in five minutes." You begin to foresee consequences further out. AI can accelerate this learning process.
We've covered situations where both goals and means are clear, and where the goal is clear but the means are not. There's a third category where AI can't help at all: when the goal itself isn't well understood.
Computers aren't good at self-deception. The first rule of programming them is: if you want them to do something, you have to explain what you want. But in reality, we often don't even know the true definition of a "good goal." In such cases, how do you know if you've succeeded?
Putting it all together, the difficulty of achieving goals for AI, from easiest to hardest, is:
- Predictable environment, directly specified goal. For example, on an assembly line where cars arrive one after another, an AI sensor's goal is to identify a wheel.
- Unpredictable environment, directly specified goal. For example, self-driving cars: the goal can be directly described as "travel safely from point A to point B at a reasonable speed," but the process may contain many surprises. AI has only recently advanced enough to tackle these problems.
- Predictable environment, more indirect goal, very distant relationship between actions and outcomes. For example, planning your financial portfolio. This is a trickier problem; we haven't made major progress yet, but I hope we can get these right within the next decade.
- Unclear goal. AI cannot solve these. Writing a novel is one example, because there's no clear answer to what makes a book a "good novel."

AI Ethics and the Real World:
Don't Lie to Me
Now let's get to the heart of the matter: what are the things where AI's success or failure has major consequences?
Here are six examples to think about. Their main value isn't in providing the right answers, but in asking the right questions.
▍ Passengers and Pedestrians
A self-driving car is crossing a narrow bridge. Suddenly a child runs out in front. It's too late to stop. The car can only proceed, hitting the child, or swerve, sending itself and its passengers into the rushing river below. What should it do?
This scenario has been publicly discussed, and it illustrates the questions we really need to ask.

▲ Self-driving can avoid dangers caused by distracted or slow-reacting drivers.
Of course, we should acknowledge a loophole in this problem — it has a very low probability of occurring in practice, because self-driving cars would avoid this situation from the start. Most of the time, such situations happen because a human driver's reactions aren't fast enough to handle a child jumping out from behind an obstacle, or because the driver is distracted for some reason and notices the child too late. But these problems barely exist for self-driving systems.
Yet "almost never" isn't the same as "absolutely never." We have to admit this could happen. When it does, what should the car do?
With human driving, we might say, "it depends on the circumstances." But now, there's a blank left in the self-driving program that demands we fill in the answer before any accident happens, and then it will do exactly what we told it. This requires us to be brutally honest about what decisions we want.
▍ Politely Making Things Up
AI models have a very annoying habit: they analyze whatever data you show them, then tell you what they learned.
In 2016, high schooler Kabir Alli tried Googling "three white teenagers" and "three black teenagers." The results were terrible. "Three white teenagers" showed charming, athletic teenage figures; "three black teenagers" displayed mugshots from news reports of Black teenagers being arrested. (Now, most search results are news coverage of this incident itself.)
▲ Kabir Alli's search results.
This wasn't because of bias in Google's algorithm, but because the underlying data itself was biased. This particular bias stemmed from a combination of "invisible whiteness" and media coverage. If three white teenagers were arrested for a crime, media outlets were unlikely to show their photos or specifically note they were "white teenagers." But if three Black teenagers were arrested, you could find the exact phrasing from the news reports mentioned above.
Many people were shocked by these results because they seemed inconsistent with the national ideal of being "race-blind." But the data clearly showed: when media outlets attached the phrase "three black teenagers" to high-quality images, they were almost always presenting these kids as criminals, whereas images labeled "three white teenagers" were almost always advertising photography.
Even if you manually tell the model to ignore race by leaving it out of the input features, that characteristic still enters through the back door. For example, someone's zip code and income can predict their race with high accuracy, and an AI model will quickly latch onto such proxies as "the best rule."
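A small synthetic example shows how a removed attribute can be recovered from a proxy. The zip codes, group labels, and counts are all invented, and the "model" is only a majority vote per zip code.

```python
from collections import Counter, defaultdict

# Synthetic records: (zip_code, group). The group column is never given to the
# downstream model, but zip code is. All values are invented.
records = (
    [("10001", "A")] * 45 + [("10001", "B")] * 5
    + [("20002", "A")] * 8 + [("20002", "B")] * 42
)

# Count how often each group appears in each zip code.
by_zip = defaultdict(Counter)
for zip_code, group in records:
    by_zip[zip_code][group] += 1

# Guess each person's group from zip code alone (majority group per zip).
guess = {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}
correct = sum(guess[z] == g for z, g in records)
proxy_accuracy = correct / len(records)
print(f"group recovered from zip code alone: {proxy_accuracy:.0%}")
```

Dropping the sensitive column removes the label, not the information, whenever other features are this strongly correlated with it.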
AI models hold up a mirror to us; they don't understand when we don't want to be honest. They will only politely make things up if we tell them in advance how to lie.
One example is a recent technical paper on "debiasing" text. An AI model called word2vec learned various relationships between English word meanings (such as "king is to male" as "queen is to female"), and researchers found the model contained numerous examples of social bias. For instance, "computer programmer is to man as homemaker is to woman."
The authors then proposed a technique for removing gender bias from the model. The overall procedure was quite reasonable: first analyze words to find groups of words that split along gender axes; next, have a group of people determine which correspondences make sense (such as boy to man/girl to woman) and which represent social bias (such as programmer to man/homemaker to woman); finally, use mathematical techniques to remove biased word groups from the model, leaving an improved version.
But this process wasn't fully automated. The crucial step of determining which male/female distinctions should be removed was a human decision.
The original model came from analyzing millions of written texts from around the world, accurately capturing people's biases. The cleaned model accurately reflected the evaluators' preferences about which biases should be removed. It would be wrong to say the modified model more accurately reflects what the world is like.
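The "mathematical techniques" step can be sketched as a projection: subtract from each flagged word vector its component along a gender direction. This is a simplified illustration using invented three-dimensional vectors, not real word2vec embeddings, and it is not claimed to reproduce the paper's full procedure.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def subtract(u, v):
    return [a - b for a, b in zip(u, v)]


def scale(u, k):
    return [a * k for a in u]


def normalize(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))


# Toy 3-d "embeddings"; all numbers are invented for illustration.
vectors = {
    "man": [1.0, 0.2, 0.1],
    "woman": [-1.0, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.3],
    "homemaker": [-0.5, 0.3, 0.8],
}

# A single gender axis estimated from one word pair (an assumption here).
gender_direction = normalize(subtract(vectors["man"], vectors["woman"]))


def debias(v):
    """Remove the component of v that lies along the gender direction."""
    return subtract(v, scale(gender_direction, dot(v, gender_direction)))


# Humans choose which words to neutralize; the math only applies the choice.
to_neutralize = ["programmer", "homemaker"]
cleaned = dict(vectors)
for word in to_neutralize:
    cleaned[word] = debias(vectors[word])

for word in to_neutralize:
    before = dot(vectors[word], gender_direction)
    after = dot(cleaned[word], gender_direction)
    print(f"{word}: gender component {before:+.2f} -> {after:+.2f}")
```

Note that `to_neutralize` is written by hand: the projection is mechanical, but the list of words it is applied to is a human judgment.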
▍ The Gorilla Incident
In July 2015, when I was serving as Google's technical lead for social efforts (including Photos), I received an urgent message: our photo indexing system had publicly described a photo of a Black man and his friends as "gorillas." I immediately called the team. They took action, disabling the offending label along with several other potentially risky ones.
Many suspected this problem had the same cause as the issue six years prior, when HP's face-tracking webcam failed to work for Black users: the "face" training data consisted entirely of white people. We initially suspected this too, but quickly ruled it out because the training data included people of various races and skin tones.
The real cause of the problem was more subtle.
First, facial recognition is hard. Faces are far more similar than we imagine — even across species. This photo indexing system would also mistake white people's faces for dogs and seals.
The second issue is central: machines are very smart, but unless you teach them, they know nothing about the wider world. Nobody had explained to the system that, because of discrimination, Black people had long been compared to apes. That context is what made this particular error so serious, whereas mistaking a face for a dog or a seal is merely embarrassing.
Problems involving humans are usually tied to extremely subtle cultural issues that we struggle to anticipate. When we need to make value judgments across different cultural contexts, these issues almost entirely require human handling — they cannot be left to AI.

▲ AI lacks moral and cultural background information, making it difficult to judge related issues.
Even establishing the rules humans use to judge these things is extraordinarily difficult. Cultural barriers are a massive problem. A reviewer in India may not have the cultural background to understand American racial discrimination, and someone in America may not have Indian cultural context. The number of cultures around the world is enormous. How do you express these ideas in a way anyone can understand?
I spent a year and a half at Google trying to do exactly this. And the lesson I learned: the most dangerous risks in a system usually don't come from inside the system, but from unexpected problems when the system interacts with the broader external world. We don't yet have a good way to manage these.
▍ Unfortunately, AI Does Exactly What You Tell It to Do
One important use of AI is helping people make better decisions. AI is most valuable when these choices are high-stakes. Without clearly useful information, humans may easily adopt unconscious biases rather than real data. Many courts have begun using automated "risk assessments" as part of their sentencing guidelines. If you train a model on the complete historical corpus of a district court, it can tell you clearly who is a potential danger.
If you've been reading carefully so far, you can probably think of several ways this could go wrong. And it can go terrifyingly, terribly wrong, as ProPublica exposed in 2016.
Broward County, Florida's court used the COMPAS system. Its designers followed best practices, ensuring the training data didn't artificially favor any group, excluding race from the model's input features. But the AI model didn't predict what they thought it was predicting.
The COMPAS system estimated the probability that a person would be convicted, based on information known at the time of past sentencing, or compared two people to determine which was more likely to be convicted in the future. If you know anything about American politics, you can immediately guess whom the model rated as higher risk: Black people. Black people are more likely than white people to be stopped on the road, arrested, convicted, and given longer sentences. So the AI model, looking at historical data, also predicted that a Black defendant was more likely to be convicted in the future.
But the way this model was trained didn't match its real purpose. It was trained to answer "who is more likely to be convicted," but the question we wanted answered was "who is more likely to commit a crime." Nobody noticed these are two completely different questions.
Here's a point worth noting: what you want an AI model to judge, and what it can actually judge, often differ. Before trusting an AI model, you need to very carefully understand these similarities and differences.
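The gap between the two questions can be shown with simple arithmetic. In this hypothetical sketch, two groups offend at exactly the same rate, but one is policed more heavily, so more of its offenses end up as conviction records. All rates are invented and are not taken from COMPAS or from any real dataset.

```python
# Hypothetical population: identical offense rates, different chances that an
# offense leads to a conviction record. All numbers are invented.
population = {
    "A": {"offense_rate": 0.20, "conviction_given_offense": 0.30},
    "B": {"offense_rate": 0.20, "conviction_given_offense": 0.60},
}


def learned_risk(group: str) -> float:
    """What a model trained on conviction labels estimates: P(convicted)."""
    g = population[group]
    return g["offense_rate"] * g["conviction_given_offense"]


def true_risk(group: str) -> float:
    """What we actually wanted to know: P(commits an offense)."""
    return population[group]["offense_rate"]


for group in population:
    print(
        f"group {group}: learned risk {learned_risk(group):.2f}, "
        f"true risk {true_risk(group):.2f}"
    )

ratio = learned_risk("B") / learned_risk("A")
print(f"model rates group B as {ratio:.1f}x riskier despite equal offense rates")
```

The model answers the question its labels encode. If the labels record convictions, differences in enforcement are learned as if they were differences in behavior.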
▍ Humans Are Rationalizing Animals
There's a new hot topic in machine learning discussions: the right to explanation. It means that if AI is used to make any important decisions, people have the right to understand how those decisions were made.
Intuitively, this seems obvious. But when professionals mention this, their faces immediately change. They know it's actually impossible.
Why is this?
Above, I described an AI model's decision mechanism as hundreds to millions of "knobs." This metaphor doesn't do justice to actual model complexity. For example, AI-based language translation systems process one letter at a time, but the model must read vast numbers of letters before understanding text. The only "explanation" it can offer is: "Well, the next few thousand variables are in this state, then I see the letter c, which should change the probability that this word is talking about a dog..."
Debugging AI systems is one of the hardest problems in the field, because at any point, examining the individual state of variables and then explaining the model to you is about as difficult as measuring a person's neural potential and telling you what time they'll have dinner.
We tend to assume that decisions can be explained in the way people expect. For instance, people expect an AI to say: "Given their median FICO score, I set this mortgage rate at 7.25%." Or "If the Experian FICO score were 35 points higher, the rate would drop to 7.15%." Or "I recommend hiring this person because they clearly explained machine learning in the interview."
But every cognitive or behavioral psychologist knows a dark secret: all these explanations are nonsense. Whether we like someone is decided in the first few seconds of conversation, and can be influenced by seemingly random things like whether they were holding a hot or cold drink before shaking hands.

▲ Humans always rationalize their behavior, but AI isn't good at this.
It turns out what people are good at isn't explaining how they made decisions, but finding reasonable explanations for their decisions. Sometimes this is entirely unconscious. For example, we highlight certain facts in our decision process ("I like this car's color") and focus on them, ignoring factors that may be more important but invisible to us ("My stepfather had a convertible, and I hate my stepfather"). The same pattern shows up in hiring: "The first candidate sounds just like I did when I graduated"; "That woman was nice, but she looks too different, she wouldn't fit working with me."
If we expect AI systems to provide actual explanations for decisions, we're in for a lot of trouble. Currently, only models like "decision trees" can be fully understood by people, while many of the most useful models in practice, like neural networks, are completely incomprehensible.
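One reason a decision tree can be understood is that the path it takes through its questions is itself the explanation. The tiny hand-written tree below is a sketch with invented questions, features, and thresholds; a neural network has no comparable path to read off.

```python
# A tiny hand-written tree. Questions, features, and thresholds are invented.
tree = {
    "question": "redness > 0.64",
    "test": lambda x: x["redness"] > 0.64,
    "yes": {"answer": "tuna"},
    "no": {
        "question": "fat_stripes > 3",
        "test": lambda x: x["fat_stripes"] > 3,
        "yes": {"answer": "salmon"},
        "no": {"answer": "unknown"},
    },
}


def classify_with_explanation(node, x):
    """Walk the tree and record every question asked and its answer."""
    path = []
    while "answer" not in node:
        result = node["test"](x)
        path.append(f"{node['question']}? {'yes' if result else 'no'}")
        node = node["yes"] if result else node["no"]
    return node["answer"], path


answer, path = classify_with_explanation(
    tree, {"redness": 0.5, "fat_stripes": 5}
)
print("decision:", answer)
for step in path:
    print("  because:", step)
```

With two questions the trace is readable. With hundreds, it is still technically complete, but it stops resembling the kind of explanation a person would accept.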
The human brain has extremely general intelligence for processing all kinds of concepts, and thus can solve this problem. You can tell it to be extra careful with image recognition when it involves racial history, because the same system (the brain) can understand both concepts. But AI is far from capable of this.
▍ AI Is, Ultimately, Just a Tool
AI killer drones — we can't discuss AI ethics without bringing up everyone's favorite example. These aircraft fly at high altitude, controlled only by computers, tasked with killing enemy combatants while preserving civilian lives... unless they determine the mission requires some "collateral damage," as the official euphemism goes.
People are frightened of such devices. And if you listen to the stories of people who live under their constant threat, killers that emerge from a clear sky sound more terrifying still.
Large drones differ from manned aircraft in that drone pilots can be thousands of miles away, removed from harm. Large drones can fly themselves 99% of the time, only calling for humans when major decisions need to be made.
Now we might ask: who bears the moral responsibility for killings decided entirely by robots?
This question is both simpler and more complex than we imagine. If someone hits another person with a rock, we blame the person, not the rock. If they throw a spear, even though the spear is "under its own power" for part of its flight, we would never think to blame the spear. But now, the scope of what the "tool" decides on its own has become blurred.
The simple part is that this problem isn't entirely new. A major purpose of military discipline is to establish an order in which people don't think too autonomously in combat: enlisted soldiers and non-commissioned officers exist to execute plans. So in theory, responsibility for decisions rests entirely on officers' shoulders, and a clear delineation of areas of responsibility by rank, command, and so on determines who ultimately answers for any given order. But in practice, this is often quite ambiguous.
There are many more issues we should discuss, many of them extremely urgent for society. I hope the examples above help you understand when things are right, when they're off, and where many AI ethical risks originate.
Many of the problems we face regarding AI aren't new. It's just that now, these problems are surfacing again through certain technical changes.
Because AI lacks cultural background and the ability to infer our implied meanings, it forces us to express ourselves in ways that violate our daily habits. It may demand that we make life-and-death decisions before a crisis arrives, or ask us to take a long, rigorous look at society's actual condition and clearly state which parts we want to preserve and which we want to change.
AI pushes us out of the comfort zone of "polite fictions" into a world where we must discuss things with extreme explicitness. This may not be easy, but for us, honesty may be the most precious gift that new technology can bring.
(This article was originally published on Medium; the English original is titled Asking the Right Questions About AI. The complete translated version is available from Leiphone. Cover image from the film Blade Runner 2049.)

