Project Sid: A Multi-Agent AI Civilization Experiment

Linear Capital · November 5, 2024

The experiment's results, its findings, and why it matters.

Ever since Stanford Town, LLM-driven social simulations have captured the imagination. One step beyond social simulation lies civilization simulation. A few days ago, ALTERA published a technical report detailing the results of its multi-agent civilization experiment in Minecraft.

Who Is ALTERA?

ALTERA is an AI startup born out of MIT. On its website, it defines itself this way: "We are a multi-agent research company dedicated to building digital humans: machines with the fundamental traits of humankind." ALTERA chose gaming as the first application domain for digital humans and had previously released demo videos of its agents in Minecraft. It is hard to find another platform like Minecraft: it is open-ended and complex, it supports social interaction and collaboration, and yet it is far simpler than real society. For this reason, many agent experiments have been run on it.

The ALTERA team teased the experiment, called Project Sid, two months ago. The project uses large-scale agent collaboration to explore how agent-built civilizations behave across a range of scenarios and dimensions.

What Makes Agent Civilization Experiments So Difficult?

From the underlying models to the system architecture, today's large-model technology still faces many challenges in agent civilization experiments:

Inherent model limitations: Models still struggle with long-horizon planning. Whether in reality or in Minecraft, tasks often require many steps to complete. Accurately planning these steps, discovering errors during execution, and recovering from them remain extremely challenging for today's models.
Model hallucination: In these experiments, both an agent's own decision-making and its interactions with other agents run through language exchanges with a large model. Hallucinations therefore add obstacles to goal completion: an agent that receives an incorrect answer goes on to make a wrong decision.
Concurrency stability: In simulated environments, every agent's actions affect the environment and society, and each agent needs time from input to decision to execution. When large numbers of agents run in parallel, ensuring that every agent can act quickly while also adjusting to environmental changes poses significant engineering challenges.
Error accumulation: Because the experiment proceeds with zero human intervention, every error in an agent's action and every error in agent-to-agent interaction affects subsequent outcomes. Without timely review and detection, errors in the experiment compound continuously.
Lack of evaluation metrics: Today's agent evaluation metrics tend to focus on performance in specific domain tasks like programming and reasoning. How to measure agent civilization development in open domains — especially where there are no "correct answers" — remains an open question.
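The long-horizon planning and error accumulation points above can be made concrete with a little arithmetic. The sketch below is only an illustration, not something taken from the technical report. It assumes that every step of a task succeeds independently with the same probability and that a failed step is never recovered. The 99% figure and the function name are hypothetical.

```python
def chain_success_probability(step_success: float, steps: int) -> float:
    """Probability that all `steps` steps succeed.

    Assumption (for illustration only): steps fail independently, each
    succeeds with the same probability, and no failure is ever recovered.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


if __name__ == "__main__":
    # Hypothetical per-step success rate of 99%.
    for n in (1, 10, 50, 100):
        print(n, round(chain_success_probability(0.99, n), 3))
    # prints:
    # 1 0.99
    # 10 0.904
    # 50 0.605
    # 100 0.366
```

Under these simplified assumptions, an agent that gets 99 out of 100 individual steps right still completes a 100-step task only about a third of the time, which is why detecting and recovering from errors matters so much.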

Project Sid's Results and Findings

To address these problems, the ALTERA team proposed a new agent framework called PIANO (Parallel Information Aggregation via Neural Orchestration). I won't go into detail here; interested readers can find a thorough explanation in the technical report.

More interesting than the model framework are some findings from this series of civilization experiments. Each finding below comes from a separate experiment.

Division of Labor

In this experiment, different agents gradually developed distinct occupational roles within their community through social interaction. All agents started with identical settings, but by observing each other's social motivations, they gradually generated their own goals, achieving specialization. Agricultural agents focused on growing crops; artistic agents collected flowers as materials for creative works. What's notable is that this division of labor emerged entirely from social cognition — in a control experiment where the agents' social cognition module was restricted, they no longer showed any trend toward specialization and merely repeated the same actions at random.
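One hypothetical way to put a number on "specialization" is the entropy of each agent's action log: an agent that does one thing has entropy zero, and an agent that spreads its time evenly over many activities has high entropy. The sketch below is an assumption made for illustration and is not the metric used in the technical report. The action labels and agent logs are invented.

```python
from collections import Counter
from math import log2


def action_entropy(actions: list[str]) -> float:
    """Shannon entropy (in bits) of an agent's action log.

    Lower values mean the agent concentrates on fewer kinds of action.
    """
    total = len(actions)
    if total == 0:
        return 0.0
    counts = Counter(actions)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy + 0.0  # normalizes a possible -0.0 to 0.0


if __name__ == "__main__":
    # Invented logs: a specialized farmer and an unspecialized agent.
    farmer = ["farm"] * 8
    generalist = ["farm", "mine", "craft", "gather"] * 2
    print(action_entropy(farmer))      # 0.0
    print(action_entropy(generalist))  # 2.0
```

Comparing this value between the full run and the control run would be one simple way to check whether restricting social cognition really removes the trend toward specialization.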

Collective Rules

Agents can follow and adapt to collective rules, such as tax systems. In this experiment, agents initially paid taxes according to established rules. But as designated opinion leaders pushed for change through social influence, the agents adjusted tax rates through democratic voting — and subsequently modified their own tax payments accordingly. This experiment shows that agents can not only adapt to collective rules but also be influenced by and actively participate in modifying them. To some extent, this suggests agents can function in structured, rule-bound societies.
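The loop described above (pay under the current rule, vote on a change, then pay under the new rule) can be sketched in a few lines. This is a toy model under stated assumptions, not ALTERA's implementation: the simple-majority rule, the `Agent` fields, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    income: float
    supports_change: bool  # hypothetical stance after hearing opinion leaders


def vote_on_rate(agents: list[Agent], current_rate: float, proposed_rate: float) -> float:
    """Return the rate in force after a simple-majority vote (an assumption)."""
    yes_votes = sum(1 for agent in agents if agent.supports_change)
    return proposed_rate if yes_votes * 2 > len(agents) else current_rate


def taxes_due(agents: list[Agent], rate: float) -> dict[str, float]:
    """Each agent's payment under the rate currently in force."""
    return {agent.name: agent.income * rate for agent in agents}


if __name__ == "__main__":
    town = [
        Agent("farmer", 100.0, True),
        Agent("miner", 80.0, True),
        Agent("guard", 60.0, False),
    ]
    rate = 0.20
    print(taxes_due(town, rate))             # payments under the original rule
    rate = vote_on_rate(town, rate, 0.10)    # two of three agents support the change
    print(rate)
    print(taxes_due(town, rate))             # payments under the amended rule
```

The point of the experiment is the last step: after the vote, agents change what they actually pay, rather than only talking about the new rule.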

Large-Scale Social Dynamics

This experiment explored the spread of cultural concepts and religious ideas through a society of 500 agents distributed across six towns. During the experiment, certain cultural concepts spontaneously emerged in agent social interactions. Around these themes, distinct cultural identities gradually formed across towns, with urban areas showing richer cultural content than rural ones. For example, one town leaned toward environmental themes, while another became known for prankster culture.

The ALTERA team also seeded a religion in this experiment: Pastafarianism, the well-known Church of the Flying Spaghetti Monster. Designated pastor agents preached and spread the faith, and it gradually worked its way into ordinary agents' daily conversations. The funny part is that, perhaps because of occasional model errors, agents also began using words like "pasta" or "spaghetti" in everyday social contexts. The technical report presents this as evidence of the cultural diffusion of religious ideas, rather like how "breakfast" originally referred to the first meal after a religious fast and later came to mean an ordinary morning meal. That may be a stretch, but it somehow sounds quite plausible.

The Significance of Civilization Experiments

Although it is still too early to talk about agent civilization, Project Sid and similar large-scale agent experiments already have practical value for AI research. AI agents in industry today still cannot handle complex problems or collaborate at scale, largely because of the challenges outlined above.

Existing AI models and agents can dramatically improve efficiency on simple, repetitive tasks, arming individuals to become "super soloists," but their impact on large organizations remains limited. They let us finish in ten seconds a piece of writing that once took ten minutes, yet they cannot take three hours of work off our hands and let us relax with peace of mind.

To address these issues, we need more capable agents and larger-scale agent collaboration. This may sound like science fiction, but simulation experiments have at least helped us take a first step toward understanding the problem, and they point the way for future iterations of AI systems.
