Hyperparameter has unveiled its game AI progress for the first time, partnering with Seasun to develop "Orion α," a 3D survival game AI.

Gaorong Ventures · December 13, 2019

Deep integration of AI capabilities with gaming scenarios is showing broad application potential.

Recently, Hyperparameter, a startup focused on AI for games, publicly disclosed its progress in the gaming space for the first time. Working with Seasun Games on Sea of Glory, a battle royale title Seasun currently has in development, Hyperparameter built "Orion α," a 3D survival AI trained from scratch using reinforcement learning. The AI demonstrates complex 3D environmental perception, looting and item usage, combat, and team coordination.

Hyperparameter was founded in early 2019 by Yongsheng Liu, former general manager of Tencent AI Lab and a T4-level technical expert. The company has raised a Series A round from 5Y Capital and Gaorong Ventures. Its team includes AI scientists and technical leads from Tencent AI Lab and Tencent Interactive Entertainment Group (IEG), as well as elite talent from top domestic and international universities. During their time at Tencent, Liu and the core team led the development of the Go AI "Fine Art" and the Honor of Kings AI "Wukong AI." Since its founding, Hyperparameter has focused on deep learning, reinforcement learning, and large-scale systems engineering. By deeply integrating AI capabilities with gaming scenarios, the company provides AI solutions for games across multiple genres — including board games, casual games, RPGs, and open-world survival titles — helping developers improve production efficiency, unlock new gameplay possibilities, and create value across game design, development, and operations.

Two Major Challenges Remain for Game AI

AlphaGo's emergence in 2016 sounded the trumpet for AI exploration in gaming. Over the following three years, AI conquered StarCraft, Dota 2, Texas hold'em, and mahjong. As OpenAI Five and AlphaStar matured, it seemed gaming had been largely mastered by AI.

Yet two significant problems remain unsolved.

The first is environmental complexity. The greatest appeal of video games as the "ninth art" lies in their simulation of the real world. However, most games conquered by AI so far operate in 2D space. Even DeepMind's 3D Quake III AI was built on last-generation game architecture — simple maps, few agents — raising questions about whether such game AI capabilities can transfer to the real world.

The second is AI human-likeness. Existing game AI pursues competitive performance, optimizing for win rates and rank. But from the perspectives of developers and players, stronger isn't always better — more human-like is. In shooter games, for instance, AI can snap-aim for headshots and easily dominate human players, destroying the fun in the process.

Based on these two points, Hyperparameter determined that 3D survival games with complex environments and multiplayer online requirements would become AI's next major challenge. In Seasun's in-development title Sea of Glory, Hyperparameter trained an AI agent named "Orion α." Currently, Orion α has developed comprehensive capabilities including complex 3D environmental perception, looting and item usage, combat, and team coordination.

#Video: Orion α demonstration

The Difficulties and Challenges of 3D Game AI

Sea of Glory is a next-generation multiplayer online tactical competitive game developed independently by Seasun. Built around the popular battle royale format, it pits one hundred players against each other in a land-and-sea fight to the finish.

In the game, 25 four-person squads parachute into a region, where players must gather weapons, armor, items, and other resources across oceans and islands. As the match progresses, the safe zone on the map gradually shrinks, battles become more frequent, and players must coordinate with teammates to employ flexible land-and-sea strategies — eliminating other squads and surviving to the end.

In-game elements closely resemble real-world physics

As a 3D game, Sea of Glory's complexity is already a step above typical 2D games, while battle royale elements like massive maps and 100-player matches raise the technical difficulty further.

Specifically, the challenges AI must handle include:

1) Real-time and long-term decision-making: Players must make real-time operational decisions while also planning long-term strategy, and must balance the two all the way to the end of the match. A full match typically lasts over 30 minutes, corresponding to more than 7,000 decision steps.

2) Imperfect information: In a 3D game, players see only what lies within their field of view and cannot see anything hidden behind obstacles. They must therefore actively explore for unseen information and remember what they have already observed.

3) Complex state space: 3D environments contain far more information than 2D ones, including spatial structures with depth, massive maps (10km × 10km), many players (100), and rich scene elements (buildings, obstacles, loot, and so on). All of this makes environmental perception and exploration much harder.

4) Complex action space: Players must simultaneously control movement direction, camera direction (horizontal/vertical), attacks, body posture (standing, crouching, prone, jumping), interactions (looting, healing, reloading), and other operations, creating a vast combinatorial action space. After discretization, the number of possible actions is estimated to be on the order of 10^7.

5) Strategy and tactics: Players must judge fast-changing environments and situations quickly and accurately, employing a rich set of strategies and tactics such as covering fire, flanking, position contesting, zone edge control, and smoke-screen rescues.

6) Multi-agent gameplay: Players must cooperate and communicate closely with teammates while competing against other squads during resource gathering and armed engagements. Multi-agent scenarios are far more complex and variable than two-player games.

These difficulties are also the primary reasons why behavior-tree AI cannot produce complex, human-like operations.
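To make the "order of 10^7" estimate for the action space concrete, here is a minimal sketch of how a handful of independently chosen controls multiply into millions of joint actions. The article does not publish the actual discretization, so every head name and bin count below is a hypothetical value picked only to show the arithmetic.

```python
from math import log10, prod

# Hypothetical discretization: every count below is an assumption for
# illustration, not a figure from Hyperparameter or Seasun.
ACTION_HEADS = {
    "move_direction": 9,   # 8 compass directions plus "stand still"
    "camera_yaw": 36,      # horizontal turn in 10-degree bins
    "camera_pitch": 18,    # vertical look in 10-degree bins
    "attack": 2,           # fire or hold
    "posture": 4,          # standing, crouching, prone, jumping
    "interaction": 4,      # none, loot, heal, reload
    "weapon_switch": 4,    # keep current weapon or pick one of three slots
    "item_slot": 16,       # which inventory slot an interaction targets
}


def joint_action_count(heads):
    """Number of distinct joint actions when every head is chosen independently."""
    return prod(heads.values())


total = joint_action_count(ACTION_HEADS)
print(f"{total:,} joint actions, about 10^{round(log10(total))}")
# prints: 11,943,936 joint actions, about 10^7
```

Under these assumed bins, a model that predicts each head separately needs only 93 outputs (the sum of the counts) rather than roughly 12 million (their product), which is one common motivation for splitting actions into separate tasks.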

At this research stage, Hyperparameter focused on a mini-game scenario: a 230m × 230m island, 6-minute time limit, 2v2 team matches, with the last surviving team winning. Apart from these constraints, all other game elements are identical to the full game.

Orion α's Implementation Path

Orion α employs deep reinforcement learning methods, learning from scratch through environmental interaction and trial-and-error to observe the world, execute actions, and develop cooperation and competition strategies. The AI uses no human player match data whatsoever, learning entirely through self-play.

The AI's observed state information includes entity information for players/items, depth maps, radar maps, mini-maps, and macro scalar information. Like humans, the AI observes imperfect state information — only seeing what's within a certain viewing angle, with no visibility outside the field of view or behind obstacles. Compared to using raw RGB images as features, this approach eliminates image object detection and recognition, focusing purely on the AI's decision-making process. Additionally, radar maps and mini-maps function like high-definition maps in autonomous driving, while depth maps correspond to information captured by depth cameras.
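The field-of-view restriction described above can be illustrated with a small visibility filter. This is only a sketch under simplifying assumptions: it works in 2D, ignores occlusion by obstacles (which the real observation also accounts for), and the 90-degree viewing angle and 150m range are hypothetical defaults, not values from the article.

```python
import math


def in_field_of_view(observer_xy, facing_deg, target_xy,
                     fov_deg=90.0, max_range=150.0):
    """Return True if target lies inside the observer's 2D view cone.

    fov_deg and max_range are hypothetical defaults; occlusion is not modeled.
    """
    dx = target_xy[0] - observer_xy[0]
    dy = target_xy[1] - observer_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return True  # target is at the observer's own position
    if distance > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180)
    diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def visible_entities(observer_xy, facing_deg, entities):
    """Keep only the entities the agent would be allowed to observe."""
    return [name for name, xy in entities.items()
            if in_field_of_view(observer_xy, facing_deg, xy)]


entities = {"crate": (10.0, 5.0), "enemy_behind": (-10.0, 0.0)}
print(visible_entities((0.0, 0.0), 0.0, entities))  # prints: ['crate']
```

Anything filtered out this way has to be tracked through memory, which is the role the article assigns to the model's recurrent component.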

The AI's action outputs are divided into tasks, including movement direction, horizontal/pitch orientation, body posture, item looting/usage, weapon switching, and attacks. Multiple tasks can execute simultaneously, forming a massive composite action space. Human players have reaction time limitations and APM (actions per minute) caps, so to keep the AI at a human level the team imposed corresponding restrictions on it. Accounting for network transmission delay, feature extraction, and model prediction time, the AI requires 120ms from "observing 1 frame of state" to "producing 1 action." The team added a further 100ms of delay on top of that, for about 220ms in total. Meanwhile, the AI executes at most 4 actions per second, with each action containing at most 3 operations.
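The limits quoted above can be checked with some quick arithmetic. The sketch below uses only the four numbers from the article (120ms, 100ms, 4 actions per second, 3 operations per action). The class, the function names, and the minimum-gap throttling rule are assumptions made for illustration and are not a description of how the team actually enforces the cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanLikeLimits:
    pipeline_latency_ms: int = 120   # observe one frame -> produce one action
    added_delay_ms: int = 100        # extra delay the team added
    max_actions_per_second: int = 4
    max_ops_per_action: int = 3

    @property
    def total_delay_ms(self):
        return self.pipeline_latency_ms + self.added_delay_ms

    @property
    def max_ops_per_minute(self):
        return self.max_actions_per_second * self.max_ops_per_action * 60


def max_decision_steps(limits, match_minutes):
    """Upper bound on actions in a match of the given length."""
    return match_minutes * 60 * limits.max_actions_per_second


def throttle(request_times_ms, limits):
    """Hypothetical enforcement: drop requests closer than 1/rate apart."""
    min_gap_ms = 1000 / limits.max_actions_per_second
    accepted = []
    for t in request_times_ms:
        if not accepted or t - accepted[-1] >= min_gap_ms:
            accepted.append(t)
    return accepted


limits = HumanLikeLimits()
print(limits.total_delay_ms)                         # prints: 220
print(limits.max_ops_per_minute)                     # prints: 720
print(max_decision_steps(limits, 30))                # prints: 7200
print(throttle([0, 100, 250, 300, 500, 1000], limits))  # prints: [0, 250, 500, 1000]
```

At 4 actions per second, a 30-minute match allows up to 7,200 actions, which lines up with the "more than 7,000 decision steps" figure the article gives for a full match.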

Each agent is a deep neural network model that inputs state information and outputs predicted action commands. Orion α processes player, item, and other entity information through Transformer models, processes depth maps, radar maps, and mini-maps through ResNet, and processes macro scalar information through MLP models, then implements memory capabilities through LSTM models. To achieve multi-agent cooperation, the team adopted distributed policy networks with centralized value networks, introducing communication mechanisms between policy networks.
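The pairing of per-agent policy networks with a centralized value network and a communication channel can be sketched in a few lines. This toy version replaces every neural network with a single weighted sum and uses one binary action. The message format, the averaging of teammate messages, and all class names are assumptions for illustration only, since the article does not describe these details.

```python
import random


def linear(weights, inputs):
    """Weighted sum; a stand-in for a full neural network."""
    return sum(w * x for w, x in zip(weights, inputs))


class Policy:
    """Per-agent actor: sees its own observation plus a teammate message."""

    def __init__(self, obs_size, rng):
        self.msg_w = [rng.uniform(-1, 1) for _ in range(obs_size)]
        self.act_w = [rng.uniform(-1, 1) for _ in range(obs_size + 1)]

    def message(self, obs):
        return linear(self.msg_w, obs)

    def act(self, obs, teammate_msg):
        score = linear(self.act_w, obs + [teammate_msg])
        return 1 if score > 0 else 0  # toy binary action


class CentralValue:
    """Centralized critic: scores the joint observation of the whole team."""

    def __init__(self, obs_size, n_agents, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(obs_size * n_agents)]

    def value(self, all_obs):
        joint = [x for obs in all_obs for x in obs]
        return linear(self.w, joint)


def team_step(policies, all_obs):
    """Each agent broadcasts a message, then acts on its own view plus the others' messages."""
    messages = [p.message(o) for p, o in zip(policies, all_obs)]
    actions = []
    for i, (p, o) in enumerate(zip(policies, all_obs)):
        others = [m for j, m in enumerate(messages) if j != i]
        incoming = sum(others) / len(others) if others else 0.0
        actions.append(p.act(o, incoming))
    return actions


rng = random.Random(0)
policies = [Policy(obs_size=3, rng=rng) for _ in range(2)]
critic = CentralValue(obs_size=3, n_agents=2, rng=rng)
observations = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(team_step(policies, observations), critic.value(observations))
```

In setups of this kind the critic is typically needed only during training, so each agent can still act from its own observations at play time. Whether Orion α follows that convention is not stated in the article.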

AI model architecture diagram

Orion α's training runs on Delta, Hyperparameter's self-developed general-purpose distributed reinforcement learning engine. This engine generates training data through large-scale elastic CPU resources, updates neural network model parameters through GPU resources, and monitors the AI's training process through monitoring components. In this project, one day of Orion α's training is equivalent to 100,000 years of human gameplay. The engine can be deployed on any public cloud and currently supports AI training for multiple games.
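The "100,000 years per day" figure can be turned into an aggregate speedup with simple arithmetic. The sketch below assumes a 365-day year and expresses the result in units of the 6-minute mini-game time limit. The article does not say how gameplay time is counted (per agent or per match), so treat the match count as a rough scale indicator only.

```python
DAYS_PER_YEAR = 365          # simplification; leap days ignored
MINUTES_PER_DAY = 24 * 60


def aggregate_speedup(gameplay_years_per_training_day):
    """Days of simulated gameplay produced per wall-clock day of training."""
    return gameplay_years_per_training_day * DAYS_PER_YEAR


def full_length_matches_per_day(gameplay_years_per_training_day, match_minutes):
    """How many matches of the given length fit into that much gameplay time."""
    gameplay_minutes = (
        aggregate_speedup(gameplay_years_per_training_day) * MINUTES_PER_DAY
    )
    return gameplay_minutes // match_minutes


speedup = aggregate_speedup(100_000)
matches = full_length_matches_per_day(100_000, match_minutes=6)
print(f"{speedup:,}x real time in aggregate")
# prints: 36,500,000x real time in aggregate
print(f"{matches:,} six-minute match equivalents per training day")
# prints: 8,760,000,000 six-minute match equivalents per training day
```

A throughput of this size cannot come from a single game instance, which is consistent with the engine's reliance on large-scale elastic CPU resources for data generation.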

Distributed reinforcement learning engine Delta architecture diagram

Orion α Has Gradually Mastered Comprehensive Survival Capabilities in 3D Environments

Currently, Orion α has gradually learned from scratch the comprehensive capabilities needed to survive in 3D environments.

The AI learned to take care of itself through looting and outrunning the zone:

After spawning, the AI quickly loots items. When observing high-tier loot outside the safe zone, it chooses to quickly exit, collect the items, and return to safety as soon as possible.

The AI possesses obstacle-avoidance navigation capabilities, quickly entering and exiting buildings through windows to loot.

The AI also learned to improve its survival in competitive combat through finding cover, agile movement, weapon usage, and cognitive abilities like memory:

In combat, the AI reasonably uses cover and maintains agile movement to evade attacks.

When entering close-quarters combat, the AI switches to melee weapons, then switches back to ranged weapons after creating distance.

The AI also learned to leverage team coordination, providing mutual cover with teammates and employing targeted strategies and tactics in different combat environments to maximize advantages:

When an AI is knocked down, its teammate immediately performs a rescue, then stands watch while the revived teammate heals.

In team combat, AIs spread out and create flanking angles, concentrating fire to eliminate individual enemies first.

After gaining a numbers advantage by eliminating enemies, the AIs aggressively push forward, sequentially entering rooms through staircases to eliminate remaining enemies.

Developers also conducted multiple human-vs-AI playtests with Orion α, both as teammates and opponents.

Human player vs. AI playtest

Using Evolution to Manifest Intelligence, Letting Intelligence Benefit Humanity

During the AI's training process, the team also observed many parallels with human evolutionary processes.

In human evolution, we first learned to gather food for energy and cope with harsh weather, then learned to use various tools, mastered advanced cognitive abilities like memory, and subsequently learned to divide labor with companions within our tribe and compete against other tribes. Through multi-agent self-training, the AI exhibited similar evolutionary phenomena. As training matches accumulated, intelligent behaviors gradually emerged in the AI, including looting, item usage, spatial awareness, cognitive abilities, and complex strategies.

The AI's evolutionary process

In reality, 3D survival games pose enormous challenges for AI research. Numerous limitations and unresolved issues remain: for example, the AI can only engage in land combat on a single island; gameplay is limited to two teams; and the items and weapons the AI has mastered are still relatively limited. The Hyperparameter team stated they will gradually remove these restrictions, ultimately enabling AI to compete in full 100-player battle royale matches on complete maps.

Sea of Glory game panorama

Looking at the world we inhabit, intelligence exists not only in individuals but even more so in groups. Individuals with independent goals gather together, demonstrating astonishing collective intelligence — a capability known as "multi-agent learning." Each agent must be capable of independent action while also cooperating or competing with other agents, adapting and surviving in this ever-changing world.

Among all current game genres, 3D survival games may be the closest to the real physical world. Hyperparameter stated: "We have reason to believe that the capabilities AI agents learn within them — including 3D environmental perception and understanding, adaptation to complex environmental changes, assessment and reasoning under uncertainty, flexible application of various strategies and tactics, and competition and cooperation among multiple agents — will certainly feed back from virtual to reality, bringing value to broader fields such as autonomous driving, smart cities, and healthcare."

"Just as the name 'Orion α' represents, we hope game AI research will shine like bright starlight in the vast universe, guiding humanity to explore unknown spaces light-years away, sailing toward the starry sea of artificial general intelligence (AGI). We look forward to joining with more like-minded people to 'use evolution to manifest intelligence, letting intelligence benefit humanity.'"

#Video: About Hyperparameter
