Qiyuan World Wins 2018 NeurIPS Multi-Agent Competition Learning Track Championship, Highlighting the Value of Its Decision Intelligence Platform | Gaorong Ventures News

Gaorong Ventures · December 21, 2018

The capacity of decision-intelligence platforms to support continual multi-agent learning

Qiyuan World is a decision-intelligence technology company founded in 2017 by Yuan Quan, former senior director of Alibaba's Cognitive Computing Lab and founder of Taobao's recommendation algorithm team, and Long Haitao, former head of Alibaba's search advertising architecture. Qiyuan's victory in the Learning track of the 2018 NeurIPS Multi-Agent Competition demonstrated the world-class technical capabilities of this Chinese decision-intelligence team.

Gaorong Ventures believes that decision intelligence will be a critical direction for the future of AI. In 2017, Qiyuan World raised an angel round of tens of millions of RMB from Gaorong Ventures, the only investor Qiyuan approached for that round.

The NeurIPS 2018 conference, one of the world's premier AI gatherings, recently concluded in Montreal, Canada. More than eight thousand AI researchers from around the world gathered to discuss and share the latest advances across artificial intelligence. The conference hosted a series of competitions to encourage academia and industry to tackle some of the hardest problems in AI. As one of the longest-running academic conferences in the field, NeurIPS is widely regarded as a bellwether for AI research.

Among these competitions, the NeurIPS 2018 Pommerman Competition drew particular attention. This multi-agent contest, jointly organized by Google Brain, Facebook, Oxford University, and New York University, is well known in game-AI circles. A team made up of Dr. Peng Peng of Qiyuan World, Dr. Pang Liang (an assistant researcher at the Institute of Computing Technology, Chinese Academy of Sciences), and Yuan Yufeng of Beijing Normal University competed against 24 top teams from the US, Europe, Japan, and China. Their Navocado two-agent system, trained on Qiyuan's decision-intelligence platform, improved steadily throughout training and won the Learning track championship, a result that shows the technical strength of China's decision-intelligence teams.

Founded in 2017, Qiyuan World focuses on cognitive decision-intelligence technology. It was launched by scientists and executives formerly at Alibaba, Netflix, and IBM, and its advisors come from institutions including Berkeley and CMU. The team's core capabilities are deep learning, reinforcement learning, and large-scale parallel computing, and it has a proven track record in the internet, gaming, and many other sectors.

Decision intelligence is currently one of the hardest technical challenges in AI, and decision-making is among the most complex functions of the human brain. With broad application prospects in gaming, transportation, power systems, and other domains, decision intelligence has become a global AI research hotspot in recent years, and tech giants including DeepMind, Facebook, OpenAI, Microsoft, and Amazon have all set up labs dedicated to it. Multi-agent games are far harder than single-agent settings: the number of joint actions grows exponentially with the number of agents, and each agent's environment keeps changing as the other agents learn. The NeurIPS 2018 multi-agent competition was the first contest of its kind held at the conference. It distilled key challenges, including multi-agent collaboration, imperfect-information games, and continual learning, into the Pommerman format, inviting top researchers worldwide to tackle them together.
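To make the "exponential" claim concrete, the snippet below counts joint actions as the number of simultaneously acting agents grows. It assumes Pommerman's six discrete per-agent actions (stop, four moves, place bomb) and deliberately ignores the other sources of difficulty, such as partial observability and opponents that adapt.

```python
# Each Pommerman agent chooses among six discrete actions per step:
# stop, move up/down/left/right, or place a bomb.
ACTIONS_PER_AGENT = 6


def joint_action_count(num_agents: int, actions_per_agent: int = ACTIONS_PER_AGENT) -> int:
    """Distinct joint actions when all agents act simultaneously: |A| ** n."""
    return actions_per_agent ** num_agents


for n in (1, 2, 4):
    print(f"{n} agent(s): {joint_action_count(n)} joint actions per step")
# 1 agent(s): 6 joint actions per step
# 2 agent(s): 36 joint actions per step
# 4 agent(s): 1296 joint actions per step
```

A 2v2 match has four agents acting at once, so each step already offers 1,296 joint combinations before any planning over time.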

The competition used a demanding double-elimination format. Twenty-five teams each entered a pair of agents, which played 2v2 matches. Each agent started boxed into a small area and could reach the rest of the board only by blowing up nearby wooden crates. Once nearly all obstacles were cleared, the match entered a confrontation phase in which all agents could move freely across the whole map, and the main objective became eliminating the opponents.

The Pommerman agent actively kicks a bomb toward its opponent, precisely eliminating them
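Bombs are both how agents open up the board and how they die. The sketch below models a single explosion on a small grid under simplified, assumed rules: flames travel up to `blast_range` cells in each of the four directions, rigid walls (`#`) block them, and a wooden crate (`W`) is destroyed but stops the flame. The real Pommerman forward model includes further details (bomb timers, chain explosions) that are left out here; `flame_cells` and the board encoding are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def flame_cells(board: List[str], bomb: Cell, blast_range: int) -> Set[Cell]:
    """Cells covered by flames when the bomb at `bomb` explodes (simplified rules).

    '#' rigid wall  : blocks the flame and is unaffected
    'W' wooden crate: destroyed, and the flame stops there
    '.' passage     : the flame keeps travelling
    """
    rows, cols = len(board), len(board[0])
    r0, c0 = bomb
    covered = {bomb}
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        for step in range(1, blast_range + 1):
            r, c = r0 + dr * step, c0 + dc * step
            if not (0 <= r < rows and 0 <= c < cols):
                break  # flame runs off the board
            tile = board[r][c]
            if tile == "#":
                break  # rigid wall absorbs the blast
            covered.add((r, c))
            if tile == "W":
                break  # crate is destroyed, but the flame goes no further
    return covered


board = [
    "#####",
    "#..W#",
    "#.#.#",
    "#...#",
    "#####",
]
print(sorted(flame_cells(board, (1, 1), blast_range=2)))
# [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)]
```

An agent standing on any returned cell when the bomb goes off is eliminated, which is why "escape your own flames" appears on the skill list alongside "clear obstacles".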

Throughout a match, agents had to: 1) clear obstacles, 2) escape the flames of their own bombs, 3) collect power-ups, 4) dodge flames from both their own and other agents' bombs, 5) place bombs to kill opponents, and 6) avoid killing teammates with their bombs. Doing all of this requires extracting the relevant information from observations, reasoning about what cannot be observed, and coordinating with a teammate.
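Reasoning about unknown information matters because an agent may see only part of the board. Below is a minimal sketch assuming a square view window whose `radius` is a hypothetical parameter: cells outside the window are masked with `?`, so an opponent hidden there (`E` in the example) has to be inferred rather than observed.

```python
from typing import List, Tuple

FOG = "?"  # hypothetical marker for cells the agent cannot see


def visible_view(board: List[str], agent: Tuple[int, int], radius: int) -> List[str]:
    """Board as seen by an agent with a square view window of the given radius.

    Cells more than `radius` rows or columns away from the agent are replaced
    with FOG, so whatever sits there must be inferred, not observed.
    """
    ar, ac = agent
    view = []
    for r, row in enumerate(board):
        chars = [
            ch if abs(r - ar) <= radius and abs(c - ac) <= radius else FOG
            for c, ch in enumerate(row)
        ]
        view.append("".join(chars))
    return view


board = [
    "A..W.",
    ".#...",
    "..W..",
    "...#.",
    "W...E",
]
for line in visible_view(board, (0, 0), radius=2):
    print(line)
# A..??
# .#.??
# ..W??
# ?????
# ?????
```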

Each team had two months to train its models offline. Ultimately, Navocado, trained on Qiyuan's decision-intelligence platform, defeated Skynet from Canada to win the Learning track championship. Skynet's team came from Borealis.ai, a Canadian technology company with nearly a hundred employees. In the matches, Navocado's proactive offense was clearly stronger than its opponent's. According to the implementation Skynet's team published on their website, their model relied on substantial human intervention in decision-making (for example, rules that stop the agent from walking into flames). This contrasts sharply with Navocado, which learned all of its skills autonomously, with no human intervention in either training or decision-making.
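For contrast, here is the kind of hand-written rule the article attributes to Skynet: an action filter that removes any move into a flame cell before sampling from the learned policy. This is an illustrative sketch with hypothetical names, not Skynet's published code; an agent trained the way Navocado is described would have to learn the same avoidance from reward alone, with no such mask.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# (row, col) offset of each action; STOP and BOMB leave the agent where it is.
MOVES: Dict[str, Cell] = {
    "STOP": (0, 0), "UP": (-1, 0), "DOWN": (1, 0),
    "LEFT": (0, -1), "RIGHT": (0, 1), "BOMB": (0, 0),
}


def filter_actions(agent: Cell, flames: Set[Cell]) -> List[str]:
    """Hand-written safety rule: drop actions that end on a flame cell."""
    safe = [
        action for action, (dr, dc) in MOVES.items()
        if (agent[0] + dr, agent[1] + dc) not in flames
    ]
    # If every action leads into flames, fall back to the full action set.
    return safe or list(MOVES)


def choose_action(policy: Dict[str, float], allowed: List[str], rng: random.Random) -> str:
    """Sample from the policy restricted to allowed actions (choices uses relative weights)."""
    weights = [policy[a] for a in allowed]
    if sum(weights) == 0:
        return rng.choice(allowed)  # policy puts no mass on any safe action
    return rng.choices(allowed, weights=weights, k=1)[0]


rng = random.Random(0)
policy = {"STOP": 0.1, "UP": 0.5, "DOWN": 0.1, "LEFT": 0.1, "RIGHT": 0.1, "BOMB": 0.1}
allowed = filter_actions((2, 2), flames={(1, 2), (2, 3)})
print(allowed)  # ['STOP', 'DOWN', 'LEFT', 'BOMB']
print(choose_action(policy, allowed, rng))
```

Note that the policy's favourite action, `UP`, is simply never taken here: the rule, not the learned model, made that decision.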

Navocado agent's performance improvement curve during continuous training

The decision-intelligence platform Qiyuan has been building since 2017 played a critical role in training the championship-winning agent. Reinforcement learning, the core technology of decision intelligence, is also among the most challenging machine-learning methods. Because RL pipelines are long and the algorithms are extremely sensitive to hyperparameters, different implementations or configurations in academia often produce training results that cannot be reproduced. Reproducibility, reusability, and robustness therefore remain open challenges for reinforcement learning.
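One basic defence against irreproducible runs is to derive every source of randomness from a recorded configuration and to fingerprint that configuration. The sketch below uses a toy stand-in for a training run (`noisy_training_curve` is hypothetical); a real RL setup would also need to seed the environment and the deep-learning framework, and some GPU kernels can remain nondeterministic even then.

```python
import hashlib
import json
import random


def run_fingerprint(config: dict) -> str:
    """Short, stable hash of the full run configuration (key order does not matter)."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def noisy_training_curve(config: dict, steps: int = 5) -> list:
    """Toy stand-in for a training run whose outcome depends on seed and learning rate."""
    rng = random.Random(config["seed"])  # private RNG seeded only from the config
    value, curve = 0.0, []
    for _ in range(steps):
        value += config["learning_rate"] * rng.gauss(1.0, 0.5)
        curve.append(round(value, 4))
    return curve


cfg = {"seed": 7, "learning_rate": 0.1, "env": "Pommerman"}
print(run_fingerprint(cfg), noisy_training_curve(cfg))
# Re-running the same config reproduces the curve exactly.
print(noisy_training_curve(cfg) == noisy_training_curve(dict(cfg)))  # True
```

Storing the fingerprint next to each result makes it easy to tell whether two curves really came from the same settings.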

By building a dedicated platform, Qiyuan has applied reinforcement learning to complex decision problems and shown that the approach is feasible. The platform provides foundational infrastructure for multi-agent games, enabling continual learning through competition. It also supports meta-learning features, including automated resource scheduling and automatic hyperparameter tuning, which make model training more efficient.

Dr. Peng Peng from Qiyuan World noted, "The Qiyuan team is deeply passionate about reinforcement learning. The agent that won this NeurIPS multi-agent competition had no human intervention at any stage of its training process, and its learning curve was remarkably clean — further validating the effectiveness and robustness of this system, and confirming the value of reinforcement learning technology."

From platform architecture design down to the low-level implementation, Qiyuan invested meticulous effort in every decision-intelligence component, including environment simulation, model estimation, and training. Using the platform, the team designed phased reward mechanisms and tuned hyperparameters. The platform let the team quickly allocate the resources needed to deploy tasks, configure the agents for each match, and watch each model's match performance and win-rate curves in real time during training, so adjustments could be made as fast as possible.
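The article does not say what the phases of the reward were, so the following is only a sketch of the general idea, with assumed event fields and weights: auxiliary rewards for clearing crates and collecting power-ups fade out linearly during training, while the outcome terms (kills, deaths, team-kills) stay constant throughout.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """What happened to one agent in one environment step (hypothetical fields)."""
    crates_destroyed: int = 0
    powerups_collected: int = 0
    opponents_killed: int = 0
    teammate_killed: bool = False
    died: bool = False


def phased_reward(ev: StepEvents, training_step: int, shaping_end: int = 1_000_000) -> float:
    """Shaped reward whose auxiliary terms fade linearly to zero by `shaping_end`."""
    weight = max(0.0, 1.0 - training_step / shaping_end)  # 1.0 at start, 0.0 from shaping_end on
    # Early-phase skills: open up the board and grab power-ups.
    auxiliary = 0.1 * ev.crates_destroyed + 0.2 * ev.powerups_collected
    # Always-on outcome terms: kill opponents, stay alive, spare your teammate.
    outcome = 1.0 * ev.opponents_killed - 1.0 * ev.died - 1.0 * ev.teammate_killed
    return weight * auxiliary + outcome


early = phased_reward(StepEvents(crates_destroyed=2, powerups_collected=1), training_step=0)
late = phased_reward(StepEvents(crates_destroyed=2, powerups_collected=1), training_step=2_000_000)
print(round(early, 3), round(late, 3))  # 0.4 0.0
```

Fading the shaping terms, rather than keeping them forever, is one way to stop an agent from farming crates instead of playing to win.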

Qiyuan decision-intelligence platform architecture

In this NeurIPS multi-agent competition, Qiyuan's decision-intelligence platform provided three key advantages:

First, support for agents' continual learning capability.

Continual learning is a critical part of agent training. During training, Pommerman agents had to retain previously learned skills while acquiring new ones in order to reach high performance. Qiyuan's platform achieved continual learning through a "natural selection" mechanism based on population matchmaking: in competition, stronger agents survived and weaker ones were eliminated, and each eliminated agent's slot was filled by a clone of a stronger agent, which continued to evolve under new hyperparameter settings. Under a fixed compute budget, the platform balanced resources between exploring new strong agents (exploration) and further developing existing strong agents (exploitation).
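The mechanism described here closely resembles population-based training (PBT), in which underperforming members copy the weights of stronger ones (exploit) and then perturb their hyperparameters (explore). A generic sketch of one such round follows; `Agent`, `cull_fraction`, and `perturb` are illustrative assumptions, not details of Qiyuan's platform.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    hyperparams: Dict[str, float]
    win_rate: float = 0.0
    weights: Dict[str, float] = field(default_factory=dict)  # stand-in for network parameters


def natural_selection(population: List[Agent], rng: random.Random,
                      cull_fraction: float = 0.25, perturb: float = 0.2) -> List[Agent]:
    """One PBT-style exploit/explore round; the population size stays fixed."""
    ranked = sorted(population, key=lambda a: a.win_rate, reverse=True)
    n_cull = max(1, int(len(ranked) * cull_fraction))
    survivors = ranked[:-n_cull]  # everyone except the n_cull weakest
    parents = ranked[:n_cull]     # the n_cull strongest get cloned
    children = []
    for i, parent in enumerate(parents):
        child = copy.deepcopy(parent)  # exploit: inherit weights and settings
        child.name = f"{parent.name}-clone{i}"
        # explore: scale each hyperparameter up or down by `perturb`
        child.hyperparams = {
            k: v * rng.choice((1 - perturb, 1 + perturb))
            for k, v in child.hyperparams.items()
        }
        child.win_rate = 0.0  # the clone must earn its own rating in new matches
        children.append(child)
    return survivors + children


rng = random.Random(0)
pop = [Agent(f"a{i}", {"lr": 1e-3}, win_rate=wr)
       for i, wr in enumerate([0.7, 0.4, 0.55, 0.2])]
new_pop = natural_selection(pop, rng)
print([a.name for a in new_pop])  # ['a0', 'a2', 'a1', 'a0-clone0']
```

Raising `cull_fraction` shifts the fixed budget toward exploiting the current leaders; lowering it leaves more slots for agents that are still exploring.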

Second, support for complex-scenario multi-agent joint training.

In multi-agent games, agents often counter one another, so training dynamics become very complex and may not converge: agent A can beat B, B beat C, and C beat A, leaving no single best agent to converge to. In the Pommerman competition, different teams' agents played in very different styles, some excelling at offense and others at defense. Drawing on the "catfish effect" (introducing a strong competitor spurs weaker ones to improve), Qiyuan's platform introduced strong rule-based opponents early in training, so that initially weak agents learned fundamental skills by confronting stronger opponents and improved rapidly. As training progressed, the platform trained multiple agents simultaneously, letting them sharpen one another through intense competition.
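The "catfish" schedule can be expressed as an opponent sampler whose mix shifts over training. In this sketch the linear decay, the 0.1 `floor`, `warmup_steps`, and the opponent names are all assumptions: early on every match is against a scripted opponent, and later most matches are against learning peers.

```python
import random
from typing import List


def sample_opponent(training_step: int, rule_based: List[str], peers: List[str],
                    rng: random.Random, warmup_steps: int = 100_000,
                    floor: float = 0.1) -> str:
    """Choose the next opponent for a learning agent.

    The chance of drawing a scripted opponent decays linearly from 1.0 to
    `floor` over `warmup_steps`; the rest of the time a peer from the
    training population is drawn instead.
    """
    p_rule = max(floor, 1.0 - training_step / warmup_steps)
    if not peers or rng.random() < p_rule:
        return rng.choice(rule_based)
    return rng.choice(peers)


rng = random.Random(42)
scripted = ["rule_bot_aggressive", "rule_bot_defensive"]  # hypothetical scripted opponents
peers = ["agent-a", "agent-b", "agent-c"]
for step in (0, 50_000, 500_000):
    picks = [sample_opponent(step, scripted, peers, rng) for _ in range(1_000)]
    share = sum(p in scripted for p in picks) / len(picks)
    print(f"step {step:>7}: {share:.0%} scripted opponents")
```

Keeping a small floor of scripted opponents is one way to check that agents do not forget how to beat the strategies they first learned from.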

Third, support for large-scale, high-concurrency simulation and massive-scale training based on private cloud clusters.

Qiyuan's decision-intelligence platform split the modules shown in the architecture diagram into containerized components. Through cloud-based automation and container orchestration, it managed hundreds of CPUs and GPUs, lowering the cost of scheduling dozens of Pommerman training tasks. Large-scale, high-concurrency simulation and large-scale training ran side by side in the private cloud cluster. The platform also provided distributed storage configured as a shared model pool, so the population of Pommerman agent models could be persisted and shared.
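A shared model pool can be as simple as a directory on shared storage that every training container can read and write. The sketch below assumes such a path and uses `pickle` for brevity; writing to a temporary file and then calling `os.replace` keeps readers from loading a half-written snapshot. The class and method names are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List


class ModelPool:
    """File-backed pool where training workers publish and fetch agent snapshots.

    In a cluster, `root` would live on shared (distributed) storage so every
    container sees the same pool; here it is simply a local directory.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name: str) -> str:
        return os.path.join(self.root, f"{name}.pkl")

    def publish(self, name: str, params: Dict[str, Any], meta: Dict[str, Any]) -> None:
        """Write to a temp file, then rename it into place (atomic on POSIX, same filesystem)."""
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as f:
                pickle.dump({"params": params, "meta": meta}, f)
            os.replace(tmp, self._path(name))
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)  # do not leave partial files behind
            raise

    def fetch(self, name: str) -> Dict[str, Any]:
        # Only unpickle files written by trusted workers.
        with open(self._path(name), "rb") as f:
            return pickle.load(f)

    def list_models(self) -> List[str]:
        return sorted(f[: -len(".pkl")] for f in os.listdir(self.root) if f.endswith(".pkl"))


with tempfile.TemporaryDirectory() as root:
    pool = ModelPool(root)
    pool.publish("agent-007", {"w": [0.1, 0.2]}, {"win_rate": 0.62})
    print(pool.list_models())               # ['agent-007']
    print(pool.fetch("agent-007")["meta"])  # {'win_rate': 0.62}
```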

Qiyuan's decision-intelligence platform v0.8 is currently deployed in gaming, network intelligence, and simulation scenarios. Thanks to the high-value services it provides to clients, Qiyuan World generated respectable revenue in 2018 despite having only begun commercialization. In 2019, Qiyuan World plans to release the first version of its decision-intelligence platform product, bringing high-quality services to more industry clients and end users.

Gaorong Ventures manages US dollar and RMB funds totaling approximately RMB 15 billion, focused on early-stage and growth-stage investments in TMT.
Its limited partners include top-tier global institutional investors as well as Chinese corporate giants in finance, retail, advertising, and other industries.
Additionally, dozens of successful entrepreneurs — founders of companies including Tencent, Baidu, Taobao, Xiaomi, Meituan, Dianping, 360, Focus Media, Weibo, Sohu, JD.com, Vipshop, Tudou, and Autohome — are also LPs in Gaorong's funds.
The founding partners previously led investments in numerous outstanding companies including Xiaomi, Razer, Storm Technology, G-bits, Tudou, Wondershare, Archermind, 91 Assistant, 3G.cn, MOGU, Dota Legend, Yuanfudao, and others.
Since its establishment, multiple companies Gaorong has invested in or taken stakes in have grown into national or global leaders in their respective industries, including: Pinduoduo (NASDAQ: PDD), HUYA Inc. (NYSE: HUYA), HMI (NYSE: HMI), MOGU (NYSE: MOGU), Lifesense (300562.SZ), Meituan (03690.HK), Ping An Good Doctor (01833.HK), Zhongrongjin (acquired by Homa Appliances), DeePhi Tech (acquired by Xilinx), Qianbaibao (acquired by Meituan), Fanpu Jinke, Beibei, Leqi E-commerce, DotC, Nuro, YITU, Roborock, Tianrang Intelligence, Zhuiyi Technology, Tigerobo, Oasis Labs, Beitai Haoche, QuantGroup, Shuidihuzhu, Testin, Doumi, BIGO LIVE, Danke Apartment, Qian Damai, Perfect Diary, and Ucommune.
Gaorong Ventures maintains investment teams in Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou.
