In the first part of this article, we embarked on a quest to follow the progress of computers defeating human champions in several classic games – including Backgammon, Chess and Go – as well as some more modern inventions.
In this part, we will focus on advances within the last two years. This includes two vastly popular yet complex video games, StarCraft II and Dota 2, as well as a program that is able to play multiple games at superhuman levels. We will also discuss why being good at games is a tremendously important – and often underestimated – aspect of Artificial Intelligence and Machine Learning research, and what the implications of this are for humanity as a whole.
StarCraft II – Layers of Abstraction and Reinforcement Learning
StarCraft II is a strategy video game in which the player explores the map, gathers resources, constructs new bases and buildings, researches technology, and produces combat units to ultimately attack and destroy all of their opponent’s bases.
A program playing it has to interpret each game frame, lasting a fraction of a second, and perform actions with a mouse and keyboard. A simpler setup is to connect directly to the game’s API and interact with its abstract model. Either way, the game presents many challenges:
- It has imperfect information, as the opponent hides in the so-called ‘fog of war’.
- Players are required to manage dozens of buildings and units.
- Strategically, extensive planning ahead is required, including balancing the investment of resources between economic output and military power.
- At a tactical level, players have to steer individual units in battle at all times.
Despite all this, AlphaStar, a program created by the team behind AlphaGo, was able to decisively defeat one of the best human players – Grzegorz “MaNa” Komincz – at the beginning of 2019, with a perfect score of 5 to 0.
AlphaStar constantly models and analyses the state of the battlefield, and it is able to recognise its bases, armies, and the environment it operates in from video input. The version that played was limited to a single matchup out of the six possible and to a single map; on the other hand, it was also limited in the number of actions per minute it could issue, and a slight delay was imposed between input and reaction in an attempt to eliminate the superhuman-reflexes factor.
The program was first trained similarly to AlphaGo, by observing a set of replays of matches played between humans; in the second phase, multiple instances of the program were put in matches against each other. These instances, otherwise known as agents, balanced exploring the new territory of tactics and strategy against exploiting the knowledge they already had to gain an advantage. The results of each match were then translated into a reward that steered further actions. This technique is called Reinforcement Learning, and it is one of the three main Machine Learning paradigms currently in use, alongside supervised and unsupervised learning.
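The exploration-versus-exploitation trade-off at the heart of reinforcement learning can be shown on a much smaller problem. The sketch below (purely illustrative, nothing like AlphaStar's actual architecture) uses an epsilon-greedy agent on a three-armed bandit, where each "arm" stands in for a strategy and the reward for a match outcome; the win rates are invented for the example.

```python
import random

# Hidden quality of each "strategy" (assumed values for illustration).
TRUE_WIN_RATES = [0.2, 0.5, 0.8]

def pull(arm, rng):
    """Simulate one match: returns 1 for a win, 0 for a loss."""
    return 1 if rng.random() < TRUE_WIN_RATES[arm] else 0

def run(episodes=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = [0.0] * len(TRUE_WIN_RATES)  # estimated reward per strategy
    count = [0] * len(TRUE_WIN_RATES)
    for _ in range(episodes):
        if rng.random() < epsilon:       # explore: try something new
            arm = rng.randrange(len(value))
        else:                            # exploit: use the best known strategy
            arm = max(range(len(value)), key=lambda a: value[a])
        reward = pull(arm, rng)
        count[arm] += 1
        # Update the running average of the reward for the chosen strategy.
        value[arm] += (reward - value[arm]) / count[arm]
    return value

estimates = run()
print(estimates)  # the best arm's estimate should approach 0.8
```

With only a 10% exploration rate, the agent quickly discovers that the third strategy wins most often and spends the rest of its episodes exploiting it, while the occasional exploratory pull keeps its estimates of the other strategies honest.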
As of summer 2019, AlphaStar agents are roaming free on public game servers, learning from a myriad of human players and playing all three races available in StarCraft II. We are still waiting for a version of the program that can win with every race against every other race on all of the available maps.
Dota 2 – Teamplay and Recurrent Neural Networks
Dota 2 is a multiplayer online battle arena video game with two teams of five players. Each player controls one hero, earns experience, obtains gold, buys items and scouts the map, while also baiting, ambushing and fighting enemy heroes – all to ultimately destroy one of the buildings the opposition guards and win the game.
There are over 100 available heroes, each with several unique abilities, and over 200 items in the game. The rules are very complex, and the number of possible interactions between game elements and players seems endless. Coordination between all 5 players on a team is also a very important aspect of the game.
In August 2017, the OpenAI team presented the first version of their Dota-playing program. It was able to beat several human champions in a very restricted version of the game, with just two heroes in the first game phase. In April 2019, an updated version playing a full-length match – with a hero pool reduced to 18 and a few other limitations – was able to beat the human champion team OG with a score of 2 to 0.
OpenAI Five uses the game’s API, similarly to early versions of AlphaStar, and sees the game state as a list of 20,000 numbers, emitting one of 170,000 possible discretised actions 8 times per second. Reaction times are, once again, artificially delayed to eliminate the superhuman-reflexes factor.
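To make the "list of numbers in, one discretised action out" loop concrete, here is a toy sketch: a flat observation vector is scored by a (randomly initialised, purely illustrative) linear policy, turned into a softmax probability distribution, and one action id is sampled. The sizes are stand-ins, far smaller than the real 20,000-number observations and 170,000-action space.

```python
import numpy as np

OBS_SIZE, N_ACTIONS = 200, 1700          # toy stand-ins for the real sizes

rng = np.random.default_rng(42)
W = rng.standard_normal((N_ACTIONS, OBS_SIZE)) * 0.01  # illustrative linear "policy"

def act(observation):
    logits = W @ observation             # score every discrete action
    logits -= logits.max()               # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax distribution
    return rng.choice(N_ACTIONS, p=probs)          # sample one action id

obs = rng.standard_normal(OBS_SIZE)      # one frame's worth of observation
action_id = act(obs)
print(action_id)
```

A real agent replaces the single matrix with a deep network and runs this loop 8 times per second, but the interface – vector in, action id out – stays the same.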
Each AI player is a separate program, so the full in-game team consists of 5 cooperating, independent entities. The programs were trained using reinforcement learning with Proximal Policy Optimisation. Each program contains a layer of 1,024 Long Short-Term Memory (LSTM) units. An LSTM is a type of recurrent neural network that does not operate on a single input, but can process an arbitrary-length sequence of inputs and track dependencies within it thanks to an internal state representation. The system was trained on 128,000 preemptible virtual machine CPU cores and 256 P100 GPUs on Google Cloud Platform, which allowed it to accumulate a total of 900 real-time years of gameplay experience per day. As with AlphaStar, we are still waiting for a version that can beat human champions in a game without any limitations.
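The internal state that lets an LSTM track dependencies across a sequence can be written out in a few lines. Below is one LSTM cell step in plain NumPy, using the standard gate equations; the sizes are toy values, not the 1,024 units OpenAI Five uses, and the random weights are just for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One time step. W has shape (4*H, H+X), b has shape (4*H,)."""
    H = h.shape[0]
    z = W @ np.concatenate([h, x]) + b
    f = sigmoid(z[0:H])            # forget gate: what to drop from memory
    i = sigmoid(z[H:2 * H])        # input gate: what new info to store
    g = np.tanh(z[2 * H:3 * H])    # candidate memory content
    o = sigmoid(z[3 * H:4 * H])    # output gate: what to expose
    c_new = f * c + i * g          # updated cell state (long-term memory)
    h_new = o * np.tanh(c_new)     # updated hidden state (the unit's output)
    return h_new, c_new

rng = np.random.default_rng(0)
H, X = 4, 3                        # 4 LSTM units, 3-number inputs (toy sizes)
W = rng.standard_normal((4 * H, H + X)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):                # process a 10-step input sequence
    x = rng.standard_normal(X)
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Because `h` and `c` are fed back in at every step, information from early inputs can influence outputs arbitrarily far down the sequence – which is exactly what a game-playing agent needs to remember events from earlier in a match.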
Three in One – Towards General Game Playing
Meanwhile, DeepMind has been working on an extension of its AlphaGo program. The next step after beating the human champion in 2017 was a version called AlphaGo Zero.
Unlike previous versions, it didn’t rely on any initial training on historical match data. It started from scratch, without any knowledge of Go, and improved only by playing against itself, using reinforcement learning.
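Learning a game from scratch through pure self-play can be demonstrated on a far smaller scale. The sketch below (a vast simplification, in spirit only) applies tabular Monte Carlo learning to Nim: a pile of 10 sticks, each player takes 1–3, and whoever takes the last stick wins. No game records are used; a single value table improves solely by playing both sides of its own games.

```python
import random

def train(episodes=20000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q, N = {}, {}  # Q[(pile, action)]: average outcome from the mover's view
    for _ in range(episodes):
        pile, history = 10, []
        while pile > 0:
            actions = [a for a in (1, 2, 3) if a <= pile]
            if rng.random() < epsilon:   # explore a new move
                move = rng.choice(actions)
            else:                        # exploit current knowledge
                move = max(actions, key=lambda a: Q.get((pile, a), 0.0))
            history.append((pile, move))
            pile -= move
        # The player who took the last stick wins. Walk the game backwards,
        # crediting +1/-1 to the two players' alternating moves.
        outcome = 1.0
        for sa in reversed(history):
            N[sa] = N.get(sa, 0) + 1
            Q[sa] = Q.get(sa, 0.0) + (outcome - Q.get(sa, 0.0)) / N[sa]
            outcome = -outcome
    return Q

Q = train()
best = max((1, 2, 3), key=lambda a: Q.get((10, a), 0.0))
print(best)  # game theory: taking 2 (leaving a multiple of 4) is optimal
```

Because the same table plays both sides, every improvement it makes also strengthens its own opponent – the same self-reinforcing loop that, at an enormously larger scale and with deep networks and tree search in place of a lookup table, drives AlphaGo Zero.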
It was able to reach the level of the 2016 version of AlphaGo within 3 days of training, and the level of its predecessor’s 2017 state within 21 days.
In December 2018, DeepMind published yet another milestone: a program called AlphaZero was able to learn three different games – Chess, Go and Shogi – from scratch and quickly reach superhuman levels. After 34 hours of training, AlphaZero was able to beat its predecessor AlphaGo Zero, which had been trained for 3 days, with a score of 60 to 40. It then defeated the best chess-playing program, Stockfish, after just 4 hours of training – all from scratch. When playing against Stockfish, AlphaZero evaluated three orders of magnitude fewer positions per second, but that was enough.
Finally, it proceeded to win against the best Shogi program, Elmo. Shogi, also known as Japanese Chess, is played on a 9-by-9 board with 20 pieces of 8 types per player. Its current form dates back to the 16th century, and it has a slightly larger problem space than classic Chess. AlphaZero is one of the most famous examples of general game playing: an artificial intelligence designed to play more than one game successfully. Such a design is an important step on the road to leveraging AI to solve an increasingly diverse set of problems.
Other Games and Beyond
There is a plethora of Artificial Intelligence projects tackling various games and competitive activities. OpenAI hosts, among others, the Neural MMO project, where a massive number of agents strive to survive and accomplish various tasks in vast, open-ended worlds full of resources and challenges. IBM is extending Watson into Project Debater, which attempts to wrestle with a human expert in an open debate on a given subject. DeepStack, developed at the University of Alberta, is meanwhile winning against top Texas Hold’em Poker players.
With the rapidly rising popularity and accessibility of Machine Learning tools, libraries, and resources, as well as the availability of specialised Cloud computing power, the number of games where a machine is better than the top humans is growing rapidly. Equipping smart machines with sensors and means to interact with the physical world is a natural extension of conquering virtual worlds – and is currently being researched.
Some may argue that programs playing games, especially video games, are not worth putting effort into because they are not “serious science”. This couldn’t be further from the truth.
Games are models of reality, with varying precision and divergences. Being able to automatically master arbitrary tasks within environments of increasing complexity is the ultimate goal of artificial intelligence research. In the end, the world is just a game environment, albeit a tremendously complex one. We don’t have to understand or know all of it exactly to be able to play efficiently, though.
Driving an autonomous car is a game. Diagnosing patients based on x-rays is a game. Translating between languages is a game. Detecting malicious email is a game.
Machines are getting better than humans at a quickly increasing number of tasks. Games research also has an aspect of entertainment and showmanship that captures broad audience attention and helps researchers and businesses alike to progress even faster. It also gives people real-world context: everyone has played Chess, and many of us know about the likes of StarCraft II and its inherent complexity, so the wider public can easily benchmark the current performance of today’s AI.
The next major step in the quest is Artificial General Intelligence or Strong AI, which has the capacity to understand and learn any intellectual task that a human being can. Many call this the ultimate human invention, or the holy grail of science, while others deem it impossible, at least in our lifetimes. However, on the journey so far, we have witnessed many sceptical claims about what’s possible being shattered to dust.