Reinforcement Learning for Strategy Game Unit Micromanagement

Hausdorff Distance for Similarity Computation

As described in the previous section, histograms are suitable for abstracting case descriptions that consist of a large number of equally weighted, low-dimensional features. Translating a case description based entirely on IMs, like the one for high-level cases in Chapter 5, therefore works well: a single IM or a set of IMs consists of many low-dimensional data points with the same characteristics.

The case representation addressed here, and the model it is part of, pose a different type of problem. Cases are represented by fewer yet more diverse features. They are based on collections of units, which are in turn defined through sets of unit attributes. The main issue is not the dimensionality of the case feature vectors but the large number of cases in the case-base, a consequence of the complexity of the problem addressed by the component. Histograms are therefore unsuitable as an abstraction; instead, a method had to be found to compare large numbers of high-dimensional data points quickly.

Another method from the image processing domain turned out to be suitable: the Hausdorff distance (Huttenlocher et al., 1993). This metric is used to select similar cases from the large case-base. Applying it for fast similarity computation was inspired by an abstraction of unit descriptions as normalised n-dimensional feature vectors that form polygons in n-dimensional space. Polygon comparison that controls for simple geometric operations such as translation and rotation is a very common problem in image processing (Jain, 1989). Instead of individually maximising unit-to-unit similarity through a complex and computationally expensive procedure, as described in Section 6.3, which requires an in-memory table, this allows a fast computation of case similarity in a single pass with low memory requirements.

The Hausdorff distance measures the degree to which each point in a set A lies close to some point in a set B. By comparing each point in both sets, it yields a measurement of how far the two sets are removed from each other. In image processing, this measurement is used to compare objects from two images and thus to quantify their resemblance (Huttenlocher et al., 1993). In the problem addressed in this thesis, the Hausdorff distance serves to determine the resemblance between two groups of units, where each unit is defined by a normalised feature vector. Given two sets of units A and B in a metric space, the directed Hausdorff distance between them is defined as

h(A, B) = max_{a ∈ A} min_{b ∈ B} d(a, b).
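A minimal sketch of this similarity computation in Python, assuming units have already been abstracted into normalised feature vectors; the symmetric variant shown here takes the maximum of both directed distances, and the group data, array layout and Euclidean metric are illustrative assumptions rather than the thesis' exact implementation:

```python
import numpy as np

def directed_hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    """Directed Hausdorff distance h(A, B) = max_{a in A} min_{b in B} d(a, b).

    A and B are arrays of shape (n_units, n_features) holding
    normalised unit feature vectors (a hypothetical layout).
    """
    # Pairwise Euclidean distances between every a in A and every b in B.
    diffs = A[:, None, :] - B[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # For each a, the distance to its closest b; then take the worst such case.
    return dists.min(axis=1).max()

def hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Example: two small groups of units described by 3 normalised attributes each.
group_a = np.array([[0.2, 0.8, 0.1], [0.5, 0.4, 0.9]])
group_b = np.array([[0.3, 0.7, 0.2], [0.6, 0.5, 0.8], [0.1, 0.1, 0.1]])
print(hausdorff(group_a, group_b))  # smaller value = more similar groups
```

Because every unit contributes only one pass over the opposing group, the comparison runs in a single sweep with low memory requirements, which is the property that motivated its use for case retrieval here.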

Layered and Hierarchical Learning

Decomposing larger tasks into hierarchies of smaller, more manageable tasks can lead to promising results in addressing large-scale problems and is a popular approach to reducing complexity (Ontañón et al., 2013). This section gives more detailed background on the layered learning paradigm and other hierarchical techniques.

Hierarchical approaches that use only RL have seen some applications, although they are limited by the inherent characteristics of RL. Publications such as Marthi et al. (2005) and Hanna et al. (2010) successfully use variations of modular and hierarchical RL in tasks with small-to-medium-sized state-action complexity. This illustrates that while subdividing a problem makes it more manageable and enables learning where a monolithic formulation would not, RL-only approaches remain limited in the complexity of the problems they can successfully address.

Hierarchical CBR (HCBR) is a sub-genre of CBR in which several case-bases are organised in an interlinked order. Smyth & Cunningham (1992) coin the term HCBR to describe a CBR system for software design. The authors use a blackboard-based architecture that controls reasoning with a hierarchical case-base. The case-base is structured as a partonomic hierarchy where one case is split into several layers. Finding a solution to a software design problem requires several iterations, with later iterations filling the lower levels of the case until one solution (i.e. one program) is found. The approach presented in this thesis is similar, though slightly more complex. The case-bases of the modules in Chapters 6 to 9 are not strictly partonomic, since lower-level case-bases do not only use solutions from higher-level states as input for their case descriptions. Furthermore, the modules in this thesis are more autonomous in that they run on different schedules, which can result in differing numbers of CBR-cycle executions for different case-bases.

While the approach presented in this thesis is related to HCBR and, to a lesser degree, HRL, the strongest conceptual influence comes from layered learning (LL). The LL paradigm was introduced by Stone (1998) as an architectural concept for an AI agent that manages virtual robots in a robotic soccer simulator. While less complex, this domain is remarkably similar to RTS games in that it is an inherently multi-agent adversarial environment.

There are four major principles in LL. These key principles according to Stone (1998) are:

1. A mapping directly from inputs to outputs is not tractably learnable.
2. A bottom-up, hierarchical task decomposition is given.
3. Machine learning exploits data to train and/or adapt. Learning occurs separately at each level.
4. The output of learning in one layer feeds into the next layer.

All of these principles also apply to the hierarchical architecture for RTS game tactical and reactive tasks that is designed as part of this thesis. Point 1 results from the high level of complexity that RTS games exhibit as a problem domain. Point 2 is satisfied both on a high level, through the many hierarchical RTS problem decompositions used in AI agents, and in detail, through the customised task decomposition for the approach presented in Chapter 6. Point 3 is determined by the choice of ML algorithms for the individual layers, in this case hybrid CBR/RL.
The second part of this principle, which requires levels to be learned individually, is defined in the original publication by Stone (1998) but, since it essentially limits the learning process, has since seen attempts at modification (Whiteson & Stone, 2003; MacAlpine et al., 2015). Point 4 is again determined by the design of the hierarchical architecture.

These high-level principles match the problem addressed in this thesis very well. However, the detailed definition and resulting formalism presented in Stone & Veloso (2000) make the paradigm slightly less suitable. According to these definitions, layered learning is strictly a bottom-up approach where underlying layers feed feature vectors into higher ones. While this is partially the case in the approach presented in this thesis, in that CBR/RL modules on higher levels re-use experience from lower ones, higher layers additionally feed problem description parameters into lower-level case-bases. Furthermore, the formalism defining layered learning restricts the layers to working with concrete state feature vectors, both in terms of input and output, and, more importantly, to working on a distinct set of training samples. While case definitions in a CBR approach could be expressed as feature vectors, RL, unlike other ML approaches, does not work with training examples.

As these discrepancies show, the original formal definition of LL does not account for all aspects of the approach presented in this thesis, so this approach cannot be formally expressed in the LL model. However, the work presented in this thesis can serve to highlight possible extensions to the LL paradigm and opportunities that arise from applying the paradigm to the RTS game domain.

Reinforcement Learning for Strategy Game Unit Micromanagement

This chapter evaluates the capabilities of different reinforcement learning algorithms (R. S. Sutton & Barto, 1998) for the purpose of learning micromanagement in an RTS game. The aim is to determine the suitability of RL, as an ML technique that enables adaptive AI, for creating a learning agent in StarCraft. To do this, two prominent TD learning algorithms are used, both in their simple one-step versions and in their more complex versions that use eligibility traces. The background in Section 3.3 gave a more detailed account of these algorithms and of RL in general.

Existing approaches such as those by Micić et al. (2011) and Shantia et al. (2011) define models that are specific to the tested scenarios. More importantly, they do not evaluate different RL algorithms, which is a central aspect of this chapter. The task chosen for this evaluation is managing a combat unit in a small-scale combat scenario in StarCraft. Subsequent chapters describe the development of a larger hybrid AI solution that addresses the entire StarCraft micromanagement problem, where the agent manages the different layers of problems in a complex commercial RTS game (Buro, 2003b). The overall goal is for the technique to be easily scalable to different types of units and eventually flexible enough to be transferred to different problem domains such as bigger scenarios, other games or even different sets of problems as described in Laird & van Lent (2001).

The chapter is structured as follows. First, the model that abstracts the in-game information and enables the agent to use RL for learning the micromanagement task is explained in detail.
The overall algorithm that uses StarCraft as its simulation environment and RL as its learning technique is then explained. Subsequently, the setup of the empirical evaluation and the results from that evaluation are elaborated on. Finally, those results are discussed and their meaning for further developments is detailed.

Reinforcement Learning Model

The agent in a reinforcement learning framework makes its decisions based on the state s in which the environment is at any one time. If this state signal contains all the information of present and past sensations, it is said to have the Markov property. If an RL task presents the Markov property, it is said to be a Markov decision process (MDP). RL problems are commonly modelled as MDPs. A specific MDP is defined by a quadruple (S, A, P, R), where S is the state space, A is the action space, P(s' | s, a) are the transition probabilities and R(s, a, s') represents the expected reward, given a current state s, an action a and the next state s'. The MDP is used to maximise a cumulative reward by deriving an optimal policy π for the given environment.

In order to adapt the RL technique to the given task of micromanaging combat units, a suitable representation has to be chosen. This is especially important since StarCraft is a very complex environment that requires extensive abstraction of the available information in order to enable meaningful learning and to prevent an exponential growth of the state or action space, which would make any meaningful learning within a reasonable time impossible.
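As an illustration of how such an MDP abstraction is typically consumed by a TD learning algorithm, the following is a minimal one-step tabular Q-learning sketch in Python. The environment interface (reset(), step(), actions), the hyperparameter values and the tabular layout are illustrative assumptions, not the thesis' implementation:

```python
import random
from collections import defaultdict

def one_step_q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular one-step Q-learning over a discrete MDP (S, A, P, R).

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with env.actions
    listing the discrete actions (e.g. Fight, Retreat).
    """
    q = defaultdict(float)  # Q(s, a), initialised to 0 for unseen pairs

    def epsilon_greedy(state):
        # Explore with probability epsilon, otherwise exploit current Q-values.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(state)
            next_state, reward, done = env.step(action)
            # One-step TD target: r + gamma * max_a' Q(s', a')
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

The eligibility-trace variants evaluated in this chapter extend this update by propagating the TD error back to recently visited state-action pairs instead of only the most recent one.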

Reinforcement Learning States

States are specific to a given unit at a given time and are designed as a quadruple of values consisting of the following.
Weapon Cooldown: Is the weapon of this unit currently on cooldown? (i.e. 2 Possible Values)
Distance to Closest Enemy: Distance to the closest enemy as a percentage of the possible movement distance of this unit within the near future. Distances are grouped into four classes ranging from <= 25% to > 120% (i.e. 4 Possible Values)
Number of Enemy Units in Range: Number of enemy units within range of this unit's weapon (i.e. |Uenemy| + 1 Possible Values)
Health: Remaining health of this unit, classified into one of four classes of 25% each (i.e. 0%-25%, 25%-50%, etc.) (i.e. 4 Possible Values)
This leads to a possible state space size of
|S| = 2 ∗ 4 ∗ 4 ∗ (|Uenemy| + 1) = 32 ∗ (|Uenemy| + 1), where Uenemy is the set of all enemy units (see the encoding sketch below).

Reinforcement Learning Actions

There are two possible actions for the units in this model: Fight or Retreat. This set of actions was chosen for several reasons. While there are many more possible actions for an agent playing a complete game of StarCraft, the focus for this task, and therefore for this model, lay on unit control in small-scale combat. This excludes many high-level planning actions, such as actions given to groups of units. Many units in StarCraft have abilities reaching beyond standard fighting, so-called special abilities, which can be decisive in battles but are ignored in this setup. However, the core of any army, and therefore the core of any fight, will always consist of ranged or melee units that have no special abilities beyond delivering normal damage. Using only Fight and Retreat as actions also omits more general actions such as exploring the surroundings or simply remaining idle. Exploration is a crucial part of any StarCraft game but involves many aspects not directly related to combat and will therefore eventually be handled by a completely separate component of the AI agent. The Idle action was originally part of the model, but empirical evaluation showed that it simply let the built-in unit micromanagement take over, which is effectively a mix of the two existing actions, Fight and Retreat. The learning process was therefore severely impeded by the Idle action receiving a reward signal that was essentially a mix of the reward signals of the other two actions.

Fight Action

The Fight action is handled by a very simple combat manager. The combat manager determines all enemy units within the agent's unit's range and selects the opponent with the lowest health, which can thus be eliminated the fastest. Should no enemy unit be within weapon range when the action is triggered, nothing happens until the next action has to be chosen.

Retreat Action

The Retreat action aims to get the agent's unit away from the source of danger, i.e. enemy units, as quickly as possible. A weighted vector is computed that leads away from all enemy units within the range they would be able to travel within the next RL time frame. The influence of one enemy unit on the direction of the vector is determined by the amount of damage it could do to the agent's unit. To avoid being trapped in a corner, on the edge of the map or against an insurmountable obstacle (StarCraft maps can have up to three different levels of height, separated by cliffs), a repulsive force is assigned to these areas and included in the computation of the vector. If an agent's unit that tries to retreat is surrounded by equally threatening and thus repulsive forces, little or no movement will take place.
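A minimal sketch of how such a damage-weighted retreat vector could be computed, assuming 2D positions and a per-enemy damage estimate; the obstacle weighting and the normalisation are simplified assumptions rather than the thesis' exact implementation:

```python
import numpy as np

def retreat_vector(unit_pos, enemies, obstacles, obstacle_weight=1.0):
    """Compute a damage-weighted direction leading away from nearby threats.

    unit_pos   -- (x, y) position of the agent's unit
    enemies    -- list of (position, expected_damage) for enemies that could
                  reach the unit within the next RL time frame
    obstacles  -- list of (x, y) points on cliffs / map edges that repel the unit
    Returns a unit-length numpy direction vector, or a zero vector if the
    repulsive forces cancel out (the "surrounded" case described above).
    """
    unit_pos = np.asarray(unit_pos, dtype=float)
    direction = np.zeros(2)

    # Each enemy pushes the unit away, weighted by the damage it could deal.
    for pos, damage in enemies:
        away = unit_pos - np.asarray(pos, dtype=float)
        dist = np.linalg.norm(away)
        if dist > 0:
            direction += damage * away / dist

    # Impassable terrain and map edges also repel the unit.
    for pos in obstacles:
        away = unit_pos - np.asarray(pos, dtype=float)
        dist = np.linalg.norm(away)
        if dist > 0:
            direction += obstacle_weight * away / dist

    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else direction

# Example: one strong and one weak enemy, plus a cliff edge to the north.
move_dir = retreat_vector((10, 10), [((12, 10), 20.0), ((10, 13), 5.0)], [(10, 14)])
print(move_dir)
```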

1 Introduction
1.1 Motivation and Research Questions
1.2 Research Contributions
1.3 Thesis Outline
2 Related Work
2.1 Reinforcement Learning for Computer Game AI
2.2 Case-Based Reasoning and Hybrid Approaches
2.3 Hierarchical Approaches and Layered Learning
2.4 StarCraft as a Testbed for AI Research
2.5 Summary
3 Background
3.1 Real-time Strategy Games and StarCraft as Testbeds for AI Research
3.2 RTS Game Bot Architectures
3.3 Reinforcement Learning
3.4 Case-Based Reasoning
3.5 Layered and Hierarchical Learning
4 Reinforcement Learning for Strategy Game Unit Micromanagement
4.1 Reinforcement Learning Model
4.2 Algorithm
4.3 Empirical Evaluation and Results
4.4 Conclusions on the Future Use of RL for Micromanagement in RTS Games
5 Combining Reinforcement Learning and Case-Based Reasoning for Strategy Game Unit Micromanagement
5.1 CBR/RL Agent Architecture
5.2 Model
5.3 Empirical Evaluation and Results
5.4 Discussion
5.5 Conclusion and Influence on Hierarchical Approach
6 A Hybrid Hierarchical CBR/RL Architecture for RTS Game Micromanagement
6.1 Modeling a Hierarchical CBR/RL Architecture in a RTS Game
6.2 Evaluating the Hierarchical Architecture
6.3 Unit Mapping
6.4 Summary
7 Architecture Level Three: Unit Pathfinding using Hybrid CBR/RL
7.1 CBR/RL Integration and Model
7.2 Similarity Computation and Navigation Module Logic
7.3 Empirical Evaluation and Results
7.4 Navigation Discussion
7.5 Training the Navigation Case-Base
8 Architecture Level Two: Squad-Level Coordination
8.1 Unit Formations
8.2 Unit Attack
8.3 Unit Retreat
8.4 Summary
9 Architecture Level One: Tactical Decision Making
9.1 Tactical Decision Making Model
9.2 Overall Hierarchical CBR/RL Algorithm
9.3 Tactical Decision Making Evaluation
9.4 Results
9.5 Discussion
9.6 Knowledge Transfer between Scenarios
10 Discussion and Future Work
11 Conclusion
A Multi-Layer Case-Based & Reinforcement Learning Approach to Adaptive Tactical Real-Time Strategy Game AI
