Machine Learning for Robot Control
Machine Learning (ML) is a scientific field concerned with building artificial systems that improve their performance over time, based on training examples or past experience [Alpaydin, 2010]. In other words, the goal of any ML system is to learn how to solve a problem given some data. Mitchell provided a general definition: « A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. » [Mitchell, 1997]. Different kinds of ML techniques have been applied to automatically build robot behaviors.
Families of ML techniques are usually divided into categories according to one of two criteria: the type of data or experience exploited when learning, and the type of problem addressed by the technique. These two points of view overlap somewhat, since addressing a certain kind of learning problem may require a specific type of data. With respect to the type of data or feedback available to the learning system, ML families include supervised, unsupervised, semi-supervised, and reinforcement learning.
Supervised Learning
Supervised Learning (SL) [Hastie et al., 2009, Chapter 2] trains the system on labeled data examples. SL approaches use a training base consisting of a set of data examples, together with labels that indicate the expected system output, or ground truth. Each label corresponds to the output that an optimal system should provide when fed the corresponding example as input. These approaches are used in classification, prediction, and regression problems. SL techniques include Artificial Neural Networks trained with the backpropagation algorithm [Rumelhart et al., 1986], decision trees [Rokach and Maimon, 2005, Chapter 9], learning classifier systems [Lanzi et al., 2003], and Gaussian Processes [Rasmussen and Williams, 2006], among others.
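As a concrete illustration, the following is a minimal sketch of supervised learning (assuming scikit-learn and NumPy are available, with purely hypothetical data): a decision tree is trained on labeled examples mapping sensor readings to expert-chosen actions, and then queried on a new reading.

# Minimal supervised-learning sketch (hypothetical data): a decision tree
# learns a mapping from sensor readings to expert-provided action labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training base: each row is a sensor reading (two proximity
# values), each label the action an expert controller would take.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array(["turn_left", "turn_left", "turn_right", "turn_right"])

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[0.7, 0.3]]))  # expected: "turn_left"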
One of the main issues with SL approaches in robotics is that they are very data-demanding. Consequently, complete end-to-end robot control systems are not usually learned with SL techniques, since it is generally impossible to provide examples of the right action to perform in every situation. A promising research direction in ML for robotics consists in developing data-efficient algorithms that require less data and make the most of the limited amount available [Deisenroth et al., 2015, Mouret, 2016]. Some ML approaches try to limit such data requirements, for example by exploiting unlabeled data, which is less expensive to acquire.
Unsupervised Learning
Unsupervised Learning (UL) [Hastie et al., 2009, Chapter 14] exploits unlabeled data examples so that the system can learn hidden patterns or the intrinsic structure of the data. UL techniques use a training base consisting of a set of unlabeled data examples, representative of the distribution of inputs of the problem. For instance, these data examples may be images or robot motion patterns that the system needs to categorize. UL techniques typically address clustering, dimensionality reduction, and density estimation problems. UL approaches include k-means [Lloyd, 1982], self-organizing maps [Kohonen, 1998], and density-based approaches such as DBSCAN [Ester et al., 1996], among others.
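To illustrate, the following is a minimal NumPy sketch of Lloyd's k-means on hypothetical, unlabeled 2-D points: cluster structure is recovered without any labels.

# Minimal NumPy sketch of Lloyd's k-means on unlabeled 2-D points
# (hypothetical data); cluster assignments emerge without any labels.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),     # one unlabeled group
               rng.normal(1.0, 0.1, (20, 2))])    # another unlabeled group

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
for _ in range(10):
    # assign each point to its nearest centroid
    labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    # move each centroid to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # two centroids, roughly at (0, 0) and (1, 1)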
The unlabeled data exploited by UL algorithms is relatively inexpensive to acquire. However, it is also less informative than labeled data, which limits the possibilities of UL approaches. Semi-supervised learning, which augments a large unlabeled dataset with a smaller set of labeled data, has therefore been proposed to extend these possibilities.
Semi-Supervised Learning
Semi-Supervised Learning (semi-SL) [Chapelle et al., 2010] exploits both a large set of unlabeled data examples and a smaller set of labeled data for the system to learn, thus falling in between SL and UL. Labeling data for SL may be very costly, and obtaining a fully labeled dataset may be unfeasible, whereas acquiring unlabeled data is relatively inexpensive. Consequently, effort has gone into developing semi-SL techniques that can exploit both a small set of labeled data and a larger set of unlabeled data. Typically, semi-supervised techniques rely on the assumption that data points that are close to each other are likely to share a label. As such, they may use Bayesian learning to automatically infer labels based on similarities with respect to the labeled data [Cozman et al., 2003]. Such techniques address the same kinds of problems as SL, i.e. classification, prediction, and regression.
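The following toy sketch (hypothetical 1-D data, NumPy only) illustrates the closeness assumption: unlabeled points iteratively receive the label of their nearest labeled neighbor, so the two initial labels spread along the line.

# Toy semi-supervised sketch: labels spread to nearby unlabeled points
# (hypothetical data; -1 marks an unlabeled point).
import numpy as np

X = np.array([0.0, 0.1, 0.2, 0.9, 1.0, 1.1])   # six points on a line
y = np.array([0, -1, -1, -1, -1, 1])            # only the two extremes are labeled

for _ in range(10):                              # a few propagation sweeps
    for i in np.where(y == -1)[0]:
        labeled = np.where(y != -1)[0]
        nearest = labeled[np.argmin(np.abs(X[labeled] - X[i]))]
        if abs(X[nearest] - X[i]) < 0.15:        # propagate only between close points
            y[i] = y[nearest]

print(y)                                         # expected: [0 0 0 1 1 1]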
Applications of the aforementioned families of ML approaches (SL, UL, semi-SL) in robotics are mainly found in submodules of robot control systems. In the next section, we introduce Reinforcement Learning (RL), a family of methods that learn end-to-end control functions by framing their design as sequential decision-making problems. We provide a classification of RL subfamilies, and list some of their applications in robotics. Finally, we situate the evolutionary approaches used in our work within the class of direct policy search RL algorithms. Evolutionary approaches to learning robot control are further detailed in Chapter 3.
RL for Multiple Robots
Regarding multirobot teams and swarms, learning approaches are less common than in the single-robot case. Most techniques fall into one of two general categories. On the one hand, approaches based on solving Decentralized Partially-Observable Markov Decision Processes (Dec-POMDPs) [Oliehoek and Amato, 2016] learn a decentralized joint policy for a multirobot team. This is typically done using critic-based approaches that compute a value function for the team of robots, while addressing the Credit-Assignment Problem (CAP), i.e. adequately crediting each robot in the team for its respective contribution to the goal. Solving Dec-POMDPs exactly is highly expensive; approximate algorithms have therefore been proposed to address realistic applications [Dibangoye et al., 2009]. Examples of applications of Dec-POMDPs in robotics include multirobot box-pushing and package delivery [Amato et al., 2015], multirobot aggregation [Eker et al., 2011], and cooperative bartender-and-waiter tasks [Amato et al., 2016].
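For illustration only (not taken from the cited works), a Dec-POMDP is commonly written as a tuple ⟨I, S, {A_i}, T, R, {O_i}, Z⟩; the sketch below holds such a tuple in a small Python dataclass for a hypothetical two-robot scenario, and highlights why the shared team reward makes credit assignment difficult.

# Illustrative Dec-POMDP container for a hypothetical two-robot task.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    agents: List[str]                                  # I: the robots
    states: List[str]                                  # S: joint world states
    actions: Dict[str, List[str]]                      # A_i: per-robot action sets
    transition: Callable[[str, Tuple[str, ...]], str]  # T(s, joint action) -> s'
    reward: Callable[[str, Tuple[str, ...]], float]    # R: a single shared team reward
    observations: Dict[str, List[str]]                 # O_i: per-robot observation sets
    observe: Callable[[str, str], str]                 # Z(s', robot) -> local observation

# The shared reward R is what makes credit assignment hard: every robot
# receives the same team signal regardless of its individual contribution.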
On the other hand, gradient-free policy search methods, especially population-based metaheuristics such as EAs or PSO, have also been used to learn multirobot and swarm robot behavior. These approaches generally find a set of high-performing policies using approximate methods, and are less computationally expensive than solving Dec-POMDPs. Examples of such applications of PSO include navigation with obstacle avoidance [Pugh and Martinoli, 2006, Di Mario et al., 2013], exploration of unknown environments [Wang et al., 2016], and search-and-rescue applications [Couceiro et al., 2013]. As stated before, in our work we use Evolutionary Algorithms (EAs), which are direct policy search approaches, to learn swarm robot behavior. In our case, EAs may be seen as black-box optimization methods for learning swarm robot control. Evolutionary Robotics (ER), i.e. using EAs to learn robot behavior, is discussed in Chapter 3, where we further develop our discussion of evolutionary approaches to learning swarm robot control, particularly EER methods, the specific class of EAs used in our work for decentralized swarm robot behavior adaptation.
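The following is a minimal sketch of such direct policy search (NumPy only; the evaluate() function is a toy stand-in for what would, in practice, be a full swarm simulation or real-robot trial): a population of controller parameter vectors is evaluated as a black box, truncated by fitness, and mutated.

# Minimal direct policy search with an EA, treating controller fitness as a black box.
import numpy as np

rng = np.random.default_rng(42)

def evaluate(weights):
    # Hypothetical stand-in for running the swarm with a controller parameterised
    # by `weights` and measuring task performance; here, a toy objective.
    return -np.sum((weights - 0.5) ** 2)

dim, pop_size, generations, sigma = 8, 20, 50, 0.1
population = rng.normal(0.0, 1.0, (pop_size, dim))

for _ in range(generations):
    fitness = np.array([evaluate(w) for w in population])
    parents = population[np.argsort(fitness)[-pop_size // 2:]]   # truncation selection
    children = parents + rng.normal(0.0, sigma, parents.shape)   # Gaussian mutation
    population = np.vstack([parents, children])

best = population[np.argmax([evaluate(w) for w in population])]
print(best)   # parameters approaching the optimum of the black-box objective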
Lifelong Machine Learning
As stated before, the focus of TL is on learning target tasks, while the source tasks are seen as stepping stones to ease that learning. LML [Silver et al., 2013] approaches, on the other hand, focus on incrementally learning sequences of tasks, retaining previous tasks and exploiting (transferring) previous knowledge to new tasks as they are progressively presented to the system. The training data is provided to the system sequentially (as in single-task online ML), and the goal is to learn, retain, and accumulate knowledge on multiple tasks while leveraging previous learning (as in MTL). As such, LML may be considered incremental online MTL [Chen et al., 2016], since the goal is to learn and share knowledge among a set of tasks while addressing them online, i.e. as they become available to the system. A more precise definition, from [Silver et al., 2013], states: « Lifelong Machine Learning, or LML, considers systems that can learn many tasks over a lifetime from one or more domains. They efficiently and effectively retain the knowledge they have learned and use that knowledge to more efficiently and effectively learn new tasks. »
In other words, LML focuses on the continuous learning of tasks in sequence. The corresponding system learns a first task, exploits (transfers) this knowledge when learning a second one, and incrementally continues to learn new tasks without losing the previous ones. The central questions when developing LML algorithms are (a) how to transfer knowledge from previously acquired tasks, and (b) how to retain previously acquired knowledge (i.e. how to avoid forgetting). In this sense, LML is concerned with a continuous process of learning, in which knowledge from previous tasks is consolidated and transferred to improve the learning of new tasks. Other names for these approaches are learning to learn [Thrun and Pratt, 1998] and never-ending learning [Mitchell et al., 2015]. In LML, different approaches exist to progressively retain and exploit previous learning in an open-ended manner.
In explanation-based ANNs [Mitchell and Thrun, 1993, Thrun and Mitchell, 1995], backpropagation is used to learn the parameters of an ANN robot controller that runs an RL algorithm. The algorithm progressively builds task-independent transition models over time, based on previous evaluations. The partial derivatives of the value function, computed for the backpropagation algorithm, are extracted and stored. Both the transition models and the partial derivatives are then transferred when learning new tasks, to facilitate their learning.
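A hedged reading of this mechanism, in our own notation rather than that of the cited papers: the value approximator for a new task is fit both to value targets and to the stored partial derivatives (slopes), so that previously extracted knowledge constrains new learning. A toy sketch of such a combined loss follows.

# Illustrative sketch only (notation ours, not from the cited works).
import numpy as np

def slope_constrained_loss(predict, grad_predict, states, values, slopes, lam=0.1):
    """Fit error on value targets plus fit error on stored slope targets."""
    v_err = sum((predict(s) - v) ** 2 for s, v in zip(states, values))
    g_err = sum(np.sum((grad_predict(s) - g) ** 2) for s, g in zip(states, slopes))
    return v_err + lam * g_err

# Toy usage with a linear value approximator V(s) = w.s (purely hypothetical).
w = np.array([0.5, -0.2])
loss = slope_constrained_loss(lambda s: w @ s, lambda s: w,
                              states=[np.array([1.0, 0.0])], values=[0.4],
                              slopes=[np.array([0.6, -0.1])])
print(loss)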
In LML based on MTL [Silver and McCracken, 2003, Silver and Poirier, 2004, Silver, 2013], MTL systems with representational sharing (cf. above) are adapted for sequential, instead of simultaneous, learning. The authors focus on techniques to sequentially consolidate and retain previous knowledge so as to avoid erasing it. These techniques include a long-term ANN that retains knowledge from previous tasks, and rehearsal mechanisms that retrain the learning system using previous data. Knowledge transfer is intrinsically done through shared representations, as in MTL. In [Poirier and Silver, 2005], the authors investigate the effect of task order on the retention of previous knowledge, concluding that this order especially influences the performance of the system during the first tasks.
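As an illustration of rehearsal (a generic sketch, not the cited authors' exact scheme), training batches for a new task can be mixed with stored examples from earlier tasks, so that earlier mappings are revisited while the new task is learned.

# Generic rehearsal sketch with hypothetical data.
import numpy as np

rng = np.random.default_rng(0)

def rehearsal_batches(new_X, new_y, memory_X, memory_y, batch_size=8, n_batches=5):
    """Yield batches that combine new-task data with rehearsed old-task data."""
    for _ in range(n_batches):
        new_idx = rng.choice(len(new_X), batch_size // 2)
        old_idx = rng.choice(len(memory_X), batch_size // 2)
        X = np.vstack([new_X[new_idx], memory_X[old_idx]])
        y = np.concatenate([new_y[new_idx], memory_y[old_idx]])
        yield X, y

# Hypothetical data: a previously learned task kept in memory, and a new task.
memory_X, memory_y = rng.normal(size=(50, 4)), rng.integers(0, 2, 50)
new_X, new_y = rng.normal(size=(50, 4)), rng.integers(0, 2, 50)

for X, y in rehearsal_batches(new_X, new_y, memory_X, memory_y):
    pass  # each (X, y) batch would be fed to the long-term network's update step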
In the Efficient Lifelong Learning Algorithm (ELLA) [Ruvolo and Eaton, 2013b], a sparsely shared basis representation is maintained for all tasks, so that new tasks can exploit it. Knowledge is transferred to new tasks through this shared representation, which is explicitly retained and progressively refined over time to maximize performance over all tasks. The sparseness of the representation helps avoid negative interference between unrelated tasks. The authors show that ELLA yields performance similar to simultaneous learning of the entire set of tasks, while being dramatically faster, over 1000 times in their experiments. In [Ruvolo and Eaton, 2013a], the authors extend the algorithm to actively select task order over a subset of currently available tasks so as to increase performance on future tasks. In [Ammar et al., 2014], the authors extend ELLA to learn policies for sequential decision-making problems using policy gradient RL methods. Their approach has robust theoretical guarantees, and they experimentally show a large increase in learning speed.
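A simplified, illustrative view of the shared representation (hypothetical dimensions, not the authors' notation): each task's parameter vector is a sparse combination of the columns of a shared basis, so new tasks reuse, and gradually refine, the same basis.

# Conceptual sketch of a sparsely shared basis over tasks (simplified).
import numpy as np

d, k, n_tasks = 10, 4, 3                   # parameter dim, basis size, number of tasks
rng = np.random.default_rng(1)

L = rng.normal(size=(d, k))                # shared latent basis, refined over time
S = np.zeros((k, n_tasks))                 # sparse task-specific coefficients
S[0, 0], S[2, 1], S[0, 2] = 1.0, 0.8, 0.5  # each task uses only a few basis columns

Theta = L @ S                              # column t = parameter vector of task t
print(Theta.shape)                         # (10, 3): one parameter vector per task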
It is worth mentioning the work by Silver et al. [Silver et al., 2013], a survey and position paper in which the authors review previous work on LML and argue for a systems approach to LML, and to ML in general. The authors describe effective (well-performing) and efficient (fast) knowledge retention and knowledge transfer as the goal of LML systems. They further sketch, in general terms, the essential components of a reference framework for LML systems: universal and domain knowledge bases, together with an inductive learning system that extracts relevant knowledge from those bases. Such a system can then be coupled to an ML system that exploits the available knowledge and provides feedback for adequate retention and consolidation when populating the universal and domain knowledge bases.
In the next section, we discuss the problem of forgetting when applying incremental lifelong adaptation to sequences of tasks in ANNs. This is directly linked to our work on swarms of robots that collectively adapt in a continuous manner to possibly changing environments or tasks.
Dynamic Optimization Perspective
DOPs are problems where the goal is not only to find the optimum of a given function, but also to track it over time as the problem changes. In DOPs, the focus is not on retaining old optima, but on adapting efficiently and rapidly when changes occur. As such, the problem does not really concern learning or retaining several tasks. However, when evolutionary approaches are used for DOPs, which is known as Evolutionary Dynamic Optimization (EDO) [Nguyen et al., 2012], several algorithms have been proposed that keep in the population some candidate solutions that were good in the past. The goal of such approaches is to maintain a high level of diversity, including promising solutions gathered in the past, so as to readapt rapidly and effectively to environmental changes, especially cyclic ones. In our work, we take a similar point of view, but our focus is on limiting task forgetting by promoting the retention of solutions that were good in the past. This focus is somewhat different from the usual approaches to DOPs, which rather concentrate on rapidly adapting to environmental changes.
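The following generic sketch (NumPy only, toy objective; this is not the algorithm of Chapter 8) illustrates such an EDO-style memory scheme: the best individual found under each past environment is archived and kept in the population, so it is available when the environment changes again.

# Generic EA-with-memory sketch for a dynamic objective (hypothetical setup).
import numpy as np

rng = np.random.default_rng(3)

def fitness(x, env):
    # Hypothetical dynamic objective: the optimum moves with the environment.
    return -np.sum((x - env) ** 2)

dim, pop_size = 5, 20
population = rng.normal(size=(pop_size, dim))
archive = []                                         # memory of past-good solutions

for env in [0.0, 1.0, 0.0]:                          # cyclically changing environment
    for _ in range(30):                              # adapt to the current environment
        scores = np.array([fitness(x, env) for x in population])
        parents = population[np.argsort(scores)[-pop_size // 2:]]
        population = np.vstack([parents, parents + rng.normal(0.0, 0.1, parents.shape)])
    best = max(population, key=lambda x: fitness(x, env))
    archive.append(best.copy())                      # retain a past-good solution
    population[:len(archive)] = np.array(archive)    # keep the archive in the population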
Concretely, the approach proposed in Chapter 8 tries to keep individuals with high performance on previous tasks in the population of the EA, while adapting to new tasks. As such, the forgetting problem is addressed at the level of the population: distinct subpopulations focus on either optimizing performance on the current task or retaining previous tasks, with knowledge about the different tasks spread through the global population. By framing the problem of forgetting at the population level, our vision builds a bridge between the problem of task retention in ML and population-based approaches in EDO that cope with task changes.
Table of Contents
1 Introduction
1.1 Context
1.2 Autonomous Robots, Agents, and Swarms
1.3 Outline of this Thesis
Part I Methodological Background
2 Machine Learning and Adaptation
2.1 Machine Learning for Robot Control
2.2 Offline Learning vs. Online Adaptation
2.3 Adaptation to Several Tasks
2.4 Conclusion
3 Evolutionary Robotics
3.1 Evolutionary Algorithms for Robot Control
3.2 Evolutionary Swarm Robotics
3.3 Embodied Evolutionary Robotics
3.4 Conclusion
Part II Distributed Adaptation of Swarm Robot Behavior
4 Approach for Swarm Robot Adaptation
4.1 Baseline Algorithm: vanilla EE
4.2 Adaptation to Changing Conditions
4.3 Research Questions and Contributions
5 Influence of Selection Pressure in Distributed EER
5.1 Exploration vs. Exploitation Dilemma
5.2 EER Algorithm and Selection Operators
5.3 Experiments
5.4 Conclusion
6 Collaborative Item Collection using Embodied Evolution
6.1 Evolving Collaborative Behavior
6.2 Collaborative Item Collection
6.3 Experiments
6.4 Results and Analysis
6.5 Conclusion
7 Augmenting Neural Controllers in Embodied Evolution
7.1 Motivation
7.2 Evolution of Neural Topologies
7.3 odNEAT with Logical Markers
7.4 Experiments
7.5 Conclusion
8 Evolutionary Adaptation to Changing Conditions
8.1 Adaptation to Changing Conditions
8.2 Forgetting-Avoidance Evolutionary Algorithm
8.3 Preliminary Experiments
8.4 Toward Forgetting Avoidance in Distributed EER
8.5 Conclusion
9 Conclusion and Perspectives
9.1 Summary
9.2 Contributions
9.3 Perspectives
List of Publications
Bibliography