Policy Search when Significant Events are Rare: Choosing the Right Partner to Cooperate with


Kin selection and indirect fitness benefits

Kin selection is a first evolutionary mechanism that explains how cooperative behaviours can be adaptive (Hamilton, 1964). The relevant unit for the propagation of a trait is the gene in all of its copies, not the individual by itself (see section 1.1.1). A gene can increase its propagation by improving the fitness of the individual carrying it, but also by increasing the fitness of other individuals carrying the same gene. A gene that helps its carrier’s siblings or offspring increases the fitness of related individuals. Because these relatives descend from the same parents, they are likely to carry a copy of this gene, so the gene indirectly increases its own fitness. A mutant gene that helps its carrier’s relatives is therefore favoured by natural selection over a resident gene that does not help them.
However, kin selection can only explain the existence of a subset of the cooperative behaviours observed in the living world, those expressed towards genetically related individuals. How could cooperative behaviours between unrelated individuals, as observed in humans or vampire bats, for example, or cooperative behaviours between individuals of different species, as observed in symbioses, have been favoured by natural selection?
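The argument above is usually summarised by Hamilton's rule. The symbols below are the conventional ones and do not appear in the text itself, so this is an added illustration rather than the author's own formulation: $r$ is the genetic relatedness between actor and recipient, $b$ the fitness benefit to the recipient, and $c$ the fitness cost to the actor.

```latex
% Hamilton's rule: a helping gene is favoured when
r \, b > c
% Worked case: full siblings in a diploid species have r = 1/2,
% so helping a sibling is favoured only when b > 2c.
```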

Mutualism

For cooperation between genetically unrelated individuals to be evolutionarily stable, the actor must itself receive a benefit from its cooperative behaviour. This benefit can be obtained if others respond to its cooperative behaviour later by cooperating back (Trivers, 1971).
This mechanism is called conditional cooperation or reciprocity. There are two main families of conditional cooperation mechanisms: Partner Control (also called Partner Fidelity Feedback) and Partner Choice (Noë, 2006; Sachs et al., 2004). In Partner Control, the recipient of an interaction adjusts its behaviour towards the same actor and continues to interact with it. Reciprocity can be either positive, i.e. the recipient cooperates with the actor in response to the cooperation, or negative, i.e. the recipient punishes the actor if the actor does not cooperate. In both cases, it is in the actor’s interest to cooperate, since this maximizes its gain from the recipient’s response. Partner Control behaviours can be implemented very easily and are particularly robust, as shown in Axelrod and Hamilton (1981). Thus, in a cooperative situation which can be modelled as a Prisoner’s Dilemma (detailed in section 1.2), with repeated interactions, the tit-for-tat behaviour is a robust and straightforward reciprocity strategy. The tit-for-tat strategy is an optimistic imitation strategy. It consists of always starting any interaction by cooperating and then imitating its partner during the following time steps. Thus two individuals playing this strategy will cooperate at every time step of the interaction. On the other hand, when an individual playing tit-for-tat interacts with a cheater, it is exploited only once, and then it stops cooperating.
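The tit-for-tat rule described above can be sketched in a few lines. This is a minimal illustration, not a model from the thesis: the payoff values are the conventional textbook ones and are an assumption, as are the function names `tit_for_tat`, `always_defect` and `play`.

```python
# Gain of the first player for each (own move, partner move) pair.
# "C" = cooperate, "D" = defect. These values are an illustrative
# assumption; any Prisoner's Dilemma payoffs would do.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def tit_for_tat(partner_history):
    """Cooperate on the first step, then copy the partner's last move."""
    return "C" if not partner_history else partner_history[-1]


def always_defect(partner_history):
    """A cheater: never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play a repeated Prisoner's Dilemma and return moves and total gains."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each player only sees the other's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return moves_a, moves_b, score_a, score_b
```

With these assumed payoffs, `play(tit_for_tat, tit_for_tat, 5)` yields cooperation at every step, while `play(tit_for_tat, always_defect, 5)` shows the tit-for-tat player cooperating only on the first step.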
Tit-for-tat behaviours have been observed in pied flycatchers (Krams et al., 2008), who only come to defend partners who have defended them before, and refuse to help those who have not come to help them when they needed it. They are also seen in wild vervet monkeys for grooming (Fruteau et al., 2009).

Partner Choice and Biological Markets

In partner choice, on the other hand, individuals do not only adjust their cooperation with a given partner according to its past actions. They also choose their partner according to its past actions. Since all individuals in the population seek to be with the best possible partner and not all can meet their demand, there is a biological market of partners (Noë & Hammerstein, 1994).
We observe partner choice processes in a wide variety of living systems, from cleaner fish (Bshary & Grutter, 2002) to the legume-rhizobia mutualism (Simms & Lee Taylor, 2002), both of which are developed below. In the human species, partner choice has likely played a prominent role in the evolution of cooperative behaviour (Barclay, 2013; Barclay & Willer, 2007; Debove, André, et al., 2015).
Thus, partner choice allows the appearance and maintenance of cooperation. To understand this, let us no longer focus on the actor of cooperation, but on the recipient. In a collective task, it is always relevant for the recipient to interact with the best possible actor, i.e. the actor who will enable it to obtain the biggest gain. Since the actors need the recipients in order to perform this collective task, it is in the interest of the actors to be as cooperative as possible in order to be picked. This pressure is all the stronger if the number of actors is large relative to the number of recipients. The actors and recipients are in a supply and demand setup, which can be studied in the form of a market (Noë & Hammerstein, 1994).
Let us consider a population of individuals who are looking to interact with the best possible partner. In this population, a mutant appears who cooperates more than the others. The other individuals will particularly desire to interact with this mutant. The mutant will therefore be involved in a lot of interactions and obtain many gains. As a result, the mutant can be picky. It will be able to refuse interactions with the least efficient individuals in order to choose the most efficient partners. There is therefore an “assortative matching”, that is to say, individuals are matched according to their performance. The best-performing individuals will be able to afford to be picky and will end up being paired with other well-performing individuals, and conversely the worst-performing individuals will pair up together (Geoffroy et al., 2018). Therefore, the best-performing individuals who interact together will receive benefits from their high level of cooperation. That is, assortative matching generates a selective pressure in favour of cooperation.
For example, Bshary and Grutter (2002) show that cleaner fish and their clients cooperate in a market structure. Cleaner fish are small fish that eat the parasites present on “client” fish. Cleaner fish hold “cleaning stations” and always stay in the same area. When clients want to be cleaned, they go to these stations, and they can select which station they go to. Depending on the supply of cleaners and the demand of clients, the market can reach different balances, in favour of either the clients or the cleaners. If there are fewer stations than necessary to meet the total demand of the clients, then the clients are in an unfavourable situation: it is difficult for them to access a station. The cleaners take advantage of this situation: they allow themselves not only to eat the parasites present on the clients but also to eat the clients’ mucus, which is very nutritious for the cleaners. The cleaners are cheating. Since the clients have no other options, they can only comply. On the contrary, when there are more stations than necessary to accommodate all the clients, there is more supply than demand, and the clients are in an advantageous situation. The cleaner fish do not eat the clients’ mucus, because if the clients are not satisfied with the service provided, they can go to another station next time.
The mechanism is similar in the legume-rhizobia mutualism (Simms & Lee Taylor, 2002). Legumes need nitrogen that they cannot capture from the air. Soil bacteria, the rhizobia, release nitrogen in a form that the plant can capture. The rhizobia also need help from the legume because they consume carbon compounds that the plant produces. Thus, legumes create nodules in their roots that host the bacteria and supply them with carbon. Inefficient nodules, where the bacteria produce little nitrogen, are deprived of carbon and destroyed, while efficient nodules are maintained and supplied with carbon. There is a market effect, and partner choice develops. The plant hosts and provides resources only to the bacteria that offer nitrogen in exchange.
Note that partner choice can be implemented in many different ways, varying in complexity and efficiency. For example, partner choice can be achieved through direct information. The individual looking for a partner, the chooser, uses its knowledge of the different partners (Aktipis, 2004, 2011; Debove, Baumard, et al., 2015; McNamara et al., 2008). If the chooser uses only the information from the current partner, this partner choice is called partner switching. It can be worded as a simple rule: if the current partner cooperates, the chooser stays with it; otherwise it switches to a new partner at random (Aktipis, 2011; Bshary & Grutter, 2005). The chooser can also use a memory of all past interactions with its partners to pick the best partner available directly. Partner choice can also be made through indirect information. The chooser picks a partner based on the partners’ past interactions with other individuals. This knowledge can come from direct observation or reported information from other individuals.
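The matching process described above can be illustrated with a deliberately simple sketch. It assumes that each individual is summarised by a single cooperation level, that everyone prefers the most cooperative available partner, and that interactions are pairwise. All of these are simplifying assumptions, and the names `levels` and `assortative_matching` are hypothetical.

```python
def assortative_matching(levels):
    """Pair individuals when everyone prefers the most cooperative partner.

    `levels` maps an individual's name to its cooperation level.
    The most cooperative unpaired individual can afford to be picky and
    takes the best remaining partner, who has no better option and accepts.
    Repeating this yields pairs of neighbours in the cooperation ranking.
    """
    ranked = sorted(levels, key=levels.get, reverse=True)
    # With an odd number of individuals, the least cooperative stays alone.
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]


pairs = assortative_matching({"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.1})
```

Here the two most cooperative individuals (`a` and `c`) end up together, as do the two least cooperative ones (`b` and `d`), which is the assortment that rewards high cooperation.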
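The partner switching rule stated above (stay if the current partner cooperated, otherwise switch at random) can be sketched as follows. The representation of partners as names with a fixed behaviour, and the function names `partner_switching` and `simulate`, are assumptions made for illustration only.

```python
import random


def partner_switching(current, partner_cooperated, candidates, rng):
    """Stay with a cooperating partner; otherwise pick another one at random."""
    if partner_cooperated:
        return current
    others = [c for c in candidates if c != current]
    # If no alternative exists, the chooser has to stay where it is.
    return rng.choice(others) if others else current


def simulate(behaviour, start, rounds, seed=0):
    """Return the sequence of partners visited by the chooser.

    `behaviour` maps each partner's name to True (always cooperates)
    or False (always cheats); fixed behaviours are a simplification.
    """
    rng = random.Random(seed)
    current = start
    visited = []
    for _ in range(rounds):
        visited.append(current)
        current = partner_switching(
            current, behaviour[current], list(behaviour), rng
        )
    return visited
```

Note that the chooser needs no memory beyond the outcome of the last interaction, which is what makes this the simplest form of partner choice through direct information.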

Why isn’t cooperation everywhere?

Although we wondered at the beginning of this chapter how cooperation could evolve, after studying the different mechanisms that could support it, it is now the opposite question that emerges. Why is reciprocal cooperation relatively rare in nature? Indeed, all examples of reciprocity in animals are contested (reviewed in part in Carter, 2014), and yet partner choice is an incredibly powerful mechanism in humans (Barclay, 2013; Barclay & Willer, 2007; Debove, André, et al., 2015). What factors might prevent the emergence of reciprocity?
First of all, a substantial problem is the bootstrapping issue (André, 2014). While it is easy to understand how reciprocity mechanisms can maintain cooperative behaviours, it is more complicated to explain how this mechanism can appear by itself. Indeed, reciprocity, be it in the partner control or in the partner choice version, requires two mutually dependent traits, both unstable by themselves: it requires that (i) the actor can cooperate and that (ii) the recipient can recognize and respond to an act of cooperation.
Without the simultaneous presence of these two traits, reciprocity cannot take place, and cooperation is not evolutionarily stable. Indeed, the ability to distinguish a cooperative partner from a cheating partner only makes sense if there are both cooperative and non-cooperative individuals in the recipient’s vicinity. If the individual’s neighbourhood consists only of cheaters (or only of cooperators), then there is no benefit in maintaining a complex system of cooperator recognition. Similarly, cooperative behaviours have no reason to be maintained by natural selection if there is no individual able to recognize and respond conditionally to them. Indeed, cooperation is evolutionarily stable only if paying a short-term cost makes it possible to change the recipient’s future behaviour. If the selected recipient does not have the competence to distinguish a cooperator from a cheater, its behaviour cannot change. Therefore, there cannot be any interest in cooperating.
These two traits, which are both complex and different, can only be favoured together. However, it is extremely improbable that these two traits will appear at the same time in a population. One solution to overcome this gap is that either of these behaviours already existed at least partially in the population for other reasons. For example, one hypothesis to allow the emergence of cooperation between unrelated individuals is that the cooperation implemented by kin selection can sometimes be applied between non-kin by misfiring. Another hypothesis is based on the role of byproduct cooperation as a triggering factor (André, 2015).
Beyond this bootstrapping problem, however, other constraints influence the evolution of cooperation by partner choice. Partner choice requires the presence of numerous and accessible outside options so that comparing different partners is viable (Chade et al., 2017; Debove, Baumard, et al., 2015; Raihani & Bshary, 2011). If finding a better partner costs an individual more than the extra gain it would obtain over its current partner, it is not advantageous to be choosy. We will develop this point further in Chapter 2 and show that it could play an important role in the phylogenetic distribution of cooperation. In Chapter 3, we explore the possibility of the emergence of cooperative behaviours by partner choice in pseudo-realistic environments, studying the impact of these emergence issues.


The biological market in a spatialised environment

Partner choice models are mainly built in aspatial environments. In these environments, there is no notion of distance or proximity between individuals. Individuals are either randomly paired and then separated to join a “pool” of single individuals, or they all interact with each other under various resource distribution systems. In a spatial environment, the search for a partner requires one to move in order to reach other individuals. Although models of partner choice in aspatial environments show that straightforward behavioural rules are sufficient to implement partner choice, it is tempting to think that in spatial environments, behavioural rules need to be much more complicated.
Aktipis (2011) shows that even in a spatial environment, it is possible to change cooperative behaviours through partner choice with elementary cognitive mechanisms. The behavioural rule, called Walk Away, allows the emergence of partner choice and could develop in many setups.
Aktipis (2011) proposes a model of partner choice in a spatial world constituted of a grid of cells, where individuals use their travel ability to choose the best group of partners. The model is similar to that of McNamara et al. (2008) but individuals are no longer paired randomly; they are paired based on their proximity. All individuals on the same cell play together. Once individuals have interacted, each individual has the option of staying or leaving (walking away) depending on the proportion of cooperators present in their cell compared to their satisfaction threshold. This satisfaction threshold is fixed for the whole population during the whole simulation, but the proportion of cooperators and cheaters varies according to an evolutionary algorithm. The model shows that when the satisfaction threshold value is high, the population stabilizes towards a predominance of cooperators. If the threshold value is low, cooperators are exploited, and cheaters invade the population.
Aktipis (2011) thus presents an extremely simple behavioural rule in spatial environments that allows partner choice in a population, leading to the evolution of cooperation. The fact that individuals navigate in a complex environment does not necessarily imply that the cognitive mechanisms necessary for partner choice are very elaborate.
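The stay-or-leave decision described above can be sketched as a single movement step. This is not the implementation of Aktipis (2011). The text does not specify how the proportion is computed or where leavers go, so the sketch assumes that the proportion counts every occupant of the cell (including the individual itself), that leavers move to a random adjacent cell, and that the grid wraps around at its edges. The function and parameter names are hypothetical.

```python
import random

# Four-neighbourhood moves; an assumption, the original model may differ.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def walk_away_step(positions, is_cooperator, threshold, grid_size, rng):
    """Apply one Walk Away movement step and return the new positions.

    positions:     dict individual -> (x, y) cell
    is_cooperator: dict individual -> bool
    threshold:     satisfaction threshold shared by the whole population
    """
    # Group individuals by the cell they currently occupy.
    cells = {}
    for individual, cell in positions.items():
        cells.setdefault(cell, []).append(individual)

    new_positions = {}
    for (x, y), occupants in cells.items():
        # Proportion of cooperators in the cell (assumed to include self).
        proportion = sum(is_cooperator[i] for i in occupants) / len(occupants)
        for individual in occupants:
            if proportion < threshold:
                # Unsatisfied: walk away to a random neighbouring cell.
                dx, dy = rng.choice(MOVES)
                new_positions[individual] = (
                    (x + dx) % grid_size,
                    (y + dy) % grid_size,
                )
            else:
                # Satisfied: stay in the current cell.
                new_positions[individual] = (x, y)
    return new_positions
```

The rule requires no recognition of individual partners and no memory: each individual only compares the composition of its current cell with a fixed threshold.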

Evolutionary Robotics as an Individual-based modelling method for the evolution of cooperation

The results presented so far have been obtained with rather abstract models, especially when it comes to capturing the mechanistic constraints of the “real” world. Most models do not capture how the individuals actually move around, meet potential partners and find resources. Even when they do consider a spatial environment, as in Aktipis (2011), they do so in a much simplified form, for instance by using a grid-based 2D environment and fixed behavioural strategies.
In this thesis, we are interested in how Partner Choice benefits the evolution of cooperation under more realistic constraints. In particular, the probability of possible interactions (whether successful or not) depends on resource availability, population density and exploration strategies. Therefore, we design an individual-based model that captures the complex interactions at work in a pseudo-realistic 2-dimensional environment, where individuals learn to explore and interact, using realistic sensory inputs and actuators (i.e. continuous states and actions). This also entails considering a more complex decision-making apparatus, with two possible outcomes: either results obtained with more simplistic models no longer hold, or they do hold but must be further specified (e.g. by taking into account the fact that individuals and resources are two different things).
In this Section, we present evolutionary robotics, which is the application of evolutionary computation to robotics. We present the various ways it can be used to enable swarm or collective robots to learn how to solve a task. We then present how the very same method can be, and has already been, used to tackle open questions in evolutionary biology, including the evolution of cooperation. When it comes to modelling for evolutionary biology, Evolutionary Robotics thus presents a ready-to-use method for individual-based modelling, where mechanistic constraints can be modelled as robotic agents move in a pseudo-realistic 2-dimensional environment. This makes it possible to study how physical constraints imposed by the environment and the robotic agents may shape the evolutionary dynamics of learning to cooperate and, in our particular case, how partner choice may affect the evolution of cooperation in more complex setups.

Table of contents :

Introduction
1 Evolution of Cooperation and Partner Choice
1.1 The Evolution of Cooperation
1.1.1 Evolutionary approaches to behaviour
1.1.2 The Problem of Cooperation
1.1.3 Kin selection and indirect fitness benefits
1.1.4 Mutualism
1.1.5 Partner Choice and Biological Markets
1.1.6 Why isn’t cooperation everywhere?
1.2 Models for the Evolution of Cooperation
1.3 Models of Partner Choice
1.3.1 Population Diversity
1.3.2 The biological market in a spatialised environment
1.3.3 Competitive Helping
1.3.4 Partner choice with memory
1.3.5 Seeking Time and Interaction Time
1.3.6 Discussion on partner choice modelling
1.4 Adaptive Swarm Robotics
1.4.1 Evolutionary Robotics and Collective Systems
1.4.2 Evolutionary robotics as a Method to Understand Cooperation in Nature
1.5 Thesis objective
2 Nothing better to do? Environment quality and the evolution of cooperation by partner choice
2.1 Introduction
2.2 Methods
2.2.1 The decision-making mechanisms
2.2.2 Phenotypic variability of cooperation
2.2.3 The payoff function
2.2.4 The evolutionary algorithm
2.3 Results
2.3.1 Cooperation cannot evolve when patches are scarce
2.3.2 Cooperation cannot evolve when there are too many partners around
2.3.3 Analysis of the behaviour of “patch ranking” networks
2.4 Discussion
2.5 Supplementary Materials
3 Learning to Cooperate in a Socially Optimal Way in Swarm Robotics
3.1 Introduction
3.2 Methods
3.2.1 Environment
3.2.2 Payoff function
3.2.3 Partner Choice
3.2.4 Robotic Behaviors
3.2.5 Controller and Representation
3.2.6 Learning
3.3 Results
3.3.1 Experimental setup
3.3.2 Learning Cooperation and Population Size
3.3.3 Learning Cooperation and Interaction Length
3.3.4 Effect of Mutation Strength (Control)
3.3.5 Population Size vs Generations (Control)
3.3.6 Wandering and Relocation (Control)
3.4 Conclusion
3.5 Supplementary Materials
4 Policy Search when Significant Events are Rare: Choosing the Right Partner to Cooperate with
4.1 Introduction
4.2 Methods
4.2.1 Learning with Rare Significant Events
4.2.2 Partner Choice and Payoff Function
4.2.3 Behavioural Strategies
4.3 Parameter Settings and Algorithms
4.3.1 Proximal Policy Optimization
4.3.2 Covariance Matrix Adaptation Evolution Strategy
4.4 Results
4.4.1 Learning with always significant events
4.4.2 Learning with rare significant events
4.4.3 Analysing best policies for partner choice
4.5 Concluding Remarks
4.6 Supplementary Materials
4.6.1 Detail analysis of the agents’ reward
4.6.2 Re-evaluation performance statistical score
4.6.3 Timing
5 Conclusion
5.1 Summary
5.2 Discussion and Perspectives