Signal detection theory and metacognition
One problem with measuring confidence is that, like cognitive responses, metacognitive responses can be prone to biases. For instance, some participants might be more inclined than others to report high confidence (i.e., a high probability of success), independently of their underlying capacity to estimate their probability of success. Such biases can have multiple origins (e.g., personality traits, task context), and they contaminate the relationship between confidence and accuracy. When assessing metacognitive sensitivity, it is therefore important to cancel out the effects of metacognitive biases. One way to do this has been to apply Signal Detection Theory (SDT) (Green and Swets, 1966) to type-2 data (Clarke et al., 1959; Kunimoto et al., 2001; Galvin et al., 2003; Fleming and Lau, 2014).
In SDT, decisions in a detection task are modelled as setting a criterion to optimally differentiate between two internal distributions, one corresponding to the signal and the other to the noise (see Figure 8) (Green and Swets, 1966). Crucially, this theory makes it possible to distinguish between sensitivity (the distance between the signal and the noise distributions) and bias (where the criterion is set). This is achieved by computing separately the probability of giving each type of response given the stimulus. The hit rate corresponds to the probability of responding that there was some signal when there was indeed some signal (hits over hits plus misses). Conversely, the false alarm rate represents the probability of responding that there was some signal when there was actually no signal (false alarms over false alarms plus correct rejections). SDT can similarly be used to model type-2 responses. In this framework, confidence judgments can be modelled directly by setting a second criterion on the same distribution (Ko and Lau, 2012), or by computing the distance between the criterion and the distribution (Balakrishnan and Ratcliff, 1996; Kepecs et al., 2008) (see Figure 8).
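The type-1 quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the cited studies: it assumes the standard equal-variance Gaussian formulas d' = z(hit rate) - z(false alarm rate) and c = -(z(hit rate) + z(false alarm rate)) / 2, which the text does not spell out, and the function name `type1_sdt` and the trial counts are hypothetical.

```python
from statistics import NormalDist


def type1_sdt(hits, misses, false_alarms, correct_rejections):
    """Return hit rate, false alarm rate, d' and criterion c for a detection task."""
    # Hit rate: "signal" responses among signal trials.
    hit_rate = hits / (hits + misses)
    # False alarm rate: "signal" responses among noise trials.
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    # z-transform (inverse standard normal CDF); it is undefined for rates of
    # exactly 0 or 1, in which case inv_cdf raises StatisticsError.
    z = NormalDist().inv_cdf
    # Sensitivity, assuming equal-variance Gaussian distributions.
    d_prime = z(hit_rate) - z(fa_rate)
    # Bias: position of the criterion relative to the unbiased point.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return hit_rate, fa_rate, d_prime, criterion


# Hypothetical counts, for illustration only.
hr, far, d, c = type1_sdt(hits=80, misses=20, false_alarms=20, correct_rejections=80)
print(f"hit rate={hr:.2f}, false alarm rate={far:.2f}, d'={d:.2f}, c={c:.2f}")
```

With symmetric counts such as these, the criterion comes out at (approximately) zero, which illustrates how sensitivity and bias are estimated separately from the same two rates.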
Applied to type-2 data, the hit rate then becomes the probability of giving a high confidence rating when being correct. Conversely, the false alarm rate corresponds to the probability of giving a high confidence rating when being incorrect (Galvin et al., 2003; Fleming and Lau, 2014).
Figure 8. Schematic representation of SDT for a detection task where a signal has to be detected in noise.
After obtaining hit and false alarm rates, a parametric d’ analysis can be performed on type-1 data, because the distribution of responses generally approximates a Gaussian distribution with equal variance. However, when the type-1 distributions are Gaussian, as is arguably often the case, type-2 distributions do not satisfy the normality assumption required for parametric analysis (Galvin et al., 2003). Thus, non-parametric analysis based on ROC curves should be preferred for second-order judgments (Galvin et al., 2003; Fleming and Lau, 2014). Type-2 ROC curves are obtained by plotting, for each confidence level, the cumulative probability of reporting at least that level of confidence when the response is correct (type-2 hit rate, y-axis) against the same probability when the response is incorrect (type-2 false alarm rate, x-axis) (see Figure 9 below). The area between the diagonal and the ROC curve then reflects metacognitive sensitivity, independently of the propensity to give high or low confidence judgments. In particular, a curve departing upward from the diagonal represents a higher probability of being correct as confidence ratings increase.
Figure 9. Left: [...] response (right portion of the decision) are depicted for correct (black) and incorrect (grey) responses. Right: example of a type-2 ROC curve. Given the second-order distribution, it is possible to estimate what the first-order distribution should have been if the observer had perfect metacognitive abilities.
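As an illustration of the non-parametric approach, the following Python sketch builds a type-2 ROC curve from confidence ratings and computes the area under it with the trapezoidal rule. It is only a sketch under stated assumptions: ratings are taken to be integers from 1 to `n_levels`, the function name `type2_roc` is hypothetical, and the ratings in the example are invented for illustration.

```python
def type2_roc(conf_correct, conf_incorrect, n_levels):
    """Return type-2 ROC points and the area under the curve.

    conf_correct / conf_incorrect: confidence ratings (1..n_levels) given on
    correct and incorrect trials, respectively.
    """
    points = [(0.0, 0.0)]
    # Sweep the confidence criterion from the strictest to the most lenient.
    for k in range(n_levels, 0, -1):
        # Type-2 hit rate: P(confidence >= k | correct).
        hit = sum(c >= k for c in conf_correct) / len(conf_correct)
        # Type-2 false alarm rate: P(confidence >= k | incorrect).
        fa = sum(c >= k for c in conf_incorrect) / len(conf_incorrect)
        points.append((fa, hit))
    # Trapezoidal area under the curve; the diagonal corresponds to 0.5.
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Invented ratings on a 4-point scale, for illustration only.
conf_correct = [4, 4, 3, 3, 2, 4, 3, 1]
conf_incorrect = [1, 2, 1, 3, 2, 1]
points, auc = type2_roc(conf_correct, conf_incorrect, n_levels=4)
print(f"area under the type-2 ROC = {auc:.3f}")
print(f"area between the curve and the diagonal = {auc - 0.5:.3f}")
```

Because every criterion placement contributes one point to the curve, shifting all ratings up or down the scale moves the points along the curve rather than changing the area, which is why the area is taken to be independent of confidence bias.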
One problem with this measure is that it has been shown to depend on both type-1 sensitivity and bias (Galvin et al., 2003; Maniscalco and Lau, 2012; Fleming and Lau, 2014). To take this dependence into account, Maniscalco and Lau proposed to systematically model the relationship between first-order and second-order performance in order to obtain a “pure” measure of metacognitive sensitivity (Maniscalco and Lau, 2012). Galvin and colleagues showed that the parameters of the type-1 distribution determine the parameters at the type-2 level (Galvin et al., 2003). One implication is that type-2 distributions can be used to infer what the underlying type-1 distribution should have been if the observer were able to estimate his first-order performance perfectly (see Figure 9 above). This allows computing the “optimal” d’ given the observed type-2 data. This value, referred to as meta-d’, is equivalent to a projection of type-2 data into the type-1 space, and can therefore be compared to the observed d’ (i.e., a meta-d’ equal to the observed d’ means that the observer perfectly assessed his own performance) (Maniscalco and Lau, 2012). Meta-d’ is also convenient because it allows metacognitive sensitivity to be compared directly across tasks or contexts, independently of d’, by computing the ratio or the difference between meta-d’ and d’ (i.e., metacognitive efficiency) (Fleming and Lau, 2014).
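The two efficiency indices mentioned above reduce to simple arithmetic once meta-d’ has been estimated. The sketch below assumes that meta-d’ and d’ are already available (the model fit that yields meta-d’ is not shown), and the function name and numerical values are hypothetical.

```python
def metacognitive_efficiency(meta_d, d_prime):
    """Return the ratio and the difference between meta-d' and d'."""
    if d_prime == 0:
        # The ratio is undefined when first-order sensitivity is zero.
        raise ValueError("ratio is undefined when d' is 0")
    ratio = meta_d / d_prime       # 1 means meta-d' equals d'
    difference = meta_d - d_prime  # 0 means meta-d' equals d'
    return ratio, difference


# Invented values, for illustration only.
ratio, difference = metacognitive_efficiency(meta_d=1.2, d_prime=1.5)
print(f"ratio={ratio:.2f}, difference={difference:.2f}")
```

A ratio below 1 (or a negative difference) would indicate that the type-2 data carry less information than the observed d’ would allow.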
On a more theoretical note, type-2 SDT assumes that type-1 and type-2 judgments are based on the same continuum of information (Galvin et al., 2003). This has several implications. First, second-order performance should follow first-order performance in such a way that, as soon as a signal allows better-than-chance first-order performance, it should also allow second-order judgments to be better than chance. Second, type-2 sensitivity should always be lower than type-1 sensitivity. These are predictions derived from mathematical models.
Are they confirmed by empirical data?
Confidence judgments and internal uncertainty monitoring
Coming back to Peirce and Jastrow’s original experiment, let us now ask whether type-1 and type-2 performance actually correlate in the real world. Not surprisingly, these authors observed that as the ratio between the two weights decreased, both discrimination performance and decision confidence decreased. In other words, confidence judgments and objective performance both covaried with task difficulty. Importantly, this finding has been replicated many times in controlled settings (Yeung and Summerfield, 2012; Fleming and Frith, 2014).
As for error detection, the first question arising from these observations is to what extent confidence judgments rely on an internal evaluation of the decision? Indeed, in many cases what is actually observed is a correlation of both performances and confidence judgments with task difficulty. In principle, confidence judgments could therefore equally be based on external cues of uncertainty, or on an internal evaluation of the first-order decision. For instance, in Peirce and Jastrow’s experiment, confidence judgments could directly rely on an estimation of the ratio of the weights, since a low ratio generally leads to poor performances. Alternatively, they might rely on an internal evaluation of performances, following for instance a mechanism equivalent to what is proposed by the type-2 SDT mentioned above. In natural settings, it might be the case that people estimate confidence by relying on external cues (Koriat and Ackerman, 2010a; Koriat, 2012; Patel et al., 2012; Reyes and Sackur, 2014). Yet, there is now considerable evidence that people can also estimate decision confidence internally (Barthelmé and Mamassian, 2010; Fleming et al., 2010; Saunders and Vijayakumar, 2012; Reyes and Sackur, 2014). For instance, Barthelmé and Mamassian showed that observers’ confidence closely follows their performances, rather than environmental cues (i.e., stimulus ambiguity) (Barthelmé and Mamassian, 2010). Thus, when the contribution of external cues is accounted for, people can still flexibly and reliably estimate the uncertainty associated with their decisions. Naturally, the following question is: what are the mechanisms underlying the flexible computation of decision confidence?
Theoretical models of decision confidence
Many accounts have been proposed, and there is currently considerable disagreement in the literature as to which ones best explain the growing body of data in this rapidly expanding field. A major distinction can be drawn between models in which the very same information is thought to allow both first-order and second-order judgments on the one hand, and models proposing that confidence judgments are computed separately from the first-order decision on the other hand (Yeung and Summerfield, 2012). The first, “intrinsic” class of models proposes that uncertainty is inherently encoded in the decision-making process itself. By contrast, the second, “metacognitive” class of models proposes that confidence judgments rely on a separate evaluation of the decision that has just been made. A second, related distinction can be drawn between “decision locus” models and “post-decisional locus” models (Carroll and Petrusic, 2006; Yeung and Summerfield, 2012). In the first class of models, confidence is supposed to be computed at the same time as the formation of the choice. By contrast, the second class of models assumes that post-decisional processing plays a crucial role in the formation of decision confidence. “Decision locus” models are closely related, although not exactly equivalent, to “intrinsic” models. Similarly, “post-decisional locus” models generally assume that the computation of decision confidence involves separate “metacognitive” mechanisms.
An example of a strict “decision locus” model is the type-2 SDT mentioned above. We have seen that type-2 SDT has proven useful in providing unbiased and independent measures of metacognitive sensitivity. Yet, it is hard to see how this model in itself might be translated into biologically plausible implementations of decision confidence. In particular, because it is static by definition, it cannot explain temporal aspects of decision making (Pleskac and Busemeyer, 2010; Yeung and Summerfield, 2012; Kiani et al., 2014). In addition, it is quite descriptive in nature, and might thereby lack the explanatory power that would be conferred by more mechanistic models (Timmermans et al., 2012). As a consequence, dynamic extensions of type-2 SDT have been proposed, which rely on accumulation models of decision making (Link and Heath, 1975; Vickers, 1979; Ratcliff and Smith, 2004). Before explaining how these models describe confidence and account for type-2 data, I need to briefly introduce accumulation models of decision making, and describe how they account for the formation of first-order decisions.
Table of contents:
Foreword
Chapter One: What is metacognition
Part I – Definition
Part II – Validity of Introspective reports
Part III – Implicit and explicit aspects of metacognition
Part IV – Metacognition versus mindreading
Chapter Two: Empirical studies and computational accounts
Part I – Human adults
1 – Metamemory
2 – Error detection and correction
3 – Decision confidence
Part II – Animals
1 – Behavioural experiments
2 – Neurophysiological data
Chapter Three: The development of Metacognition
Part I – Slow and effortful development of explicit metacognition
1 – Introspective reports, metacognitive knowledge and theory of mind
2 – Metamemory
3 – Confidence judgments
Part II – Non-verbal studies in young children
1 – Studies inspired by the animal literature
2 – Error monitoring
Part III – A dual account of the development of metacognition
1 – Suggestive arguments
2 – Testing preverbal infants’ metacognitive abilities
3 – Brief summary of the two experimental studies
Chapter Four: Empirical contributions
Summary of the first study
Paper: Behavioural and neural indices of metacognition in infants
Summary of the second study
Paper: Infants ask for help when they know that they don’t know
Chapter Five: General discussion
Part I – Two developmental trajectories
1 – Build in implicit redescriptions
2 – Consciously accessing and sharing metacognitive representations
3 – Learning explicit metacognitive knowledge
Part II – Perspectives for future research
1 – Inter-individual variability and clinical populations
2 – Infants as active rather than passive learners
3 – Metacognition and Consciousness
Conclusion
References