evaluation of the influence of environment-related cues


Audio-Only Augmented Reality Definition

The general context of AR encompasses technologies linked to all sensory modalities and is not limited to a single one. AAR is considered an outgrowth of the AR concept that restricts the modality of the virtual content to audition. This does not imply that the user only has access to audition, but that virtual events should be exclusively auditory while the user retains complete sensory access to the real world [71]. A fourth property can therefore be added to the definition of AR introduced by Azuma to explain the concept of AAR: all virtual content displayed to the user should be auditory.
The use of headphones or loudspeakers for AAR can be restricted to self-explanatory mono (0-dimensional), stereo (1-dimensional), or surround (2-dimensional) formats. However, most applications developed nowadays focus on 3D auditory display on headphones. This chapter mainly focuses on the technological aspects and applications induced by this type of auditory display.

Some AAR applications

The emergence of affordable consumer technologies for 3D audio listening, as listed earlier, has facilitated the delivery of AAR applications. Most of these applications were conceived as mobile (MAAR) or wearable audio augmented reality (WAAR) applications. Indeed, in mobility contexts, audio is an appealing alternative to vision as a display modality, since it avoids the physically and cognitively demanding interaction with graphical user interfaces when on the go [60, 75]. In this section, we review different AAR application scenarios. The existing applications can be categorized as human-to-human interactions (Section 2.2.1) or location-based applications (Section 2.2.2).

Human-to-human interactions

Binaural telephony. Ordinary mobile phone communication transmits a mono signal with a bandwidth limited to 300 Hz to 3400 Hz, even when headphones are used. Binaural telephony means that the transmitted signal is processed through a Head-Related Transfer Function (HRTF) (see Section 2.3.2.1). This type of signal is incompatible with normal telephone lines or GSM networks, and must therefore be carried over Voice over IP, a communication standard that places no such limit on the frequency bandwidth. In a telephony scenario, binaural signals allow one interlocutor to be spatially located from the perspective of the other. Spatializing the interlocutor's voice in front of the listener makes the conversation feel more natural [103].
Audio meetings. This principle can be extended to audio meetings where multiple users are present. In recent years, audio meetings have become increasingly popular. Since the COVID-19 pandemic, face-to-face meetings have become increasingly difficult. Audio meetings are traditionally conducted via telephones and speakerphones. One of the issues is the lack of telepresence: the voice of every participant is delivered on a single mono channel. Spatializing the different interlocutors can take virtual meetings to a new level. Remote interlocutors can be panned around the user (see Figure 2.2) and blended into the user's acoustical environment. The key benefit is enhanced overall comprehension of the conversation [103].
Among the possible use cases of this type of AAR, one instance is when a traditional meeting is scheduled and one team member is unable to attend because he or she is out of town. This member can participate virtually if he or she has an AAR device and at least one loudspeaker is present in the meeting room (see Figure 2.2). One downside is the need for a loudspeaker system at the other end.

Generating a spatialized virtual sound source

An augmented reality system should be able to merge real and virtual sounds so that the virtual sounds seem embedded in the real environment. Härmä [71] even suggests that, ideally, an augmented reality audio system should withstand a test close to the Turing test for artificial intelligence [163]. If a listener cannot tell whether a sound source belongs to the real or the virtual audio environment, the system has produced a subjectively flawless augmentation of the listener's auditory environment.
Meeting this criterion requires appropriate spatialization processing of virtual auditory events, so that the virtual source seems to be emitted in the real acoustic environment. Moreover, the room effect is well known to contribute significantly to the perceived location of sound events [20, 90]. As a result, the room effect processing must be carefully designed to ensure that the perceived location of a virtual event corresponds to the intended position. In AAR applications, the idea is that the virtual event seems to originate from a precise location in the environment, which could be, for example, a specific real-world object or position, or a position perceived behind, next to, or in front of other real-world sound sources. The challenge is three-fold: (a) choosing an appropriate spatialization model to control the placement of virtual auditory events and the related room effect in order to meet room-related perceptual criteria; (b) obtaining a priori knowledge about the acoustical or architectural features of the real environment to tune the model appropriately; and (c) running in real time, since the model is to be applied in an AAR scenario. The chosen spatialization model has a direct impact on how the auditory cues conveyed by the room effect are reproduced. It therefore affects the spatial perception of a sound source and, more broadly, the perceptual representation of the overall virtual sound scene.
This section focuses on the different methods used for the spatial rendering of virtual events in front of the listener and on how to perform binaural rendering for headphone display.
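As a rough illustration of what binaural rendering for headphones involves, the sketch below convolves a mono source signal with a pair of impulse responses, one per ear. It assumes that a left/right pair of binaural room impulse responses is already available for the intended source position. The function names and the short toy impulse responses are hypothetical and chosen only to keep the example readable; they are not measured data and not the rendering chain used in this work.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, ir_left, ir_right):
    """Return (left, right) headphone signals for one virtual source.

    ir_left / ir_right are the impulse responses from the virtual
    source position to the left and right ear (assumed to be given).
    """
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Toy, made-up impulse responses (illustration only): the sound reaches
# the left ear first and louder, as for a source on the listener's left.
toy_ir_left = [1.0, 0.0, 0.3]
toy_ir_right = [0.0, 0.6, 0.0, 0.2]

left, right = render_binaural([1.0, 0.5], toy_ir_left, toy_ir_right)
```

A real-time system would typically replace the direct-form loop with block-wise FFT convolution, which is what makes requirement (c) above achievable with long room responses.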

Auditory distance perception accuracy: An inherent compression effect

Perceived auditory distance is inherently compressed. Listeners tend to underestimate the distance of far sources and overestimate the distance of close sources [90]. What is meant by "far" and "close" is related to the concept of a "crossover point" [7]: the distance at which there is no bias in perceived distance. Its value is considered to be around 1 meter but varies with the characteristics of the acoustic environment. Contrary to azimuth and elevation, for which absolute localization errors are bounded [22], the error in auditory distance perception can grow without limit as distance increases. As a result, distance is regarded as the most imprecise dimension of sound localization.
In order to characterize auditory distance perception, Zahorik proposed a model [183] where the perceived distance D is related to the actual distance d through a compressive power function, which is a suitable approximation to the majority of psychophysical distance functions:
$D = k\,d^{a}$ (1)
Here, k and a are the fitting parameters of the function. The parameter k is the linear coefficient, describing linear compression (k < 1) or expansion (k > 1), and a is the non-linear coefficient, describing non-linear compression (a < 1) or expansion (a > 1). On logarithmic axes the function becomes the straight line log D = log k + a log d, so a is the slope and log k the intercept (see Figure 3.1).
This two-parameter function offers a comprehensive representation of the compression effect on a set of reported distances and is used throughout this thesis. In [183], 84 data sets were fitted with the above compressive power function. A mean value of 1.32 was found for k, while a mean value of 0.54 was found for a. This result illustrates the systematic compression effect observed on auditory distance reports.
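To make the role of the two parameters concrete, the following sketch evaluates the power function, fits k and a by ordinary least squares on log-transformed data, and computes the distance at which the fitted curve predicts no bias (D = d, which gives d = k^(1/(1-a))). The function names are hypothetical, and the least-squares fit in the log-log domain is one simple choice among several, not necessarily the procedure used in [183] or in this work.

```python
import math


def perceived_distance(d, k=1.32, a=0.54):
    """Compressive power function D = k * d**a (defaults: mean fit in [183])."""
    return k * d ** a


def fit_power_function(actual, reported):
    """Least-squares fit of log D = log k + a * log d; returns (k, a)."""
    xs = [math.log(d) for d in actual]
    ys = [math.log(D) for D in reported]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                        # slope on logarithmic axes
    k = math.exp(mean_y - a * mean_x)    # exp(intercept)
    return k, a


def crossover_distance(k, a):
    """Distance where D == d, i.e. k * d**a == d (requires a != 1)."""
    return k ** (1.0 / (1.0 - a))


# Synthetic, noise-free reports generated from the mean parameters.
distances = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
reports = [perceived_distance(d) for d in distances]
k_hat, a_hat = fit_power_function(distances, reports)
```

With k = 1.32 and a = 0.54, a source at 10 m is predicted to be reported at about 4.6 m, and the crossover evaluates to a little under 2 m. This is the same order of magnitude as the figure of around 1 m quoted above; note that the crossover of the mean parameters need not equal the mean of per-data-set crossovers.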

Dynamic situations

When the sound source and/or the listener are moving, additional dynamic cues contribute to the perceived source distance [90, 183]: the time-to-contact or acoustic tau, the absolute motion parallax, and the Doppler effect. They are mainly related to changes in intensity, binaural, and spectral cues, respectively.
The acoustic tau refers to the sound level variation occurring when the distance to the source changes [11]. It may be exploited by the perceptual system either for distance evaluation or for time-to-contact estimation when the source is looming or when the listener moves towards the source. Ashmead et al. [11] established that participants estimated auditory distance more accurately when they were able to move towards the source than when they stood still.
The absolute motion parallax refers to the case where the source and the listener are not moving exactly towards each other. In this case, the change in angular direction of the source creates dynamic changes in binaural information that can contribute to distance estimation. Speigle and Loomis [155] notably illustrated the benefit of this effect in a situation where the sound source was displayed outside the median plane of participants. Moving participants judged the distance to a sound source more accurately than static participants.
A particular dynamic situation must be mentioned here. When a sound source moves towards a static listener (looming source), a systematic asymmetry in distance judgements is observed [70]. Indeed, the perceptual system tends to overestimate the change in intensity of looming sounds compared to receding sounds. This results in a systematic underestimation of the distance of looming sound sources. This bias might be triggered by the perceived biological importance of looming sounds, which could be interpreted as a threat or an incoming collision [34]. Gardner [62] noticed that head movements could be very slightly helpful for auditory distance perception of speech signals in anechoic conditions. The main benefit that these movements could provide is the ability for the listener to hear the source laterally, which could enable the use of binaural cues for evaluating the distance of nearby sources [79].
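The link between level variation and time-to-contact described for the acoustic tau can be made explicit under two simplifying assumptions that are not stated in the text: a free-field inverse-square law (intensity proportional to 1/r²) and a constant approach speed v. Differentiating I ∝ 1/r² gives dI/dt = 2·I·v/r, so the time-to-contact r/v equals 2·I/(dI/dt) and can be obtained from the intensity signal alone, without knowing either r or v. The sketch below estimates it from two successive intensity samples; the function names and numerical values are illustrative.

```python
import math


def intensity(distance_m, source_power=1.0):
    """Free-field intensity up to a constant factor (inverse-square law)."""
    return source_power / distance_m ** 2


def time_to_contact(i_prev, i_now, dt):
    """Estimate time-to-contact as 2 * I / (dI/dt) from two samples.

    Returns math.inf when the intensity is not increasing (source not
    approaching under the assumptions above).
    """
    rate = (i_now - i_prev) / dt
    if rate <= 0.0:
        return math.inf
    return 2.0 * i_now / rate


# Listener approaching at 1 m/s: 10.00 m at t = 0 s, 9.99 m at t = 0.01 s.
tau = time_to_contact(intensity(10.0), intensity(9.99), dt=0.01)
```

Here the true remaining time is 9.99 s, and the finite-difference estimate lands within a few hundredths of a second of it. In a reverberant room the inverse-square assumption no longer holds for the total level, so this should be read as an idealized case.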


Prior knowledge and expectation

Without any prior knowledge of the sound source, most of the acoustic cues described above provide only relative distance judgements. In contrast, sounds familiar to the listener may allow acoustic cues to be interpreted as absolute distance judgements. Certain sounds, such as speech, are instantly recognizable to all listeners, even more so for languages with a prosody similar to that of their native language(s) [30]. Vocal signals also present particular characteristics that link their production mode (e.g. whispering to shouting) to an expected sound source power. Gardner [62] demonstrated that the distance of a source playing back whispered speech is underestimated as a result of its low expected sound power. Conversely, the distance of a source playing back shouted speech is overestimated due to its high expected sound power. Similar effects may be elicited by musical instruments or motor sounds.
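The reasoning behind this effect can be illustrated with a deliberately simplified model, which is an assumption of this sketch rather than a model taken from the cited studies. The listener is assumed to hold an expected level at 1 m for each vocal mode and to invert the free-field inverse-square law (a 20·log10(d) dB drop relative to 1 m) to infer distance from the level received at the ears. The levels below are hypothetical round numbers chosen for illustration, not measured values.

```python
def inferred_distance(received_level_db, expected_level_at_1m_db):
    """Distance (m) implied by the received level under an inverse-square law."""
    return 10.0 ** ((expected_level_at_1m_db - received_level_db) / 20.0)


# Hypothetical expected levels at 1 m for three vocal modes (illustrative).
expected_at_1m = {"whispered": 40.0, "conversational": 60.0, "shouted": 80.0}

# All three recordings are played back so that the listener receives 54 dB.
received_db = 54.0
estimates = {
    mode: inferred_distance(received_db, level)
    for mode, level in expected_at_1m.items()
}
```

For an identical received level, the whispered voice is inferred to be much closer than the conversational voice and the shouted voice much farther, which reproduces the direction of the biases reported by Gardner [62].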

Learning

Learning results from repeated exposure to similar listening settings. Coleman [40] illustrated this phenomenon in a distance reporting experiment where participants were asked to assess the distance of an unfamiliar stimulus played back on loudspeakers distributed at various distances. Initially, listeners were unable to determine which loudspeaker was playing the stimulus. As the session progressed, performance improved incrementally without any feedback from the researcher. Makous and Middlebrooks [106] and Kopčo et al. [92] also observed such a learning effect in similar experiments. Carlile [36] emphasized the importance of training sessions that enable participants to become acquainted with unfamiliar auditory environments and stimuli, as well as accustomed to the distance reporting method.

Table of contents :

1 introduction 
1.1 General Context and motivations
1.2 Objectives of the thesis
1.3 Key aspects of the framework
1.4 Contributions
1.5 Thesis structure
2 audio-only augmented reality 
2.1 Introduction
2.1.1 Augmented Reality Definition
2.1.2 Audio-Only Augmented Reality Definition
2.2 Some AAR applications
2.2.1 Human-to-human interactions
2.2.2 Location-based applications
2.3 Technological challenges of AAR
2.3.1 Treatment of real sounds
2.3.2 Generating a spatialized virtual sound source
2.3.3 Motion tracking
2.4 Summary and technical choices
3 auditory distance perception 
3.1 Auditory distance estimation
3.1.1 Auditory distance perception accuracy: An inherent compression effect
3.1.2 Auditory distance perception variability
3.2 Auditory distance perception cues
3.2.1 Acoustic cues
3.2.2 Non-acoustic cues
3.3 Relationship with externalization
3.4 Summary & perspectives on the thesis framework
4 visual contribution to auditory distance perception 
4.1 The superior spatial resolution of vision
4.1.1 General mechanisms of visual distance perception
4.1.2 Visual distance estimates
4.2 Audio-visual integration
4.2.1 Ventriloquist effect
4.2.2 Environment-related visual cues
4.3 Summary & perspectives on the thesis framework
5 binaural rendering approach of virtual sound sources 
5.1 Spatial Room Impulse Responses
5.1.1 Measurement Procedure
5.1.2 Used tools
5.2 Converting Directional Room Impulse Responses to Binaural Room Impulse Responses
5.2.1 Encoding into Higher Order Ambisonics (HOA)
5.2.2 Decoding HOA to the binaural format
5.3 Specific Treatments
5.3.1 Denoising Spatial Room Impulse Responses
5.3.2 Diffuse field equalization
5.4 Measurements usage in the experiments
6 experimental procedure 
6.1 Distance report methods
6.1.1 Verbal report
6.1.2 Direct-location
6.1.3 Selected method: the Visual Analogue Scale (VAS)
6.2 Online Experiment methodology
6.2.1 Technical aspects of online experiments
6.2.2 Experiment Builder: PsychoPy
6.2.3 Hosting platform: Pavlovia
6.2.4 Recruiting participants: Prolific
6.2.5 Data quality concerns
7 evaluations of the importance of intensity and reverberation
7.1 Introduction
7.2 Experiment I: Development of distance rendering models
7.2.1 Reference measurements
7.2.2 Envelope-based model
7.2.3 Intensity-based model
7.2.4 Objective comparisons
7.3 Experiment I: Perceptual performances of the models in a congruent situation
7.3.1 Material & Methods
7.3.2 Procedure
7.4 Experiment I: Results
7.4.1 General results
7.4.2 Individual results
7.5 Experiment I: Discussion
7.5.1 Envelope-based model performances
7.5.2 Intensity-based model performances
7.5.3 Acoustic cues weighting strategies
7.5.4 Influence of the experimental context and comparison with past studies
7.6 Experiment II: Evaluating the relevance of the early-to-late energy ratio
7.6.1 BRIRs synthesis method
7.6.2 Material & Methods
7.6.3 Procedure
7.7 Experiment II: Results
7.7.1 Classroom
7.7.2 Gallery
7.8 Discussion
7.8.1 Backward stimuli
7.8.2 Forward stimuli
7.8.3 Spectral aspects
7.8.4 Spatial aspects
7.8.5 Reverberation-related cues weighting strategies
7.9 Conclusion
8 evaluation of the influence of environment-related cues 
8.1 Introduction
8.2 Experiment III: Evaluating the influence of incongruent visual cues
8.2.1 Objective of the experiment
8.2.2 Material & Methods
8.2.3 Procedure
8.3 Experiment III: Results
8.3.1 General Results
8.3.2 Effect of room volume
8.3.3 Compression effect quantification across rendering methods
8.3.4 Influence of the visual spatial boundary on compression coefficients
8.3.5 Influence of the room volume on compression coefficients
8.4 Discussion
8.4.1 The influence of the visual spatial boundary
8.4.2 The influence of volume on acoustic cues weighting strategies
8.4.3 Experiment limitations
8.5 Comparison with Experiment I
8.5.1 Envelope-based performances
8.5.2 Acoustic cues weighting strategies
8.6 Conclusion
9 impact of the acoustic divergence between reproduced room effects 
9.1 Introduction
9.2 Experiment IV: An acoustically divergent scenario
9.2.1 Objectives of the experiment
9.2.2 Material & Methods
9.2.3 Procedure
9.3 Experiment IV: Results
9.3.1 Effect of anchor condition
9.3.2 Compression effect quantification
9.4 Discussion
9.4.1 Effect of uncorrected room divergence effect on auditory distance perception
9.4.2 Correcting the divergence with loudness matching
9.5 Comparison with Experiment III
9.5.1 Acoustic and visual divergence
9.5.2 Impact of anchor stimuli in the control condition
9.6 Conclusion
10 general conclusion & perspectives 
10.1 Experimental procedures
10.2 The perception of early energy relatively to reverberation for distance
10.3 Acoustic cues weighting strategies and the influence of room volume
10.4 Visual incongruence and acoustic divergence
a appendix, preliminary experiment 
a.1 Methods
a.1.1 Auditory stimuli
a.1.2 Participants
a.1.3 Procedure, listening environment & report method
a.2 Results
a.3 Conclusion
b appendix, chapter 7 (experiment ii) 
c publications 
bibliography
