Proceedings of the 2002 International Conference on Auditory Display, Kyoto, Japan, July 2-5, 2002

LOCALIZATION OF A MOVING VIRTUAL SOUND SOURCE IN A VIRTUAL ROOM: THE EFFECT OF A DISTRACTING AUDITORY STIMULUS

Matti Gröhn
Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland

ABSTRACT

An audio localization test of moving virtual sound sources was carried out in a spatially immersive virtual environment, using a loudspeaker array with vector base amplitude panning for the reproduction of the sound sources. Azimuth and elevation errors in localization were measured. The main emphasis of this experiment was to explore the effect of a distracting auditory stimulus. Eight subjects completed a set of localization tasks. In these tasks they perceived the azimuth more accurately than the elevation. The distracting auditory stimulus decreased the localization accuracy. There was large variation between the subjects: the median azimuth error of the most inaccurate subject was approximately twice that of the most accurate subject. The amount of localization blur depended on the angular distance from the virtual sound source position to the nearest loudspeaker; the localization blur increased as the angular distance increased. The results of this experiment were compared with the results of our previous experiment without the distracting stimulus.

1. INTRODUCTION

Immersive virtual environments provide an integrated system of three-dimensional (3D) auditory and visual display. Models explored in virtual environments generally have parts of specific interest. So far the most common method to emphasize them has been to highlight them visually. The interesting parts of a model can also be accentuated by using auditory beacons [1] or other auditory stimuli. In a dynamic representation it is important that the user is able to follow the location of the moving sound source.

The purpose of this research is to find out how much an additional stimulus affects the localization accuracy of a moving virtual sound source in a virtual room. We have previously carried out an experiment on the localization of a single moving virtual sound source [2]. In this article the results of a multi-stimulus experiment are described and compared with the localization results of our previous experiment. Both experiments were carried out without a visual stimulus.

Auditory localization of static 3D sound sources has been tested in several previous experiments. Most of these tests have used headphone reproduction [3, 4, 5, 6]. As a part of his localization experiment, Sandvad [7] measured the localization accuracy of direct loudspeaker reproduction. The localization accuracy of panned (amplitude-interpolated) loudspeaker reproduction has been reported for example by Pulkki [8, 9].

Virtual room

The localization experiments were carried out in the virtual room of the Helsinki University of Technology [10] (Figure 1). Typically there are multiple simultaneous users in a virtual room. In a multiuser situation, loudspeakers are a more convenient sound reproduction method than headphones.

Figure 1: The schematic drawing (courtesy of Seppo Äyräväinen) of the virtual room of the Helsinki University of Technology.

The localization experiments were carried out without a visual stimulus. For the multichannel sound reproduction we use vector base amplitude panning (VBAP) [11].
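The paper does not reproduce the VBAP gain computation itself; as a rough illustration of the technique referenced above, the following Python sketch follows Pulkki's published three-dimensional VBAP formulation [11]. The loudspeaker triplet, the helper functions, and their names are illustrative assumptions of this sketch, not the actual configuration of the virtual room.

import numpy as np

def vbap_gains(source_dir, speaker_dirs):
    """Compute VBAP gains for one loudspeaker triplet (Pulkki's formulation).

    source_dir   : unit vector pointing towards the virtual source
    speaker_dirs : 3x3 matrix, one loudspeaker unit vector per row
    Returns three non-negative gains normalized to constant power.
    """
    L = np.asarray(speaker_dirs, dtype=float)   # rows: l1, l2, l3
    p = np.asarray(source_dir, dtype=float)
    g = p @ np.linalg.inv(L)                    # g = p^T L^-1
    if np.any(g < 0):
        raise ValueError("source direction lies outside this triplet")
    return g / np.linalg.norm(g)                # sum of squared gains = 1

def sph_to_cart(az_deg, el_deg):
    """Unit vector for a direction given as azimuth and elevation in degrees."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

# Illustrative example: a triplet at +-45 degrees azimuth plus one loudspeaker
# straight above, with the virtual source slightly elevated between the two
# frontal loudspeakers (not the loudspeaker layout of the virtual room).
triplet = np.vstack([sph_to_cart(45, 0), sph_to_cart(-45, 0), sph_to_cart(0, 90)])
print(vbap_gains(sph_to_cart(0, 10), triplet))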
Compared with Ambisonics [12], VBAP is less sensitive to the listening position, which is a benefit in a multiuser situation. In addition, to get an optimal result with Ambisonics the loudspeakers should be in a symmetric layout, which is hard to achieve in a virtual room because the visual display system limits the possible loudspeaker locations. Due to the visual display, it is practically impossible to implement a wave field synthesis (WFS) [13] loudspeaker array in a virtual room.

In a virtual room the screen between a loudspeaker and the listener has an effect on the perceived signal. In our virtual room it has been measured that the high frequencies of the direct sound are attenuated by more than 10 dB. Impulse responses from each loudspeaker to the listening position were measured, and compensation filters were fitted to provide a sufficiently uniform timbre across the whole listening area. 10th-order IIR filters are used for the practical implementation of the spectral compensation. The compensation of the screen damping and other implementation details of our audio environment are covered in another article [14]. Currently we are using 14 Genelec 1029A loudspeakers for sound reproduction. Their setup is presented in Table 1.

Table 1: Azimuth and elevation angles of the loudspeakers as seen from the listening position.

Direction indication methods

In localization experiments it is important to use a direction indication method that provides information about the perceived location. Researchers have used several different direction indication methods, such as a graphical response screen [15], adjusting a reference sound [16, 9], pointing on a schematic drawing of the loudspeaker setup [17], pointing with a head-mounted laser pointer [18], or pointing with a tracked toy gun [7]. Djelani et al. [19] have compared three different direction indication methods: the Bochum-Sphere technique (also known as GELP), finger pointing, and head pointing. In the Bochum-Sphere technique the position of the auditory event is indicated on a sphere representing the auditory space. According to their results, finger pointing and head pointing were superior to the Bochum-Sphere technique. In the experiment described in this article, the subjects pointed at the perceived location of the sound source with a tracked baton (Figure 2).

In our environment we have concentrated on loudspeaker reproduction. Our previous experiment [2] indicated that in our virtual room a moving virtual sound source is localized as accurately as a static amplitude-panned virtual sound source (one not located at a loudspeaker position). For this article, an experiment with moving sources and additional distracting sounds was carried out.

Subjects

In this experiment there were eight non-paid volunteers. Each of them reported having normal hearing, although this was not verified with audiometric tests. There were six male and two female subjects. Six of the subjects were the same as in our previous experiment.

Stimuli

To utilize both main binaural cues (ITD and ILD), the sound signal should have enough energy at low frequencies (below 1.5 kHz) and at high frequencies (above 1.5 kHz). There are also other factors in the stimulus that affect the localization accuracy, such as its temporal structure (see for example [20, 21, 22]). It has been found [23] that frequencies near 6 kHz are important for elevation perception.
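As a rough, hypothetical illustration of this requirement (not an analysis performed in the paper), the following Python sketch estimates how a stimulus's energy is split below and above 1.5 kHz; the synthetic pink-like test signal and the function name are assumptions of this sketch.

import numpy as np

def band_energy_split(signal, sample_rate, split_hz=1500.0):
    """Fraction of signal energy below and above split_hz, as a crude check
    that a stimulus carries energy for both ITD and ILD cues."""
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    power = np.abs(spectrum) ** 2
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    total = low + high
    return low / total, high / total

# Example with synthetic, approximately pink noise (1/f power spectrum), 1 s at 48 kHz.
rng = np.random.default_rng(0)
white = rng.standard_normal(48000)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(len(white), d=1.0 / 48000)
spectrum[1:] /= np.sqrt(freqs[1:])          # shape amplitude so power falls as 1/f
pink = np.fft.irfft(spectrum, n=len(white))
print(band_energy_split(pink, 48000))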
In the experiment there were three different stimuli: pink noise (a one-minute-long sample), music (a 2 minutes 45 seconds long excerpt from The Wall by Pink Floyd), and a frog croak (0.5 seconds long). These signals have different kinds of spectral content. Pink noise covers the whole audible frequency range, but temporal information is missing. The music is a broadband signal that also has a clear temporal structure. The croak sound has most of its energy below 2 kHz. In the experiments the stimuli were played continuously in a loop.

Figure 2: The devices used in our experiments, our wand-like device and a tracked baton.

2. METHOD

The task of the subjects was to point in the direction of the perceived location of the target sound source. The azimuth and elevation values of the perceived location were recorded, as well as the azimuth and elevation values of the sound source location. During the experiment the subjects did not get any feedback about their localization accuracy.

Figure 3: One of the subjects carrying out the experiment.

Procedure

During the experiment the subjects could move freely and turn their head. Each test task started with the target sound in a static position. The subjects could freely locate the starting point of the target sound. The subjects pointed at the perceived location of the starting point with the tracked baton (see Figure 3) and indicated it by clicking a wand button (Figure 2). After the subjects clicked the button of the wand, they heard a signal which indicated that the sound source had started to move. Simultaneously the additional distracting sound was started. The task of the subjects was to follow the movement of the target sound by pointing at it with the baton. The end of the task was indicated with an end signal and silence. Then there was a short pause before the next task.

In this experiment the distracting stimulus was always the same as the target sound, but with different timing and gain. The gain of the distracting sound was 10 dB less than the gain of the target sound. There were three different target trajectories, one static distracting sound position and two different distracting sound trajectories, and three different stimuli. Each possible combination was presented once to each subject, which equals 27 tasks per subject. The tasks were presented in randomized order to avoid learning effects.

Each trajectory was 18 seconds long and the indicated sound source position was measured at a 2 Hz sampling rate. Azimuth and elevation values of the perceived location were recorded. In addition, the virtual sound source azimuth and elevation values, the time from the start, the stimulus index, the trajectory index, and the distracting sound source azimuth and elevation values were recorded. The recording started when the user clicked the button of the wand and ended when the sounds were muted. Most of the subjects reported that they pointed at the end position after the sound had already been muted; unfortunately this was not recorded. In the future, the recording should be continued for a few seconds after the muting.

Table 2: Median values of absolute azimuth and elevation errors at the starting points (the target sound was not moving and there was no distracting stimulus), for pink noise, Pink Floyd, frog croaks, and all signals.

Table 3: Median values of absolute azimuth and elevation errors over the dynamic trajectories (moving sound with distracting stimulus), for pink noise, Pink Floyd, frog croaks, and all signals.
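The numeric entries of Tables 2 and 3 are not recoverable from this transcript. As a minimal sketch of how such per-signal medians of absolute errors could be computed from the logged samples, the following Python fragment is illustrative only; the log format, the field names, and the wrap-around handling of the azimuth difference are assumptions of this sketch, not the paper's analysis code.

import numpy as np

def wrapped_abs_azimuth_error(perceived_deg, source_deg):
    """Absolute azimuth error in degrees, wrapped into the range 0..180."""
    diff = (np.asarray(perceived_deg) - np.asarray(source_deg) + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def median_errors_per_signal(samples):
    """samples: list of dicts with keys 'signal', 'az', 'el', 'src_az', 'src_el'
    (a hypothetical log format). Returns {signal: (median az error, median el error)}."""
    result = {}
    for signal in {s["signal"] for s in samples}:
        rows = [s for s in samples if s["signal"] == signal]
        az_err = wrapped_abs_azimuth_error([r["az"] for r in rows],
                                           [r["src_az"] for r in rows])
        el_err = np.abs(np.array([r["el"] - r["src_el"] for r in rows]))
        result[signal] = (float(np.median(az_err)), float(np.median(el_err)))
    return result

# Tiny illustrative example (values are made up, not measured data):
samples = [
    {"signal": "pink noise", "az": 50.0, "el": 20.0, "src_az": 45.0, "src_el": 35.0},
    {"signal": "pink noise", "az": -170.0, "el": -30.0, "src_az": 175.0, "src_el": -35.0},
]
print(median_errors_per_signal(samples))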
Results of the experiment

The virtual sound source position and the perceived position are defined using azimuth and elevation angles from the listening position. The azimuth and elevation errors are defined as the angular differences between the source position and the perceived position, as shown in Figure 4. The error angle is the shortest angular distance between the sound source position and the perceived position.

Figure 4: Positions are defined using azimuth and elevation angles from the listening position. (The figure illustrates the azimuth error, the elevation error, and the error angle between the virtual sound source position and the perceived position.)

The median values of the absolute azimuth and elevation errors at the starting points (static sound) are shown in Table 2. (The absolute azimuth error is defined as abs(perceived azimuth - source azimuth).) The median azimuth localization error at the starting points was 7.9 degrees and the median elevation error 15.0 degrees. These are in line with the overall median accuracy achieved in our previous static experiment [24].

The median values of the absolute azimuth and elevation errors over the moving sound trajectories are shown in Table 3; each entry is a median over all measurements. As expected, the error increased due to the movement of the sound source. The median azimuth error over the trajectories was 17.0 degrees and the median elevation error was 25.4 degrees. The error in elevation was larger than the error in azimuth, which is in agreement with our previous dynamic experiment [2] without a distracting auditory stimulus.

Table 4: Minimum and maximum of the subjects' median values of absolute azimuth and elevation error.

The median absolute errors for each subject are presented in Figure 5. There was a considerable difference both in azimuth and in elevation accuracy between the subjects. Table 4 presents the minimum and maximum of the subjects' median errors. The maximum median azimuth error was almost twice as large as the minimum, and the maximum median elevation error was also approximately twice as large as the minimum.

Figure 5: Medians of the absolute azimuth and elevation errors for all test subjects.

An interesting phenomenon is that the most accurate subject in azimuth (subject number 5) is among the worst in elevation accuracy, while the most accurate subject in elevation (subject number 3) is among the worst in azimuth accuracy.

For Figures 6, 7 and 8 the most representative examples were chosen. The difference between subjects is easily seen in Figures 6 and 7. These figures are a combination of three different plots: on the left, an azimuth-elevation plot displaying the trajectories of the target sound, the distracting sound and the measured pointing values for each signal; in the middle, the time-dependent change in azimuth; and on the right, the time-dependent change in elevation. Subject number 4 (Figure 6) perceived the change in elevation, while subject number 5 (Figure 7) failed to notice the change. On the other hand, subject number 5 accurately followed the azimuth position of the target sound.

In most of the cases the distracting stimulus only decreased the accuracy of localization. In ten percent of the cases a confusion occurred, in which the subject pointed at the distracting stimulus instead of the target.
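One simple way to flag such confusions is to compare the error angle from the pointing direction to the target with the error angle from the pointing direction to the distractor. The following Python sketch is an illustrative criterion assumed for this rewrite, not the paper's analysis code; it computes the error angle of Figure 4 as the shortest angular distance between two directions.

import numpy as np

def direction_vector(az_deg, el_deg):
    """Unit vector for a direction given as azimuth and elevation in degrees."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def error_angle(az1, el1, az2, el2):
    """Shortest angular distance in degrees between two directions,
    i.e. the 'error angle' of Figure 4."""
    cosine = np.dot(direction_vector(az1, el1), direction_vector(az2, el2))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def is_confusion(pointed, target, distractor):
    """Flag a sample as a confusion when the pointing direction is closer
    to the distracting sound than to the target (illustrative criterion)."""
    return error_angle(*pointed, *target) > error_angle(*pointed, *distractor)

# Example: the pointing direction lies near the distractor rather than the target.
print(error_angle(45.0, 35.0, -45.0, 35.0))                      # target vs. pointing
print(is_confusion((-40.0, 30.0), (45.0, 35.0), (-45.0, 35.0)))  # -> True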
An example of a situation where the subject pointed consistently for approximately five seconds at the static distracting sound instead of the target sound is shown in Figure 8. Although there were large differences between the subjects, Figures 6, 7 and 8 indicate that the subjects were consistent in their perceptions.

Figure 6: Trajectory plots for subject number 4 for one target sound and distracting stimulus combination. The measured locations for all three signals are presented with dashed lines, the position of the target sound source with a thick line, and the positions of the distracting sound with a thin line. In the azimuth-elevation plot the locations of the loudspeakers are indicated with a star sign. The starting point of the target sound is located at 45 degrees azimuth and 35 degrees elevation, and its end point at -45 degrees azimuth and -35 degrees elevation. The starting point of the distracting stimulus is located at -45 degrees azimuth and 35 degrees elevation, and its end point at 45 degrees azimuth and -35 degrees elevation.

Figure 7: Trajectory plots for subject number 5 for the same target sound and distracting stimulus combination as in Figure 6.

Figure 8: Trajectory plots for subject number 2 for another target sound and distracting stimulus combination. The starting point of the target sound is located at -90 degrees azimuth and 0 degrees elevation, and its end point at 90 degrees azimuth and 0 degrees elevation. The distracting stimulus is statically located at 0 degrees azimuth and 10 degrees elevation (black dot in the azimuth-elevation plot).

Due to the VBAP reproduction, the angular distance between the target sound and the nearest loudspeaker had an influence on accuracy. The error angle (see Figure 4) increased as the angular distance between the target sound and the nearest loudspeaker increased. In Figure 9, the x-axis presents the angular distance (in degrees) and the y-axis presents the error angle. In this plot the distance values were grouped with one-degree accuracy, and for each group the median and the standard deviation were computed; the median is represented with a line and the standard deviation with error bars. The correlation coefficient for the medians is 0.91.

Figure 9: Dependency between the error angle and the angular distance between the target sound and the nearest loudspeaker.

The angular distance between the target sound and the distracting sound also had an influence on accuracy, although the effect was smaller and the correlation not as strong (correlation coefficient for the medians -0.71). In Figure 10, the x-axis presents the angular distance and the y-axis presents the error angle. The error angle decreased as the angular distance increased.

Figure 10: Dependency between the error angle and the angular distance between the target sound and the distracting sound.
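A minimal sketch of the binning analysis behind Figures 9 and 10, assuming the samples are available as flat arrays of angular distances and error angles (the function name and the synthetic example data are assumptions of this sketch, not measured results):

import numpy as np

def binned_error_profile(distances_deg, error_angles_deg, bin_width=1.0):
    """Group error angles by angular distance (one-degree bins by default) and
    return, per bin, the bin centre, the median, and the standard deviation,
    plus the Pearson correlation between bin centres and bin medians."""
    bins = np.floor(np.asarray(distances_deg) / bin_width).astype(int)
    errors = np.asarray(error_angles_deg)
    centres, medians, stds = [], [], []
    for b in np.unique(bins):
        in_bin = errors[bins == b]
        centres.append((b + 0.5) * bin_width)
        medians.append(np.median(in_bin))
        stds.append(np.std(in_bin))
    correlation = np.corrcoef(centres, medians)[0, 1]
    return np.array(centres), np.array(medians), np.array(stds), correlation

# Illustrative example with synthetic data in which the error grows with distance:
rng = np.random.default_rng(1)
dist = rng.uniform(0.0, 30.0, size=2000)
err = 10.0 + 0.5 * dist + rng.normal(0.0, 5.0, size=2000)
centres, medians, stds, r = binned_error_profile(dist, err)
print(round(r, 2))   # strong positive correlation, analogous to the 0.91 reported above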
Comparison of results with previous experiment

As expected, the error levels at the starting points in this experiment (Table 2) were in line with the error levels in our previous experiment. For all signals, the median absolute azimuth error was 6.8 degrees in our previous experiment and 7.9 degrees in this experiment; in elevation the error was 15.3 degrees in our previous experiment and 15.0 degrees in this one. At the starting point the distracting stimulus had not yet been presented to the subjects.

Over the moving sound trajectories, the distracting sound had an influence on the measured azimuth accuracy: the median azimuth error increased by almost five degrees, from 12.5 degrees without the distracting stimulus (Table 5) to 17.0 degrees with it (Table 3). In elevation the difference was not remarkable (24.1 degrees without the distracting stimulus and 25.4 degrees with it).

Table 5: Median values of absolute azimuth and elevation errors over the trajectories in our previous experiment without a distracting stimulus, for pink noise, Pink Floyd, frog croaks, and all signals.

The decreased accuracy can be seen in Figure 11. On the left are the measurements without the distracting stimulus and on the right the measurements with it. Especially the perception of the elevation has degraded. In the azimuth-elevation plot without the distracting stimulus, the shape of the triangle is roughly recognizable in the dots, although the measured trajectories are bent towards the loudspeaker positions. On the right, the shape of the triangle is more blurred in the dot cloud.

3. DISCUSSION

According to Blauert [6], the localization blur in azimuth is approximately one degree in optimal conditions. The localization blur in elevation is more signal dependent and can vary from four degrees (white noise) to seventeen degrees (continuous speech by an unfamiliar person). Familiarity with the signal also plays a role in elevation perception. With interfering noise the localization blur depends on the signal levels and frequencies [6]. If the level of the target signal is about 10-15 dB above the interfering noise, the localization blur is at the same level as it is without the interfering noise. On the other hand, Tuyen and Letowski [25] mention that a 6 dB signal-to-noise ratio is appropriate for tasks requiring accurate frontal localization. In the experiment described in this article the distracting stimulus increased the localization blur, although the gain difference between the target sound and the distracting sound was 10 dB.

In our environment there was more localization blur than in optimal conditions. That is natural, because there are several factors degrading the localization in a virtual room, as listed in Table 6. On the other hand, localization blur in direct loudspeaker reproduction [