notion of attention being a general, modality-independent cognitive resource serving beneficial purposes for other modalities. One family of crossmodal effects comprises effects of primary visual load on secondary auditory processing. Most studies have focused on auditory change detection. Results show inconsistencies regarding the directionality of crossmodal effects: some report decreased MMN amplitudes in the secondary task [1,10], others find the opposite [17,18], and there are also null findings on potential crossmodal influences [7,8,9]. While the MMN reflects active sensory memory processing, the N1 as its prerequisite contributes to encoding the sensory memory trace. It reflects stimulus perception as well as feature-detection mechanisms and represents fundamental auditory processing. However, it was usually not distinguished whether standard or deviant processing was affected and which of the two was responsible for the decrease in auditory change detection. It remained open whether the observable effects would already be present during basic tone processing. SanMiguel and colleagues made an exception to this, reporting an effect of visual working memory (WM) on the auditory N1. However, memory load was manipulated on only one level, and the directionality of this effect (decreasing/increasing) could not be determined. Haroush and colleagues also reported auditory evoked potentials (AEPs); however, they too focused on the MMN, and the significance of effects specifically on N1 or P2 amplitudes could not be evaluated.
Chapter 3: Spectral and temporal integration in modulation detection

integration and the resolution time constants is not a real problem (Viemeister and Wakefield, 1991). They pointed out that the observation of a 3-dB decrease in threshold for each doubling of duration, as seen in typical test-tone integration data, means that the auditory system behaves as if perfect power integration occurs, but that the system is not necessarily performing the operation of mathematical integration. Therefore it might be important to distinguish between the phenomenon of temporal integration and the process that accounts for the phenomenon. Viemeister and Wakefield (1991) provided evidence that integration with a long time constant, such as proposed by the classical models, does not occur in auditory processing. They showed that the threshold for a pair of short pulses yields classic power integration only for pulse separations of less than 5-10 ms. For longer separations, the thresholds did not change with separation and the pulses appeared to be processed independently (cf. Zwislocki et al., 1962). In a second experiment, Viemeister and Wakefield (1991) showed that the threshold for a pair of tone pulses was lower than for a single pulse, indicating some type of integration, but was not affected by level changes of the noise which was presented between the two pulses. The experimental results from that study are plotted in the left panel of Fig. 3.17. It shows the average thresholds for the first pulse alone (squares), the second pulse alone (circles), and for the pulse pair (triangles) as a function of the relative level of the intervening noise. The thresholds for the first pulse alone do not depend on the noise level. There is a slight increase in threshold for the second pulse reflecting forward masking. The thresholds for the pulse pair are about 2.5 dB lower than those for either pulse alone and do not depend on the level of the intervening noise (for details see Viemeister and Wakefield, 1991). These data cannot be explained by long-term integration.
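The "perfect power integration" benchmark can be made concrete in a few lines: if detection requires a constant energy E = I × T, the threshold intensity in dB falls by 10·log10(2) ≈ 3 dB per doubling of duration. The reference duration and reference threshold below are arbitrary illustrative values, not data from Viemeister and Wakefield (1991):

```python
import math

def power_integration_threshold_db(duration_ms,
                                   ref_duration_ms=10.0,
                                   ref_threshold_db=40.0):
    """Threshold level predicted by perfect power integration:
    constant detection energy E = I * T implies that the threshold
    intensity drops by 10*log10(T2/T1) dB when the duration increases
    from T1 to T2. The reference values are arbitrary."""
    return ref_threshold_db - 10.0 * math.log10(duration_ms / ref_duration_ms)

# Each doubling of duration lowers the predicted threshold by ~3 dB:
drop = (power_integration_threshold_db(10.0)
        - power_integration_threshold_db(20.0))
```

The point made in the text is precisely that the data follow this 3-dB-per-doubling rule without the auditory system necessarily computing the integral itself.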
VNS paired with tones has also been examined in a rat model of tinnitus in order to improve the degraded auditory processing observed in this model (10). These rats were exposed to 16 kHz noise at 115 dB, and one month after noise exposure, the Turner gap detection method was used to identify the perceived tinnitus frequency of each rat. VNS was paired with multiple tone frequencies that were distinct from the perceived tinnitus frequency, 300 times per day for 20 days. Both the behavioral and neural effects of noise exposure were reversed following one month of VNS-tone pairing. This VNS-tone therapy, which was first tested in a rat model of tinnitus, has recently been shown to provide long-lasting improvement in tinnitus intensity and tinnitus distress in chronic tinnitus patients (22–24).
Similarly, individuals with Rett syndrome have impaired receptive language and significantly degraded cortical responses to sound (27–30). Rodents with heterozygous Mecp2 mutation, which models Rett syndrome, also exhibit alterations in the auditory cortex response to sounds (23,24). These rats accurately discriminate speech sounds in a quiet background, but have impaired discrimination accuracy when the speech sounds are presented in the presence of varying levels of background noise. These rodent models of the auditory processing deficits observed in autism spectrum disorders could easily be used to test the effectiveness of potential adjunctive intervention therapies.
In this thesis, the applicability of the auditory processing model of Dau et al. (1997a) for the purpose of audio quality prediction was shown. In Chapter 2, a novel method for the objective, perceptual assessment of quality differences between audio signals was introduced. It represents an expansion of the speech quality measure q_C of Hansen and Kollmeier (2000), who successfully applied their method to predict the transmission quality of low bit rate telephone speech codecs. The basic approach was adopted in the present work: the auditory model of Dau et al. (1996a, 1997a, respectively) is employed to transform a pair of reference and test signals into corresponding internal representations. The linear cross-correlation coefficient of these internal representations serves as a measure of the perceptual similarity between test and reference signals, which is interpreted as the perceived audio quality of the test signal relative to the quality of the reference signal. However, the extension of the method from narrow-band speech to any kind of broad-band audio signal, and from clearly audible to just perceptible distortions, required some methodical modifications and expansions: the bandwidth of the peripheral filterbank was extended and the modulation lowpass filter was replaced by a modulation filterbank (cf. Dau et al., 1997a). Apart from these modifications concerning the modeling of the auditory signal processing, the original method was expanded by further stages that model more cognitive aspects of audio quality perception. In fact, the "band importance weights" applied in the speech quality measure q_C likely model a cognitive aspect as well. However, the necessity of a non-uniform frequency weighting could not be confirmed by the results of the present work. Instead, it was found that a sign-dependent weighting of differences between internal representations of test and reference signals somewhat improves the accuracy of the quality prediction.
The most substantial expansion of the method, however, is the modeling of the relation between instantaneous and overall audio quality. This was realized by computing a sequence of short-time cross-correlation coefficients, weighting this sequence by the moving average of the internal representation of the test signal, and finally calculating the 5% quantile of the weighted sequence. Without accounting for this
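The mapping from instantaneous to overall quality described here can be sketched in code. This is a schematic reconstruction: the frame length, hop size, and the use of frame RMS as the moving-average weight are my own illustrative choices, not the parameters of the original method:

```python
import numpy as np

def overall_quality(ref_ir, test_ir, frame_len=1024, hop=512, quantile=0.05):
    """Sketch of the instantaneous-to-overall quality mapping: compute
    short-time cross-correlation coefficients between the internal
    representations (ir) of reference and test signal, weight the
    sequence by a moving average of the test representation (here:
    normalized frame RMS), and reduce it to the 5% quantile."""
    coeffs, weights = [], []
    for start in range(0, len(ref_ir) - frame_len + 1, hop):
        r = ref_ir[start:start + frame_len]
        t = test_ir[start:start + frame_len]
        r = r - r.mean()
        t = t - t.mean()
        denom = np.sqrt(np.sum(r ** 2) * np.sum(t ** 2))
        # per-frame linear cross-correlation coefficient
        coeffs.append(np.sum(r * t) / denom if denom > 0 else 0.0)
        # moving-average weight (frame RMS of the test representation)
        weights.append(np.sqrt(np.mean(test_ir[start:start + frame_len] ** 2)))
    w = np.asarray(weights) / (max(weights) or 1.0)
    weighted = np.asarray(coeffs) * w
    return float(np.quantile(weighted, quantile))
```

The low quantile implements the idea that the worst-sounding passages dominate the overall quality judgment more than the average quality does.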
Given that temporal envelope cues are based on rapid monaural level fluctuations, it is plausible that monaural adaptation mechanisms, which alter the internal representation of the signal's envelope, play a role in the processing of envelope ITDs if they take place prior to binaural processing. This was confirmed by Hafter et al. (1988), who found a monaural adaptation mechanism "at a location peripheral to binaural interaction" (p. 663) that affects binaural thresholds. Several parts of the auditory system exhibit adaptive behavior, some of it likely occurring prior to binaural processing. For example, an auditory nerve fiber can have a maximal discharge rate at the onset of a stimulus response and a gradual decrease with the ongoing stimulus: right after the onset of a stimulus, sensitivity is reduced. This behavior was used by Smith (1979) to explain forward masking. The gain and loss of sensitivity with onset and ongoing excitation are parameters that differ from cell to cell, and multiple cells with different types of firing patterns have been found in the auditory system (e.g., Young, 1988). By investigating post-stimulus time histograms of SAM and transposed tones, physiological studies (e.g., Griffin et al., 2005; Dreyer and Delgutte, 2006) have found that neural responses to transposed tones are more synchronized to the stimulus envelope than those to SAM tones. Bernstein and Trahiotis (2009) have also related the lower psychoacoustic JNDs achieved with transposed tones to this higher neural synchronization. Neuronal adaptation is well established in monaural auditory models (Dau et al., 1996a, 1997; Meddis and O'Mard, 2005). Most binaural models, however, do not include any form of adaptation prior to binaural interaction (Jeffress, 1948; Sayers and Cherry, 1957; Durlach, 1963; Colburn, 1977; Lindemann, 1986). An exception is the binaural processing model of Breebaart et al. (2001a,b,c). It combines the monaural model of Dau et al.
(1996a), including adaptation, with subsequent binaural processing. However, this model is not able to predict lateralization measurements in its present form. Other models which can predict lateralization, such as the normalized 4th-moment model (Dye et al., 1994), the normalized cross-correlation coefficient model (Bernstein and Trahiotis, 2002), the position-variable model (Stern and Shear, 1996), and the two-channel interaural phase difference (IPD) model (Dietz et al., 2009), do not include neuronal adaptation.
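As a rough illustration of what such monaural adaptation does to a signal's envelope, the following sketch implements a single divisive feedback loop in the spirit of Dau et al. (1996a): the input is divided by a state variable that low-pass tracks the recent output, so onsets are emphasized and the steady state is compressed. The time constant and floor value are illustrative assumptions, not the published parameters, and the actual model chains several such loops with different time constants:

```python
import numpy as np

def adaptation_loop(x, fs, tau=0.005, floor=1e-5):
    """Single divisive feedback adaptation loop (illustrative values).
    The input is divided by a state that low-pass filters the recent
    output; at an onset the state is still small, producing a large
    response, which then adapts toward a compressed steady state."""
    a = np.exp(-1.0 / (tau * fs))          # one-pole low-pass coefficient
    state = floor                           # divisor state
    y = np.empty(len(x), dtype=float)
    for i, xi in enumerate(x):
        y[i] = max(xi, floor) / state       # divisive adaptation
        state = a * state + (1.0 - a) * y[i]  # state tracks the output
    return y

# A step from near-silence to full level yields a strong onset response
# that decays toward the compressed steady state:
fs = 16000
step = np.concatenate([np.full(800, 1e-5), np.full(3200, 1.0)])
resp = adaptation_loop(step, fs)
```

Applied before binaural interaction, such a stage changes the effective envelope shape that the binaural processor sees, which is why its absence in most binaural models matters for envelope-ITD predictions.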
In the debate around the VWFA it has also been questioned whether left vOT is restricted to visual (word) processing as assumed by the original VWFA hypothesis (Dehaene et al., 2002). Instead, it has been argued, the region might have a polymodal function (Price and Devlin, 2003). Interestingly, a similar interpretation has recently been presented by Dehaene and Cohen (2011), who have suggested that the VWFA might be a "meta-modal" reading area. This was based on studies showing left vOT activation in congenitally blind individuals during reading of Braille (e.g., Büchel et al., 1998; Reich et al., 2011). However, in a recent fMRI study our research group has argued that it might be premature to generalize these findings from congenitally blind to sighted individuals (Ludersdorfer et al., 2013). We showed that the left vOT region which exhibited an orthographic familiarity effect (i.e., higher activation for visual pseudowords relative to words) was also activated for unfamiliar visual stimuli (i.e., false fonts) and strongly deactivated for unfamiliar auditory stimuli (i.e., reversed spoken words) relative to rest. According to the phenomenon of cross-modal suppression (Laurienti et al., 2002), such deactivations during auditory processing should occur in brain regions dedicated to visual processes. Therefore, the findings of Ludersdorfer et al. were taken to speak for a visual rather than meta-modal role of the left vOT.
Speech perception requires the recognition and discrimination of phonemes, in particular the encoding of temporal information in short linguistic elements such as consonants and vowels. A main feature for categorizing stop consonants is the voice onset time (VOT), which is defined as the duration of the delay between release of closure and start of voicing. It characterizes voicing differences in a variety of languages and distinguishes voiced stop consonants (/b/, /d/, /g/) from their voiceless counterparts (/p/, /t/, /k/) (Lisker & Abramson, 1964). Discrimination of voiced and unvoiced syllables along a consonant-vowel (CV) VOT continuum is categorical, exhibiting two qualitatively discrete percepts. The neural activity of the auditory cortices during the processing of different VOTs in speech stimuli is reflected by the P50-N1 complex of the AEP (Sharma & Dorman, 1999; Sandmann et al., 2007; Zaehle et al., 2007; King et al., 2008; Toscano et al., 2010). Accordingly, the P50-N1 complex has been shown to reflect the neural representation of feature processing of the acoustic stimulus (Sharma et al., 2000; Elangovan & Stuart, 2011). Speech-related disorders have been associated with altered acoustic processing abilities. Children with general language-learning disabilities (Tallal & Piercy, 1973; Tallal & Stark, 1981) and children and adults with dyslexia (Tallal, 1980; Ben-Yehudah et al., 2004) show impaired auditory processing of temporal information during speech perception. Specifically, these patients demonstrated deficient phoneme perception abilities, reflected by inconsistent labeling of CV syllables in a VOT continuum (Joanisse et al., 2000; Breier et al., 2001; Chiappe et al., 2001; Bogliotti et al., 2008). As a complement to conventional approaches that treat temporal processing deficits in dyslexics by perceptual training (Tallal et al., 1996; Fricke et al., 2013; Chobert et al., 2014; Duff et al., 2014), tDCS might be a promising therapeutic tool.
Humans are remarkably sensitive to brief interruptions of ongoing sound. Gap-detection thresholds are typically less than 6 ms in normal young adults, and often higher in older adults, patients with developmental disorders, or subjects with auditory processing difficulties. Gap-in-noise detection thresholds are therefore routinely measured in audiological clinics to assess auditory temporal acuity. However, despite the simplicity of the gap-in-noise detection task and its importance as a clinical tool, the neural mechanisms of gap detection are still poorly understood. Here I describe recent insights into the neural mechanisms of gap detection gained from studies of an unusual mouse model of gap-detection deficits. Neurophysiological data and computational modelling reveal that central auditory responses to sound offsets (disappearances) play a key role in defining the limits of gap-in-noise acuity. Additionally, adaptive gain control in higher auditory brain areas increases gap-in-noise sensitivity. These results indicate that gap-in-noise detection relies not only on peripheral and brainstem mechanisms that produce precisely timed neural responses to sound offsets and onsets, but also on higher central auditory mechanisms of adaptation and intensity gain control. Thus, elevated gap-detection thresholds in patients with auditory perceptual difficulties could arise from abnormalities in many different auditory brain areas.
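For concreteness, a gap-in-noise stimulus of the kind used in such threshold measurements can be generated in a few lines. All parameter values below (durations, ramp length, sampling rate) are illustrative choices, not a clinical protocol:

```python
import numpy as np

def gap_in_noise(fs=44100, noise_dur=0.4, gap_ms=6.0,
                 gap_onset=0.2, ramp_ms=0.5, seed=0):
    """Broadband noise with a brief silent gap, gated with short
    raised-cosine ramps into and out of the gap to avoid spectral
    splatter cues. Returns the stimulus waveform as a float array."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(fs * noise_dur))
    gap_samples = int(fs * gap_ms / 1000.0)
    start = int(fs * gap_onset)
    ramp = int(fs * ramp_ms / 1000.0)
    env = np.ones_like(noise)
    env[start:start + gap_samples] = 0.0
    # raised-cosine fade-out before the gap and fade-in after it
    fade = 0.5 * (1 + np.cos(np.linspace(0, np.pi, ramp)))
    env[start - ramp:start] = fade
    env[start + gap_samples:start + gap_samples + ramp] = fade[::-1]
    return noise * env
```

In an adaptive threshold procedure, `gap_ms` would be varied from trial to trial until the listener's detection performance converges on a criterion level.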
Hearing is an important part of the human sensory system: it provides us with information about our everyday surroundings and plays a fundamental role in the communication between human beings. The auditory system, like all sensory systems, can be thought of as a chain of processing stages that transforms a physical event into an internal (neural) representation carrying a variety of important information that is further exploited by cognitive processes. Psychoacoustics tries to establish a functional relationship between the physical properties of a sound incident at the ears and the auditory sensation related to the sound. Models of auditory processing are an extremely helpful tool, offering meaningful interpretations of behaviourally measured data on the one hand, and helping to ask important new questions for psychoacoustic experiments on the other. Most natural sounds in our surroundings, including running speech, have non-stationary characteristics. They show typical temporal fluctuations in their envelope. Especially for sounds which convey information, such as speech and music, much of the information appears to be carried by the changes of the temporal envelope rather than by the stationary parts (Plomp, 1988; Drullman et al., 1994; Greenberg and Arai, 1998). This seems plausible since even in typical environments (such as rooms) sounds can undergo strong distortions that maintain the basic temporal structure, while other aspects (such as the spectral content) might be strongly altered without changing the perceived information content of the sound. With increasing fluctuation rates, the perception of envelope fluctuations ranges from a variation in loudness to a temporal structure or rhythm, followed by the impression of roughness and pitch.
The questions this work attempts to resolve are: (a) Do adolescents and young adults show deficits in temporal auditory processing, as claimed by Tallal (1980) and colleagues, also when task and stimulus complexity are held constant? (b) Which brain regions are involved in temporal auditory processing, and is this accompanied by lateralization effects? (c) Can differing hemodynamic brain activations be observed in the respective brain regions in case of temporal auditory processing deficits in dyslexia, and how is this difference characterized? The hypotheses with respect to the raised questions are: if dyslexics have problems with rapid temporal processing, one would expect the dyslexic participants of this study to perform worse than control subjects on the temporal conditions. Depending on the degree of impairment, perhaps even the performance on the phonological condition might be degraded. Furthermore, a negative correlation between discrimination performance and degree of impairment is expected, with highly affected dyslexics performing worse on the discrimination task. On the neural level one would hypothesize, in accordance with previous studies, left-hemispheric processing of rapid temporal stimuli in healthy control subjects. In the dyslexic group such lateralization effects might be lacking. In general, the task-specific hemodynamic activation of dyslexics is expected to be decreased compared to controls. Furthermore, one would hypothesize a positive correlation between discrimination performance and brain activation as well as a negative correlation between degree of impairment and hemodynamic activation.
FADE was used to simulate basic psychoacoustical experiments and more complex Matrix sentence recognition tasks with a range of feature sets (front-ends). On the side of the psychoacoustical experiments, simultaneous masking thresholds depending on tone duration were included as well as spectral masking thresholds depending on the tone center frequency. On the side of Matrix sentence recognition tests, speech reception thresholds (SRTs) of the German Matrix sentence test were included in a stationary and a fluctuating noise condition. As signal representations, logarithmically scaled Mel-spectrograms (LogMS), standard ASR features, auditory-inspired ASR features, and the output of a traditional "effective" auditory processing model were employed. Mel frequency cepstral coefficient (MFCC) features were used as standard ASR features. The recently proposed Gabor filter bank (GBFB) and separable Gabor filter bank (SGBFB) features, which were shown to improve the robustness of standard MFCC-based ASR systems (Schädler et al., 2012a; Schädler and Kollmeier, 2015a), encode spectro-temporal modulation patterns of audio signals and were used as auditory-inspired ASR features. The LogMS was also considered as a signal representation because it represents the common basis for MFCC, GBFB, and SGBFB features. The signal representation of the perception model (PEMO) of Dau et al. (1997), referred to as PEMO features, represented the output of a traditional auditory signal processing model. ASR features are usually used with feature vector normalization, such as mean and variance normalization (MVN) (Viikki and Laurila, 1998), while signal representations in auditory models are not. To assess the effect of MVN, LogMS, MFCC, and PEMO features were employed with and without MVN. All considered experiments were simulated using all feature sets, and the obtained thresholds were compared to empirical and model data from the literature.
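Mean and variance normalization, mentioned above as standard practice for ASR features, is simple to state in code: each feature dimension is shifted and scaled to zero mean and unit variance across the frames of an utterance. This is a generic sketch of the technique, not the specific implementation used in the FADE framework:

```python
import numpy as np

def mean_variance_normalize(features, eps=1e-12):
    """Apply per-dimension mean and variance normalization (MVN) to a
    feature matrix of shape (n_frames, n_dims): every column is mapped
    to zero mean and unit variance across time. eps guards against
    division by zero for constant dimensions."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)
```

Because MVN discards absolute level and per-dimension scale, it is a natural candidate for explaining differences between ASR front-ends and auditory-model representations, whose absolute values carry perceptual meaning.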
Understanding what is said and recognising the identity of the talker are two important tasks that the brain faces in human communication. For a long time, neuroscientific models of speech and voice processing have focused mostly on auditory language and voice-sensitive cerebral cortex regions to explain speech and voice recognition. However, our research has shown that the brain uses even more complex processing strategies for recognising auditory communication signals, such as the recruitment of dedicated visual face areas for auditory processing. In my talk I will give a brief introduction to this work and show how multisensory influences on auditory processing can be harnessed to improve auditory learning.
depends on how AIT is measured: Raz et al. (1987) found correlations between pitch discrimination and performance on Cattell's Culture Fair Intelligence Test (Cattell & Cattell, 1973) ranging between -.42 and -.54, higher than the usual -.30 relationship (see Hunt, 1980). They argue that high intelligence is associated with a greater resolution of sensory information. Raz et al. (1987) and Deary (1994) agree with the perspective of Spearman (1904), who claimed that auditory abilities (in particular the detection of thresholds for sound frequencies) constitute the basic processes of intelligence. Within the AIT types, empirical findings support stronger and more consistent relationships of AcI with loudness discrimination compared to pitch discrimination. On the contrary, AIT-P tasks correlate more highly with general musical ability (i.e., more complex musical tasks, regarding the results mentioned above). Problems with pitch discrimination arise with participants having absolute pitch ability. Helmbold and Rammsayer (2006; see also Rammsayer & Brandler, 2002) examined the relationship of psychophysical temporal tasks and AcI. They applied auditory performance measures of interval timing, rhythm perception, and bimodal temporal-order judgment; e.g., in rhythm perception, subjects had to indicate whether the presented rhythm was perceived as "regular" (beat-to-beat intervals appeared to be of the same duration) or "irregular" (deviant beat-to-beat interval). They found the auditory timing tasks to be positively related to psychometric intelligence (figural reasoning: r = .47, Wiener Matrizen-Test, WMT; Formann & Piswanger, 1979; numerical speed test [Zahlen-Verbindungs-Test, ZVT; Oswald & Roth, 1987]: r = .36). An additional study (Rammsayer & Brandler, 2002) revealed that high-IQ individuals are better in duration discrimination of auditory intervals, in temporal-order judgments, and in temporal resolving power for central sensory information. A study of Deary et al.
(1989) revealed a higher AIT-IQ correlation in verbal than in nonverbal IQ tests. This points toward a common underlying mechanism between verbal and auditory processing. According to Deary et al. (1989), this result mirrors verbal ability operating as a cumulative average of past levels of processing efficiency and explaining less idiosyncratic variance than a more fluid task. Higher verbal ability allows more resources to be freed for consolidation of verbal information when information intake is faster and discrimination more accurate.
The neural substrates of SSS, dissociated from localization, remain unknown. While both primary and posterior mammalian auditory cortices (AC) show sensitivity to auditory spatial manipulations, this sensitivity has typically been considered as evidence for the involvement of these regions in localization as part of the "where" pathway of auditory processing (3). Thus, much research aimed at understanding processing in spatially sensitive AC has centered on determining the code for localization of single sounds (for recent examples see 4,5). However, other evidence suggests that the spatial sensitivity of the AC may reflect the role of space in scene analysis: activity here increases more with spatial separation between concurrent sounds than with changes in the location of single sounds (6,7).
This might be due to generalization of the training stimulus, as the bat was trained using a noise stimulus with a centre frequency of 23 kHz. However, the following facts weaken this hypothesis. On the one hand, for a centre frequency of 23 kHz, threshold data were obtained from all experimental animals. Thus, the large data set should result in a reduced standard error in comparison to other centre frequencies. But as one can see in Fig. 1.2, p. 17, the error bars are largest at a centre frequency of 23 kHz, which indicates that the interindividual differences in auditory threshold were highest at this frequency. On the other hand, the previous behavioural audiogram determined by Esser & Daucher (1996) shows a small dip at frequencies around 21 kHz too. Hence, the small dip at a centre frequency of 23 kHz in the present behavioural audiogram seems to be an attribute of the audiogram rather than a result of the training sessions. The largest difference between the behavioural and the neural audiograms is found in the high frequency range above 60 kHz: in the behavioural audiogram the threshold falls up to 80 kHz, whereas the neural thresholds rise in both the IC and the AC. This may be caused by a difference in body temperature between the anaesthetized and awake animals. As described by Ohlemiller & Siegel (1994) and Sendowski et al. (2006), a decrease in an animal's body temperature results in a larger threshold increase for high frequencies than for low frequencies. This is further supported by the DPOAE thresholds of P. discolor (Wittekindt et al., 2005), which also show higher thresholds at higher frequencies compared to the behavioural audiogram (see Fig. 1.4, p. 21). That study was carried out under the same conditions as the present electrophysiological study (anaesthetized animals, experimental chamber heated to 36 °C). In consequence, DPOAE thresholds
Instructions and training would be less effortful if there were methods to capture the participants’ perception en passant – in other words, without asking them. This is called a no-report paradigm and has already been established in vision (25,26). Visual no-report methods benefit from the observability of the eyes: Reflexive behaviors such as pupil dilation or the optokinetic nystagmus (OKN) are used to identify the perceptual state of the observer (25,27). For instance, when two different images that cannot be fused are presented, each to one eye, only one of the images is perceived at any given moment, and the perceptually dominant image alternates over time (binocular rivalry (28)). A no-report version of this binocular-rivalry paradigm can be created by having gratings drift in opposite directions – the slow phase of the OKN reliably follows the perceptually dominant grating in this case (25). We combined this visual no-report paradigm with auditory multistability to develop an en
Microsecond differences in the arrival time of a sound at the two ears (interaural time differences, ITDs) are the main cue for localizing low frequency sound sources in space. Traditionally, ITDs are thought to be encoded by an array of coincidence-detector neurons, receiving excitatory inputs from the two ears via axons of variable length ("delay lines"), aligned in a topographic map of azimuthal auditory space. Convincing evidence for the existence of such a map in the mammalian ITD detector, the medial superior olive (MSO), is, however, lacking. Equally undetermined is the role of the temporally precise glycinergic inhibitory input to MSO neurons. Using in vivo recordings from the MSO of the Mongolian gerbil, the present study showed that the responses of ITD-sensitive neurons are inconsistent with the idea of a topographic map of auditory space. Moreover, whereas the maxima of the ITD functions were found to lie outside the physiologically encountered range of ITDs, their steepest slopes were positioned within it. Local iontophoretic application of glycine and its antagonist strychnine revealed that precisely timed glycinergic inhibition plays a critical role in the mechanism of ITD tuning by shifting the slope into the physiological range of ITDs.
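The classical delay-line account is, computationally, a cross-correlation over internal delays: the internal delay whose coincidence count is maximal is read out as the stimulus ITD. The sketch below illustrates that textbook idea (the sign convention and delay range are my own choices); it is exactly this peak-readout scheme that the recordings summarized above call into question for the MSO:

```python
import numpy as np

def itd_by_coincidence(left, right, fs, max_itd_us=700.0):
    """Jeffress-style readout: evaluate the correlation of the two ear
    signals at every internal delay within +/- max_itd_us and return
    the best-matching delay as the ITD estimate, in seconds.
    Positive values mean the right-ear signal lags the left-ear signal."""
    max_lag = int(round(max_itd_us * 1e-6 * fs))
    n = len(left)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:   # right delayed: compare left[i] with right[i + lag]
            c = np.dot(left[:n - lag], right[lag:])
        else:          # left delayed
            c = np.dot(left[-lag:], right[:n + lag])
        if c > best_corr:
            best_corr, best_lag = c, lag
    return best_lag / fs
```

The study's alternative, a slope code, would instead read the ITD from firing-rate changes on the steep flank of broadly tuned neurons rather than from the location of a peak in such a delay array.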
In general, there are two major ERP techniques to estimate mental workload. The first technique uses a primary task (the actual experimental subject's task). In this approach, the state of attention is assessed directly by measuring the P3 amplitude to certain discrete events embedded in the primary task (Kramer, Wickens, & Donchin, 1985; Nittono, Hamada, & Hori, 2003; Novak, Ritter, & Vaughan, 1992). The second technique is the "probe" technique, which defines the neural implementation of a task/cognitive process indirectly. With the aid of the probe technique (presenting additional task-irrelevant stimuli that have certain features in common with the actual task stimuli), a participant's level of attention, or rather the mental workload imposed by auditory materials, can be assessed (Papanicolaou & Johnstone, 1984). This latter technique is especially used when stimuli occur at a high rate, as is the case for continuous speech. A valid analysis of those primary task stimuli would not be possible because ERP effects from one stimulus overlap with those from the following one. Probe stimuli instead occur less frequently than the task-relevant stimuli. Thus, with probes the patterns of regional cerebral activation can be assessed more reliably by avoiding overlap effects. Furthermore, task-irrelevant probes and task-relevant primary stimuli share the same relevant stimulus features. Therefore, probe stimuli are assumed to be processed like task-relevant stimuli as well. This then allows indirect conclusions from the processing of probes to the processing of task-relevant stimuli. Moreover, this method assesses cerebral engagement without confounding it with stimulus- and response-specific activity (Papanicolaou & Johnstone, 1984). Furthermore, the probe technique can be divided into two subtypes, the relevant and the irrelevant probe technique, in which subjects have to attend to or ignore the probe stimuli, respectively.