Human SCINT Seminar (17)
Event Date: 2006-03-22 13:00
Date: 2006.3.22 (Wed) 13:00-14:15
Place: General Research Building, Room 663
Speaker: Akihiro Tanaka
Title: Experimental psychological approach to audio-visual speech recognition and its application to welfare engineering
Keywords: audio-visual speech recognition, multimodal perception, time asynchrony, experimental psychology, cognitive aging
Affiliation: Department of Psychology, Graduate School of Humanities and Sociology
Position: Research Associate
Disciplines: Cognitive Psychology, Cognitive Neuroscience
Societies and Conferences: Japanese Psychological Association, Acoustical Society of Japan, Japanese Cognitive Science Society, Japanese Society for Cognitive Psychology, Japanese Psychonomic Society, International Conference on Cognitive Science, Acoustical Society of America
Bibliography: Akihiro Tanaka, Experimental psychological approach to audio-visual speech recognition and its application to welfare engineering, Human Science Integration Seminar Abstracts, No. 17, p. 1, 2006.
(Please use this citation when citing this abstract.)
Abstract:
Lipreading is an example of human multimodal activity that supports everyday spoken-language conversation. We investigated how the relative timing and presentation rate of a talker's moving image and voice influence word intelligibility. The stimuli were 20 minimal pairs of four-mora words, presented under visual-only, auditory-only, and auditory-visual (AV) conditions. The effect on AV benefit of asynchrony produced by time-expanded speech was compared with that of asynchrony produced by a timing shift. The results showed that AV asynchrony produced by a timing shift was recalibrated, whereas AV asynchrony produced by time expansion was not. These results suggest that recalibration of audio-visual simultaneity requires a constant timing difference between the talker's moving image and voice. These findings may be useful for designing a multimodal speech-rate conversion system.