Department of Electrical and Computer Engineering

Welcome to the Speech and Signal Processing Laboratory

For the past few years we have been collaborating with auditory scientists from the University of Amsterdam and the Eaton-Peabody Lab for Auditory Physiology (MIT/Harvard) to learn how the auditory system processes acoustic signals such as speech, encodes them, and makes inferences from them. Our goal is to identify the aspects of auditory processing that are responsible for its superiority over current artificial implementations and to emulate the practically useful ones in a computer.

The biggest barrier to widespread use of automatic speech recognition (ASR) systems in real-life situations is their unreliable performance in background noise and interference. In marked contrast to current artificial systems, human listeners can correctly identify speech utterances in many acoustically challenging contexts. Humans are also remarkably good at separating individual voices from those of other speakers and from acoustic clutter of all sorts (the cocktail-party effect). How are we able to do this? Examination of auditory perception and its neurophysiological basis suggests that this difference is due to powerful sound-separation mechanisms coupled with the robust spectro-temporal representations of signals used by the auditory system.

Currently, every speech recognition system that engineers have built uses framewise feature vectors. These feature vectors are derived from short-term spectral envelopes computed by standard spectral analysis or by a bank of fixed bandpass filters (BPFs). When speech is degraded by noise, interference, and channel effects (telephone, reverberation, etc.), a perturbation at one frequency affects the entire feature vector, leaving the extracted features vulnerable. This type of framewise spectral-envelope extraction, which models the speech and the interference together, is at odds with how the auditory system processes and recognizes speech. In the auditory system, sound components are spectrally and temporally separated, analyzed, and subsequently fused into unified objects, streams, and voices that exhibit perceptual attributes such as pitch, timbre, loudness, and location.
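
To make the conventional pipeline concrete, here is a minimal sketch (not our lab's code) of framewise feature extraction using NumPy and SciPy: short-term power spectra are passed through a bank of fixed triangular bandpass filters, log-compressed, and decorrelated with a DCT. The frame length, hop size, filter count, band spacing, and sampling rate are illustrative assumptions; the point is that the final step mixes every band into every coefficient, so a disturbance confined to one frequency band perturbs the whole feature vector.

    import numpy as np
    from scipy.fftpack import dct

    def frame_signal(x, frame_len=400, hop=160):
        """Slice a signal (assumed len(x) >= frame_len) into overlapping,
        Hamming-windowed frames: 25 ms frames every 10 ms at 16 kHz."""
        n_frames = 1 + (len(x) - frame_len) // hop
        win = np.hamming(frame_len)
        return np.stack([x[i*hop : i*hop + frame_len] * win for i in range(n_frames)])

    def triangular_filterbank(n_filters=20, n_fft=512, fs=16000):
        """A fixed bank of triangular bandpass filters (linearly spaced here
        for simplicity) defined on the one-sided FFT frequency axis."""
        edges = np.linspace(0, fs / 2, n_filters + 2)
        bins = np.floor((n_fft + 1) * edges / fs).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for m in range(1, n_filters + 1):
            lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
            fb[m - 1, lo:ctr] = np.linspace(0.0, 1.0, ctr - lo, endpoint=False)
            fb[m - 1, ctr:hi] = np.linspace(1.0, 0.0, hi - ctr, endpoint=False)
        return fb

    def framewise_features(x, fs=16000, n_fft=512, n_ceps=13):
        """One feature vector per frame: log filterbank energies followed by a DCT."""
        frames = frame_signal(x)
        spectra = np.abs(np.fft.rfft(frames, n_fft)) ** 2    # short-term power spectra
        fb = triangular_filterbank(n_fft=n_fft, fs=fs)
        log_env = np.log(spectra @ fb.T + 1e-10)             # framewise spectral envelope
        # The DCT spreads every band into every coefficient, so narrowband
        # interference perturbs the entire feature vector.
        return dct(log_env, type=2, axis=1, norm='ortho')[:, :n_ceps]

For a one-second, 16 kHz recording this yields roughly 100 thirteen-dimensional vectors; production ASR front ends typically use mel-warped rather than linear band spacing, but the vulnerability to narrowband perturbations described above is the same.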

We propose to develop methods and algorithms to process complex acoustic signals observed by one or more acoustic sensors. The long-term goal is to develop a machine that can deal with the day-to-day booming, buzzing acoustic environment around us and make inferences about sounds the way human beings and animals do. Current signal analysis methods are inadequate for this purpose. Since the auditory system provides an existence proof of such a system, it seems reasonable to use it as an inspiration for our strategy. However, our algorithm development is anchored in fundamental signal processing principles.

The major aims of our current research are reflected in the recent publications listed below.

Recent Publications

  • "Adaptive Filterbanks for Speech Feature Extraction Inspired by the Auditory System", Ramdas Kumaresan, Gopi Krishna Allu, Peter Cariani. Accepted for publication by the International Conference on Acoustics, Speech and Signal Processing, March 2005, Philadelphia, PA.

  • "Decomposition of a bandpass signal and its applications to speech processing" Ramdas Kumaresan, Gopi Krishna Allu, Jayaganesh Swaminathan and Yadong Wang , pp.2078-2082, 37-th Asilomar Conference on Signals, Systems and Computers, CA, Nov.2003.

  • "Average Instantaneous frequencies and average log-envelopes for ASR With the Aurora 2 Database" Yadong Wang, Jesse Hansen, Gopi Krishna Allu and Ramdas Kumaresan, pp.21-25, Proc. Eurospeech 2003, Geneva, Switzerland.

  • "On representing signals using only timing information",Ramdas Kumaresan and Yadong Wang, J. Acoust. Soc. Am., November 2002 -- Volume 110, Issue5, pp. 2421-2439.

  • "On the relationship between line spectrum pairs and zero-crossings of band-pass signals", Ramdas Kumaresan and Yadong Wang, IEEE Trans. on Speech and Audio Processing, Volume 9, # 4, pp.458-461, May 2001.

  • "On Decomposing Speech into Modulated Components", Ashwin Rao and Ramdas Kumaresan, IEEE Tran. on Speech & Audio Proc., Vol.8, No.3, pp.240-254, May 2000

  • "On Minimum/Maximum/All-Pass Decompositions in Time and Frequency Domains", Ramdas Kumaresan, IEEE Tran. on Signal. Proc. Vol.48, No.10, pp.2973-2976, Oct. 2000

  • "An Inverse Signal Approach to Computing the Envelope of a Real Valued Signal", Ramdas Kumaresan, IEEE Signal Processing Letters, Vol.5, No.10, p256-259, Oct 1998

  • "Model-Based Approach to Envelope and Positive Instantaneous Frequency (PIF) of Signals with Speech Applications", Ramdas Kumaresan and Ashwin Rao, Journal of the Acoustical Society of America, Vol.105, No.3, pp.1912-1924, March 1999

  • "Unique positive FM-AM decomposition of signals'', Ashwin Rao and Ramdas Kumaresan, Multidimensional Systems and Signal Processing, Vol.9, pp.411-418, 1998

  • "A parametric modeling approach to Hilbert transformation", Ashwin Rao and Ramdas Kumaresan, IEEE Signal Processing Letters, Vol.5, pp.15-17, 1998

Acknowledgements

This research was supported by grants from the National Science Foundation under grant numbers EIA-0130793 and CCR-0105499.


Email Dr. Kumaresan at kumar@ele.uri.edu with comments.
Page last updated: September 10, 2009