Course Code: CS566
Course Name: Speech Processing
Prerequisites: NIL
Syllabus: Introduction to speech processing. Human and machine speech production: models for speech production, types of speech sounds and their characteristics. Speech hearing: mechanism of human hearing, learning to recognize human sounds, acquired knowledge vs. vocabulary-based methods. Analysis of speech: frequency-domain and time-domain methods, including FFT, computation of pitch, spectrograms, LPC, cepstrum, ZCR, etc. Representation of acoustic events. Components of a speech recognition system: input, feature analysis, modelling and decision rule, vocabulary. Data compression: vector quantization, codebook design, Lloyd's quantizer design, K-means algorithm, LBG algorithm for speech. Speech modelling: stochastic processes, Markov processes, hidden Markov modelling. Components of an HMM; training and building of HMMs: Viterbi algorithm, Baum-Welch algorithm, etc. Implementation of a speech recognition system: time/space considerations, designing the interface, self-learning mechanism.
Texts: 1. L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
2. L. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
3. K. Sayood, Introduction to Data Compression, 2nd Ed, Morgan Kaufmann, 2000.
References: 1. D. O'Shaughnessy, Speech Communications: Human and Machine, 2nd Ed, IEEE Press, 2000.
2. A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic, 1991.
3. Selected research papers.