People from different geographical regions speak in different styles. Accent recognition is the task of determining the geographical region to which a speaker belongs. A speaker's style can vary with factors such as dialect and socio-economic background, and modeling these variations well enough to identify the accent efficiently is a challenging task. Speech recognition systems are now used in more and more applications worldwide, so it is crucial that they cope with speakers' accents. Accent recognition differs from speaker recognition, in which a speaker is identified from voice biometrics, and from speech recognition, in which speech is converted to text. In short, speech recognition concerns what has been spoken, speaker recognition concerns who has spoken it, and accent recognition concerns the speaker's distinctive way of pronouncing it.
Among probabilistic modeling approaches for speaker recognition, i-vectors show the most promise. Originally introduced for that task, they have become very popular across speech processing and are also reliable for text-dependent speaker verification, language recognition, and speaker diarization.
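For readers unfamiliar with the representation, the standard total variability formulation of the i-vector model can be written as below, alongside the Joint Factor Analysis (JFA) model that the abstract further down compares it with. These are the textbook forms, not equations taken from the work summarized here, and the symbol names are the conventional ones rather than the author's.

$$
\text{JFA:}\quad M = m + V y + U x + D z
$$

$$
\text{i-vector:}\quad M = m + T w, \qquad w \sim \mathcal{N}(0, I)
$$

Here $M$ is the utterance-dependent GMM mean supervector and $m$ is the speaker- and channel-independent supervector of the universal background model. In JFA, $V$ and $U$ span separate speaker and channel subspaces with factors $y$ and $x$, and $D z$ is a residual term. In the i-vector model a single low-rank matrix $T$ spans both kinds of variability, and the low-dimensional vector $w$ is the i-vector.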
Work in this field has been done by an M.Tech student, Manasi Sant.
Accent recognition is the task of determining the geographical region to which the speaker belongs. Variations in a speaker's speaking style may be introduced by various factors such as the dialect of the speaker and their socio-economic background. Modeling such variations to efficiently identify the accent of a speaker is a challenging task. Approaches that researchers have tried for accent recognition include cross-entropies of formants and cepstrum features, HMMs, an MFCC-based speaker recognition algorithm, phoneme-based accent classification, and a GMM-based clustering approach.
In this work, the i-vector representation is used to perform accent recognition of speakers. The i-vector approach is an improvement over Joint Factor Analysis (JFA): whereas JFA models the speaker and channel components separately, the i-vector approach treats them together and achieves compression by eliminating non-effective dimensions. 250 speakers from the TED-LIUM database were used for this experiment, and all experiments were performed with the ALIZE toolkit. An encouraging F1 score of up to 0.9126 was obtained for the accent recognition experiment on these 250 speakers. Accent recognition was also observed to perform better than the straightforward speaker recognition task. Analysis of the whole experiment identified a special set of 10 speakers: the 5 with the highest accuracy and the 5 with the lowest. The work then tries to improve accuracy for this special set. Experiments with LPCC features achieved an accuracy of 0.8497 for this set, up from the previous 0.8021. The concept of a "Confidence Score" is introduced to boost the accuracy further, to 0.9165.
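As a reminder of how the F1 score quoted above is defined, the following is a minimal Python sketch that computes per-class and macro-averaged F1 from true and predicted accent labels. The abstract does not say how the reported score was averaged or computed, so the macro average, the function names, and the toy labels are all illustrative assumptions and not the author's evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the TED-LIUM experiment.
y_true = ["us", "us", "uk", "uk", "in", "in"]
y_pred = ["us", "uk", "uk", "uk", "in", "us"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

Macro averaging weights every accent class equally regardless of how many speakers it has; a weighted or micro average would give different numbers on imbalanced data.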