Unlike Phase-I, the Phase-II data collection was carried out in uncontrolled environments such as laboratories, hostel rooms and corridors. The Phase-II database also differs from Phase-I in two other respects: it contains speech data passed through a wireless mobile channel, and it includes read or conversational speech recorded in Hindi from all the speakers. The Phase-II database covers the four kinds of variability listed below.

  • Multi-environment: Speech data were recorded in different environments such as hostel rooms and laboratories.
  • Multi-sensor: Speech data were recorded over five different sensors simultaneously.
  • Multi-lingual: Every speaker spoke in three different languages, namely, English, Hindi and his/her favorite language.
  • Multi-style: Every speaker spoke in reading and conversational styles.
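
As an illustration of how these four kinds of variability might be attached to each recording, the following is a minimal sketch in Python. It is not part of the database or its tools: the names Utterance, mismatched_axes, speaker_id, session and the integer sensor index are all hypothetical, chosen only for this example, and the actual file naming and labeling are described in the database documentation.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    # Environments named in the text for Phase-II
    LABORATORY = "laboratory"
    HOSTEL_ROOM = "hostel_room"
    CORRIDOR = "corridor"


class Language(Enum):
    ENGLISH = "english"
    HINDI = "hindi"
    FAVORITE = "favorite"  # the subject's favorite language


class Style(Enum):
    READ = "read"
    CONVERSATIONAL = "conversational"


NUM_SENSORS = 5  # five sensors record simultaneously


@dataclass(frozen=True)
class Utterance:
    """Hypothetical metadata record for one recording."""
    speaker_id: str
    session: int
    environment: Environment
    sensor: int  # assumed index in the range 0..NUM_SENSORS-1
    language: Language
    style: Style

    def __post_init__(self):
        if not 0 <= self.sensor < NUM_SENSORS:
            raise ValueError(f"sensor index out of range: {self.sensor}")


def mismatched_axes(a: Utterance, b: Utterance) -> list:
    """Return the variability axes on which two recordings differ."""
    axes = ("environment", "sensor", "language", "style")
    return [axis for axis in axes if getattr(a, axis) != getattr(b, axis)]


# Example: same speaker, different sensor and language
train = Utterance("spk001", 1, Environment.LABORATORY, 0,
                  Language.ENGLISH, Style.READ)
test = Utterance("spk001", 2, Environment.LABORATORY, 3,
                 Language.HINDI, Style.READ)
print(mismatched_axes(train, test))
```

A record of this kind makes it straightforward to select training and test recordings that differ on exactly one axis, for example the same language and style but a different sensor.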

Figure-1: Snapshot of recording inside a laboratory for Phase-II data collection

The speech data were collected with the same hardware setup as in Phase-I, with some differences in operating conditions. In this phase, the facilitator called the subject on his/her own mobile phone from a distant place, and the speech data were recorded on the mobile phone at the facilitator's end. Another mobile phone, operating in offline mode with a hands-free microphone attached to the subject at waist level, was also used to record the speech data. The other devices, the Tablet PC and the DVR, were used in the same way as in Phase-I. The arrangement of devices for recording speech in the Phase-II data collection is shown in Figure-3.

The subjects were students, staff and faculty members of IIT Guwahati in the age group of 20-40. Read speech in English was collected first, for about 3-5 minutes. This was followed by about 6-8 minutes of conversational speech in both English and the subject's favorite language, where the latter happens to be the mother tongue of the subject in most cases. Finally, read or conversational speech in Hindi was recorded, according to the subject's choice. A facilitator was present throughout the recording to direct the subject and to converse with him/her. The second recording session for each speaker took place after a gap of around one week. Please refer to the IITG DIT MV database documentation for more details.

Figure-2: Snapshot of recording inside a hostel room for Phase-II data collection

Figure-3: Arrangement of sensors for Phase-II data collection