Mizo TTS Audio Samples

Side-by-side comparison of real speech and TTS outputs. Audio is provided for perceptual evaluation only.

The first three rows show the three best VITS outputs, and the last three rows show the three worst.

Sentence | Natural Speech | Tacotron2 | VITS
MZ00056-5: "He hmalakna hi hmeichhe naupangte tan phei chuan a ṭangkai hle dawn a ni" a ti. | MOS: 4.31 | MOS: 2.69 | MOS: 4.26
MZ00053-17: Hei hi hri darh zel tur tih tawpna tur a nih thu an sawi. | MOS: 3.69 | MOS: 2.86 | MOS: 4.19
MZ00058-22: Hei pawh hi chin fel ran theih thuai an inbeisei tih an sawi. | MOS: 4.15 | MOS: 2.81 | MOS: 4.00
MZ00060-2: Midangte tih damna hmanrua atan rei tak chhung lung a lo ei tawh. | MOS: 4.06 | MOS: 2.46 | MOS: 2.68
MZ000113-13: A titu a puh zinga pakhat, Ayaz Saiyed phei chuan a chhuitute chu a pui zui nghe nghe. | MOS: 4.03 | MOS: 2.23 | MOS: 2.69
MZ000115-9: "Pathianin he damdawiin hi mal a sawmin a awmpui tih a chiang a ni" a ti. | MOS: 4.26 | MOS: 2.32 | MOS: 2.71

Note: MOS values indicate mean listener ratings for the corresponding samples. Audio files are provided solely for academic evaluation and are not downloadable.
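
As a rough illustration of how the values listed above can be summarized, the following Python sketch averages the per-sample MOS for each system. The dictionary layout and the function name `mean_mos_per_system` are assumptions made for this example, not part of the original evaluation. Because the six samples were selected as the best and worst VITS outputs, these averages describe only the samples shown here, not overall system quality.

```python
from statistics import mean

# Column order used in the listing above.
SYSTEMS = ("Natural Speech", "Tacotron2", "VITS")

# Per-sample MOS values copied from the listing above,
# keyed by sentence ID: (Natural Speech, Tacotron2, VITS).
MOS = {
    "MZ00056-5": (4.31, 2.69, 4.26),
    "MZ00053-17": (3.69, 2.86, 4.19),
    "MZ00058-22": (4.15, 2.81, 4.00),
    "MZ00060-2": (4.06, 2.46, 2.68),
    "MZ000113-13": (4.03, 2.23, 2.69),
    "MZ000115-9": (4.26, 2.32, 2.71),
}


def mean_mos_per_system(table):
    """Average the per-sample MOS column-wise, one value per system."""
    columns = zip(*table.values())  # transpose rows into per-system columns
    return {name: round(mean(col), 2) for name, col in zip(SYSTEMS, columns)}


averages = mean_mos_per_system(MOS)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Averaging per-sample means this way weights every sample equally, which is reasonable only if each sample was rated by a similar number of listeners; the listener counts are not given here.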