Türkçe Metinden Konuşma Sentezlemede Doğallığın Artırılması İçin Öneriler / Recommendations for Increasing the Naturalness in Turkish Text-to-Speech Synthesis

Baran Uslu; H. Gökhan Ilk; A. Egemen Yılmaz

Authors : Baran Uslu, H. Gökhan Ilk, A. Egemen Yılmaz

Pages : 95-102

Publication Date : 2011-12-01

Article Type : Other

Özet : Metinden konuşma sentezleme; yazılı bir metnin geliştirilen sistem tarafından otomatik olarak okunmasıdır. Bu çalışmada, difon tabanlı, eklemeli bir konuşma sentezleyici tasarlanmış ve gerçekleştirilmiştir. Birleştirmede PSOLA yöntemi kullanılmaktadır. Genellikle konuşma sentezleyicilerin ezgi modeli yoktur veya eksiktir. Bu durum sentezlenen konuşmanın doğallığını olumsuz yönde etkiler. Çalışmamızda bu eksikliğin giderilmesi için yeni bir model önerilmiştir. Sentezlenen konuşmanın doğallığının artırılması için, konuşmanın ezgisi üzerinde süre ve vurgu temelli kurallar tanımlanmıştır. Bu kurallar, hazırlanan ara yüzde yapılan pek çok denemenin sonucunda bulunmuştur. Uygulanan kuralların sentezlerin doğallığındaki başarısı öznel dinleme testleriyle ölçülmüştür. Sonuç olarak, tanımlanan kuralların geliştirilen konuşma sentezleyicide uygulanması ile CMOS testi sonucunda 1,86/5,00 puanlık bir artış elde edilmiştir. Bu sonuç, ezgi modelimizin başarılı olduğunu göstermektedir.

Abstract : Text-to-speech (TTS) synthesis is the automatic reading of a written text by a system. In this work, a diphone-based concatenative TTS system was designed and implemented, with the PSOLA method used for concatenation. Speech synthesizers usually lack an intonation model or have an incomplete one, which degrades the naturalness of the synthesized speech. A new model is proposed in this study to address this deficiency. To increase the naturalness of the synthesized speech, duration- and accent-based rules were defined over the intonation of the speech. These rules were determined after an extensive set of experiments performed in the interface prepared for this purpose. The effect of the applied rules on the naturalness of the syntheses was measured with subjective listening tests. Applying the defined rules in the developed synthesizer yielded an improvement of 1.86/5.00 in the CMOS test. This result shows the success of our intonation model.
Keywords : Metinden konuşma sentezleme, difon, PSOLA, ezgi modeli, doğallık, CMOS, Text-to-speech synthesis (TTS), diphone, intonation model, naturalness
