Browsing by Author "Matrapazis, Anastasios"

Greek text-to-speech
(2021-12-04) Matrapazis, Anastasios; Ματραπάζης, Αναστάσιος; Athens University of Economics and Business, Department of Informatics; Vassalos, Vasilios; Malakasiotis, Prodromos; Androutsopoulos, Ion
Text-to-Speech (TTS) is a technology that reads digital text aloud. In recent years it has made significant progress in many applications, from virtual assistants and customer service to tools that help people who struggle with reading. This study trains and evaluates a deep learning TTS model for the Greek language that imitates the voice of a well-known Greek actress while preserving the naturalness of the output speech. Our goal is to reproduce the actress's accent, making the output as identifiable as possible. Recent research has shown that TTS can be successfully addressed as a sequence-to-sequence (seq2seq) task followed by a vocoder. Given a text, the seq2seq model predicts mel-spectrograms, a representation of the audio in the frequency domain per time frame. From the mel-spectrogram, the vocoder model synthesizes the time-domain waveform. This study focuses on training an implementation of the autoregressive Tacotron 2 model for the seq2seq task and the WaveGlow model as the vocoder. To simulate the actress's voice, we collected our data samples from a podcast she hosted and organized them into a dataset of <text, audio sample> pairs. After training, our model achieved a Mean Opinion Score (MOS) of 3.48. Our Greek TTS model also achieved 81% voice similarity with the original podcast host.
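
For readers unfamiliar with the metric, a MOS is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The following is a minimal sketch of that calculation, not the evaluation procedure used in the study: the function name `mean_opinion_score`, the normal-approximation confidence interval, and the sample ratings are all assumptions made for illustration, and the ratings are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale.

    Hypothetical helper: the interval uses a normal approximation
    (1.96 * standard error), which is an assumption of this sketch.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = mean(ratings)
    # The sample standard deviation is undefined for a single rating.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Made-up ratings for illustration only, NOT taken from the study.
example_ratings = [4, 3, 4, 3, 5, 3, 4, 2, 4, 3]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean helps a reader judge how much a score such as 3.48 depends on the number of listeners and on how much their ratings disagree.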