Computable Minds

How do voice synthesizers work?

Posted on: May 22nd, 2011
Virtual singer Miku Hatsune
Voice synthesizers are programs, such as Loquendo, that give a machine a human voice. Most of them sound very robotic, with no intonation or feeling, but we can still understand them. Here you will see roughly how the human voice is imitated in a computer and why it is so difficult to achieve a voice equal to a human one.

First we have to learn how the system we want to imitate works: our own voice. Sound consists of pressure waves that propagate through the air as the molecules that form it collide with one another. To produce sound when we talk, our lungs expel the air they hold, and this air travels up the trachea to the larynx, where the vocal cords are. These names really only serve to confuse us: the trachea is a "tube", the larynx is the section at the top of that "tube" that holds the vocal cords, which are two muscular folds, and the pharynx is the final piece, where the "tube" widens and joins the "tube" that comes from the stomach. The opening between the vocal cords is called the glottis. As air passes through it, the vocal cords vibrate at different frequencies and intensities according to their mass, length and tension at that instant.
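As a very rough illustration of why mass, length and tension set the pitch, one can borrow the textbook formula for an ideal stretched string. Real vocal cords are layered tissue, not strings, so this is only an analogy, and the function name and all the numbers below are hypothetical:

```python
import math

def string_frequency(length_m, tension_n, mass_kg):
    """Fundamental frequency (Hz) of an ideal string fixed at both ends."""
    mass_per_length = mass_kg / length_m  # linear density in kg/m
    return math.sqrt(tension_n / mass_per_length) / (2.0 * length_m)

# Illustrative values only, NOT measurements of real vocal cords.
base = string_frequency(length_m=0.015, tension_n=0.09, mass_kg=0.0001)
tighter = string_frequency(length_m=0.015, tension_n=0.36, mass_kg=0.0001)
heavier = string_frequency(length_m=0.015, tension_n=0.09, mass_kg=0.0004)

print(f"base: {base:.1f} Hz, tighter: {tighter:.1f} Hz, heavier: {heavier:.1f} Hz")
```

In this model, four times the tension doubles the frequency, and four times the mass halves it, which matches the intuition that tense cords give a higher voice and heavier cords a lower one.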

Then the sound bounces around the cavities of our vocal tract, so the inner shape of the mouth, the larynx, the position of the tongue, the teeth, the lips, the nose and so on each shape the sound differently. This is easy to check: if we cover our nose, our voice sounds different.
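The effect of a single cavity can be sketched with a two-pole digital resonator, a filter that boosts frequencies near a chosen center and lets the rest through much more weakly. This is a minimal sketch, assuming one cavity behaves like one such filter; the center frequency, bandwidth and helper names are illustrative choices, not values taken from a real vocal tract:

```python
import math

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def sine(freq_hz, seconds, sample_rate):
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def peak_after_settling(signal):
    """Largest absolute value in the second half, after the start-up transient."""
    return max(abs(s) for s in signal[len(signal) // 2:])

SAMPLE_RATE = 16000
gains = {}
for freq in (200, 700, 2000):
    filtered = resonator(sine(freq, 0.2, SAMPLE_RATE), 700, 100, SAMPLE_RATE)
    gains[freq] = peak_after_settling(filtered)

print(gains)  # the 700 Hz tone comes out much louder than the other two
```

Changing the shape of the mouth corresponds, in this picture, to moving the center frequencies of several such filters.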

Vowels are obtained by combining movements of the mouth and the tongue with different kinds of vibration of the vocal cords. When we move the mouth and the tongue and expel air without any vibration of the vocal cords, we produce consonants. This is a simplification: voiced consonants such as "m" or "z" also use the vocal cords.
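In a program, these two kinds of sound are commonly approximated by two kinds of raw signal: a periodic one when the vocal cords vibrate, and noise when they do not. A minimal sketch, assuming a plain pulse train and uniform white noise as stand-ins (both function names are hypothetical):

```python
import random

def voiced_source(pitch_hz, n_samples, sample_rate):
    """Periodic pulse train: one pulse per cycle of the vocal cords."""
    period = int(sample_rate / pitch_hz)  # samples between pulses
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def unvoiced_source(n_samples, seed=0):
    """White noise, standing in for air rushing past without vibration."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

SAMPLE_RATE = 16000
vowel_like = voiced_source(100, 1600, SAMPLE_RATE)  # 0.1 s at 100 Hz
consonant_like = unvoiced_source(1600)              # 0.1 s of noise
```

Neither signal sounds like speech yet; they only become recognizable once they are shaped by the resonances of the vocal tract.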

Now we know how the voice works, but how is this imitated in a computer? A computer produces sound through speakers, by vibrating a membrane that is driven by a magnet. The intensity and speed with which the membrane has to move are given by an electrical signal that comes from the computer, so we only have to generate the appropriate signal. To do that, we must first know which sounds we want to produce, by converting the text into phonemes. Then, for every phoneme, we generate a periodic signal if we want a vowel, or a noise signal if we want a consonant. Next, we pass it through a model that imitates the resonances of the vocal tract and, finally, through another model that imitates the effect of the medium in which the sound wave travels. When we reach the next phoneme we try to make the change gradually, to achieve a realistic voice. After all this processing, the signal, which until now was only data, is converted into an electrical signal that is sent to the speakers.
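The whole chain can be sketched in a few dozen lines. This is a toy under stated assumptions, not how Loquendo or any real product works: the phoneme table, its resonance frequencies, the one-letter-per-phoneme rule and every function name are invented for illustration, and the model of the medium is left out.

```python
import math
import random
import struct

SAMPLE_RATE = 16000
_rng = random.Random(0)  # seeded noise so the example is repeatable

# Hypothetical phoneme table: (voiced?, resonance frequencies in Hz).
PHONEMES = {
    "a": (True, (700, 1200)),
    "i": (True, (300, 2300)),
    "s": (False, (4500, 6000)),
}

def text_to_phonemes(text):
    """Naive conversion: one letter per phoneme, unknown letters skipped."""
    return [ch for ch in text.lower() if ch in PHONEMES]

def source(voiced, n, pitch_hz=120):
    """Pulse train for voiced sounds, white noise otherwise."""
    if voiced:
        period = SAMPLE_RATE // pitch_hz
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    return [_rng.uniform(-1.0, 1.0) for _ in range(n)]

def resonate(signal, center_hz, bandwidth_hz=100):
    """Two-pole resonator imitating one resonance of the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_phoneme(name, duration_ms=150):
    voiced, resonances = PHONEMES[name]
    signal = source(voiced, SAMPLE_RATE * duration_ms // 1000)
    for center in resonances:
        signal = resonate(signal, center)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to the range -1..1

def crossfade(a, b, overlap):
    """Blend the last `overlap` samples of a into the start of b."""
    start = len(a) - overlap
    mixed = [a[start + i] * (1 - i / overlap) + b[i] * (i / overlap)
             for i in range(overlap)]
    return a[:start] + mixed + b[overlap:]

def synthesize(text, overlap_ms=30):
    overlap = SAMPLE_RATE * overlap_ms // 1000
    output = []
    for name in text_to_phonemes(text):
        segment = synthesize_phoneme(name)
        output = crossfade(output, segment, overlap) if output else segment
    return output

def to_pcm16(samples):
    """Convert floats in -1..1 to the 16-bit integers a sound card expects."""
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )

samples = synthesize("asi")
pcm = to_pcm16(samples)
```

The bytes in `pcm` are the data that a sound card turns into the electrical signal for the speakers; Python's standard `wave` module could store them in a file for listening. The result is crude and buzzy, which gives an idea of how much work separates this sketch from a natural voice.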

All this processing is quite complex, especially creating a realistic model of the mouth. The best synthesizers have been built from scans of the human vocal tract. On top of this comes the difficulty of imitating the emotions that we convey with our voice and of giving the right intonation when reading a book aloud. This task has not yet been achieved. More research is needed into the intonations we use when we are angry, depressed or cheerful. The machine also has to understand the meaning of the text, so that it can derive the intonation to use from the context.

Using very sophisticated synthesizers, the Japanese have achieved remarkably realistic singing voices, with the trick of making the synthetic voice follow the music. Something similar is done to the voices of singers who cannot sing in tune: their voice is passed through software such as Autotune so that it follows the melody. The two most famous Japanese voice synthesizers are Vocaloid, which provides the voice of several virtual singers such as Miku Hatsune, and Vocalistener, around which a robot with human appearance has been built, although this one will not become famous because it falls into the uncanny valley. Below you can watch a couple of videos of these synthesizers:

Watch on YouTube a concert where the hologram of Miku Hatsune sings

Watch on YouTube the presentation of Vocalistener at CEATEC 2010 (Combined Exhibition of Advanced Technologies)
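The trick of making a voice follow the melody, mentioned above, can be illustrated by pulling a sung pitch toward the nearest note of the equal-tempered scale. This is only a sketch of the general idea, assuming the pitch has already been measured; it is not a description of how Autotune, Vocaloid or Vocalistener work internally, and the function names are hypothetical:

```python
import math

A4_HZ = 440.0  # conventional tuning reference for the note A4

def nearest_note_frequency(freq_hz):
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones = round(12.0 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2.0 ** (semitones / 12.0)

def correct_pitch(freq_hz, strength=1.0):
    """Move a pitch toward the nearest note; strength=1.0 snaps it fully."""
    target = nearest_note_frequency(freq_hz)
    return freq_hz * (target / freq_hz) ** strength

slightly_sharp = 450.0  # a singer aiming at A4 but a little too high
print(correct_pitch(slightly_sharp, strength=1.0))
print(correct_pitch(slightly_sharp, strength=0.5))
```

A strength below 1.0 corrects only part of the error, which tends to sound less mechanical than a full snap to the note.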
