Voice generation systems and online speech synthesizers: the best services for voicing text

A program designed to read text files aloud. To reproduce a human voice, it can use any speech synthesizer installed on the computer. Playback...

3 months ago License: Free Language: Russian English German OS: XP/Vista/7/8/8.1/10 Size: 16.99 MB

A powerful tool for reading text from a web page, letter, text file or other document, or for converting it into MP3 or WMA audio files. The program integrates into such...

4 months ago License: Shareware Language: English OS: XP/Vista/7/8/8.1/10 Size: 24.1 MB

An interesting application capable of converting printed text into audible speech. This is convenient for creating audio lessons, lectures or even entire books, when you can simply...

6 months ago License: Free Language: Russian English OS: XP/Vista/7/8/8.1/10 Size: 3.72 MB

A speaking text editor compatible with SAPI4 and SAPI5 speech synthesizers. The program is designed to read text files aloud using the Microsoft Speech API 4/5 (SA...

A year ago License: Free Language: Russian OS: XP/Vista/7/8/8.1/10 Size: 3.59 MB

With 2nd Speech Center, text can be listened to instead of being read from the screen, giving the eyes a rest. Text is spoken from the clipboard, and spoken text can be recorded to MP3/WAV...

2 years ago License: Shareware Language: English OS: 2000/XP/2003/Vista/7/8/8.1/10 Size: 3.77 MB

A convenient and fast program for voicing texts, creating audiobooks (in WAV, MP3, AMR or AAC format plus a playlist), placing stresses in Russian texts, or simply reading books comfortably from the screen. Chrome...

4 years ago License: Shareware Language: Russian OS: XP/Vista/7 Size: 2.59 MB

A free program for converting text to speech. It can read any text aloud and save it to a WAV or MP3 file. TTSReader ships with about 10 male and female voices...

8 years ago License: Free Language: English OS: 2000/XP/Vista Size: 2.29 MB

Using Pistonsoft Text to Speech Converter you can convert text into speech or an audiobook in MP3 or WAV format. All languages installed in the system are available for voicing. Technologies...

Speech synthesis is the transformation of previously unknown textual information into speech. Speech output of information implements a speech interface that simplifies the use of a system. In effect, speech synthesis provides another channel for transmitting data from a computer or mobile phone to a person, alongside the monitor. Of course, a drawing cannot be conveyed by voice, but listening to email or the day's schedule can be quite convenient, especially when the eyes are busy with something else. For example, when you come to work in the morning and prepare for negotiations, you could straighten your tie or hair in front of the mirror while the computer reads aloud the latest news or your mail, or reminds you of information important for the negotiations.

Figure 2.2 - Acoustic signal processing

Speech synthesis technology has found wide application among people with vision problems. For everyone else, it adds a new dimension of convenience to using technology, significantly reduces the load on the eyes and the nervous system, and lets you make use of auditory memory.


Figure 2.3 - Speech synthesis

Any text consists of words separated by spaces and punctuation marks. The pronunciation of words depends on their position in the sentence, and the intonation of a phrase depends on the punctuation. Finally, pronunciation also depends on the meaning of a word. Accordingly, for synthesized speech to sound natural, a whole range of tasks must be solved, concerning both the naturalness of the voice at the level of smoothness and intonation, and the correct placement of stresses and the expansion of abbreviations, numbers and special characters, taking into account the peculiarities of Russian grammar.
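As an illustration of the text-normalization part of this work, here is a minimal Python sketch that expands a few abbreviations and spells out digits before the text is handed to a synthesizer. The abbreviation table and the digit handling are toy assumptions; a real front end for Russian would use large lexicons, inflect numerals and resolve stress from context.

```python
import re

# Toy normalization rules (illustrative assumptions, not a real TTS front end).
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "e.g.": "for example"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(text: str) -> str:
    # Expand known abbreviations.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Spell out digits one by one; real systems read whole numbers, dates, currency.
    text = re.sub(r"\d", lambda m: " " + DIGITS[m.group(0)] + " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Dr. Smith lives at 221 Baker St."))
# -> "Doctor Smith lives at two two one Baker Street"
```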

There are several approaches to solving these tasks:

1) allophone synthesis systems - provide a stable, but not sufficiently natural, robotic sound;

2) systems based on the Unit Selection approach - they provide much more natural sound, but may contain fragments of speech with sharp dips in quality, up to loss of intelligibility (a scoring sketch is given after the next paragraph);

3) hybrid technology based on the Unit Selection approach and supplemented with units of allophone synthesis.

Based on this hybrid technology, the VitalVoice system was created, which provides stable and natural sound at the acoustic level.
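To make the Unit Selection idea mentioned above more concrete, here is a toy Python sketch of the usual cost-based search: each candidate unit is scored by a target cost plus a concatenation cost, and a Viterbi-style pass picks the cheapest sequence. The unit features (duration, pitch) and the cost functions are invented placeholders, not the scoring used by VitalVoice or any other particular system.

```python
# Toy unit-selection search; unit features and costs are illustrative assumptions.
def target_cost(unit, spec):
    # Placeholder: penalize duration mismatch; real systems also compare stress,
    # phonetic context, pitch targets, etc.
    return abs(unit["duration"] - spec["duration"])

def concat_cost(prev_unit, unit):
    # Placeholder: penalize a pitch discontinuity at the join.
    return abs(prev_unit["pitch"] - unit["pitch"])

def select_units(specs, candidates):
    """specs: one target specification per position;
    candidates: one list of candidate units per position."""
    best = {j: (target_cost(u, specs[0]), [u]) for j, u in enumerate(candidates[0])}
    for t in range(1, len(specs)):
        new_best = {}
        for j, u in enumerate(candidates[t]):
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best.values()),
                key=lambda cp: cp[0],
            )
            new_best[j] = (cost + target_cost(u, specs[t]), path + [u])
        best = new_best
    return min(best.values(), key=lambda cp: cp[0])[1]

# Hypothetical usage: two target positions, two candidate units each.
specs = [{"duration": 80}, {"duration": 120}]
candidates = [
    [{"duration": 75, "pitch": 110}, {"duration": 95, "pitch": 140}],
    [{"duration": 118, "pitch": 112}, {"duration": 130, "pitch": 200}],
]
print(select_units(specs, candidates))
```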

Speech communication is natural and convenient for a person. The task of speech recognition is to remove the intermediary in communication between a person and a computer. Controlling a machine by voice in real time, and entering information by speaking, would greatly simplify the life of a modern person. Teaching a machine to understand, without an intermediary, the language that people speak to each other is the task of speech recognition.

Scientists and engineers have been working on the problem of verbal communication between man and machine for many years. The first speech recognition device appeared in 1952; it could recognize digits spoken by a person. Commercial speech recognition programs appeared in the early nineties.

All speech recognition systems can be divided into two classes:

1) Speaker-dependent systems - tuned to a particular speaker's voice during training. To work with another speaker, such systems require complete retraining.


Figure 2.4 - Speech recognition

2) Speaker-independent systems - the operation of which does not depend on the speaker. Such systems do not require prior training and are able to recognize the speech of any speaker.

Initially, systems of the first type appeared on the market. In them, the acoustic image of a command was stored as a whole template. To compare an unknown utterance with a stored command template, dynamic programming methods were used. These systems worked well when recognizing small sets of 10-30 commands and understood only one speaker; to work with a different speaker they required complete retraining.
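The dynamic-programming comparison mentioned here is usually some form of dynamic time warping (DTW). Below is a minimal Python/NumPy sketch of the idea, assuming the unknown utterance and each stored command template have already been converted into sequences of feature vectors (for example, MFCC frames); it illustrates the principle rather than a production recognizer.

```python
import numpy as np

def dtw_distance(template: np.ndarray, utterance: np.ndarray) -> float:
    """Accumulated cost of the best time alignment between two feature sequences."""
    n, m = len(template), len(utterance)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - utterance[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def recognize(utterance, templates):
    """templates: {command_name: feature array}; pick the closest template."""
    return min(templates, key=lambda name: dtw_distance(templates[name], utterance))
```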

In order to understand continuous speech, it was necessary to switch to vocabularies of much larger size, from tens to hundreds of thousands of words. The methods used in systems of the first kind were not suitable for this problem, since it is simply impossible to create templates for such a number of words.

In addition, there was a desire to make systems independent of the speaker. This is a very difficult task, since each person has an individual manner of pronunciation: the pace of speech, the timbre of the voice and pronunciation features. Such differences are called speech variability. To take it into account, new statistical methods were proposed, based mainly on the mathematical apparatus of Hidden Markov Models (HMMs) or artificial neural networks. Instead of creating templates for each word, templates are created for the individual sounds that make up words, the so-called acoustic models. Acoustic models are formed by statistical processing of large speech databases containing recordings of hundreds of speakers.
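As a small illustration of the HMM machinery behind such acoustic models, here is a pure-NumPy sketch of the forward algorithm for a toy discrete-observation model. The transition, emission and initial probabilities are made-up numbers; real acoustic models use Gaussian-mixture or neural-network emission densities trained on large speech corpora.

```python
import numpy as np

def forward_likelihood(A, B, pi, observations):
    """Likelihood of an observation sequence under a discrete HMM (forward pass)."""
    alpha = pi * B[:, observations[0]]      # initialization
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]       # induction step
    return alpha.sum()                      # termination

# Toy 2-state model with 3 observation symbols (all numbers are illustrative).
A = np.array([[0.7, 0.3], [0.2, 0.8]])            # state transition probabilities
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]])  # emission probabilities
pi = np.array([0.5, 0.5])                         # initial state distribution
print(forward_likelihood(A, B, pi, [0, 1, 2]))
```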

Existing speech recognition systems use two fundamentally different approaches:

recognition of lexical ...

Note that the creation of speech recognition systems is an extremely difficult task.

Today there is a technology capable of converting textual information into ordinary speech. With the development of "smart machines" this technology is becoming more and more relevant and demands ever greater refinement. A number of speech synthesis methods have already been developed, and they are discussed below.

Speech synthesizers can be used in completely different areas and solve a variety of tasks, ranging from "reciting" books, producing "talking" children's toys and announcing stops in public transport or in service systems, to medicine (here it is worth recalling Stephen Hawking, who used a speech synthesizer to communicate with the world).

So, let us take a closer look at the technology and methods of speech synthesis. As already mentioned, there are several main approaches:

  • parametric synthesis;
  • concatenative (compilation) synthesis;
  • synthesis according to the rules (from printed text).

Parametric synthesis makes it possible to record speech for any language, but it cannot be used for texts that are not known in advance; it is used when the set of messages is limited. The quality of this synthesis method can be very high.

Essentially, parametric speech synthesis works like a vocoder. In parametric synthesis, the sound signal is represented by a number of continuously changing parameters. A tone generator is used to form vowels and a noise generator to form consonants. However, this method is usually used to record voices in musical compositions, and more often it is not even pure voice synthesis but rather modulation.
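A very rough sketch of the source part of such a scheme is shown below in Python/NumPy: a periodic pulse train stands in for the tone generator used for voiced (vowel-like) sounds, and white noise for the unvoiced (consonant-like) ones. All parameter values are arbitrary examples; a real vocoder-style synthesizer would also apply time-varying spectral envelopes and smooth the parameters between frames.

```python
import numpy as np

SR = 16000  # sampling rate, Hz (an assumed value)

def voiced_source(f0: float, duration: float) -> np.ndarray:
    """Crude pulse-train 'tone generator' for voiced segments."""
    t = np.arange(int(SR * duration)) / SR
    return 0.5 * np.sign(np.sin(2 * np.pi * f0 * t))

def unvoiced_source(duration: float) -> np.ndarray:
    """White-noise 'noise generator' for unvoiced segments."""
    return 0.1 * np.random.randn(int(SR * duration))

# Concatenate a voiced and an unvoiced segment into one raw signal.
signal = np.concatenate([voiced_source(120.0, 0.3), unvoiced_source(0.1)])
```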

The compilation (concatenative) synthesis method assembles messages from a pre-recorded "dictionary" of elements. The size of an element must be at least a word. Typically the stock of elements is limited to several hundred words, and the content of the synthesized messages is limited to the volume of this dictionary. This method of speech synthesis is widely used in everyday life, typically in various information services and in equipment with voice-response systems.
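A bare-bones sketch of such compilation synthesis in Python could look as follows. The word-to-file dictionary and the file names are hypothetical, all recordings are assumed to share the same audio format, and a real voice-response system would additionally smooth the joins and control intonation.

```python
import wave

# Hypothetical dictionary of pre-recorded word files (all in the same format).
DICTIONARY = {"your": "your.wav", "balance": "balance.wav", "is": "is.wav"}

def synthesize(words, out_path="message.wav"):
    """Assemble a message by concatenating pre-recorded word recordings."""
    frames, params = [], None
    for w in words:
        with wave.open(DICTIONARY[w], "rb") as f:
            params = f.getparams()                  # keep the format of the last file
            frames.append(f.readframes(f.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(b"".join(frames))

synthesize(["your", "balance", "is"])
```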

Full speech synthesis by rules can reproduce speech from a previously unknown text. This method does not use elements of recorded human speech but is based on programmed linguistic and acoustic algorithms.

Here, too, there is a division: two approaches to this synthesis method can be distinguished. The first is formant synthesis by rules, and the second is articulatory synthesis. Formant synthesis is based on formants, the frequency resonances of the speaker's vocal tract. The formant synthesis algorithm models the human vocal tract as a set of resonators. Unfortunately, most synthesizers that rely exclusively on formant synthesis are still difficult to understand without training, but it is undoubtedly a universal and promising technology. The articulatory method tries to overcome the shortcomings of the formant method by adding the phonetic features of the pronunciation of individual sounds to the model.
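To illustrate the resonator idea behind formant synthesis, here is a minimal Python/SciPy sketch that passes a crude voiced source through a cascade of second-order resonators. The formant frequencies and bandwidths are rough textbook values for an /a/-like vowel and are used purely for illustration, not taken from any specific synthesizer.

```python
import numpy as np
from scipy.signal import lfilter

SR = 16000  # sampling rate, Hz (an assumed value)

def resonator(x, freq, bw):
    """Second-order resonator (one formant) applied to signal x."""
    r = np.exp(-np.pi * bw / SR)
    theta = 2 * np.pi * freq / SR
    b_coef = 2 * r * np.cos(theta)
    c_coef = -r * r
    gain = 1 - b_coef - c_coef          # unity gain at DC
    return lfilter([gain], [1, -b_coef, -c_coef], x)

t = np.arange(int(0.4 * SR)) / SR
source = np.sign(np.sin(2 * np.pi * 110 * t))        # crude glottal-like source
vowel = source
for f, bw in [(700, 130), (1220, 70), (2600, 160)]:  # approximate F1-F3 of /a/
    vowel = resonator(vowel, f, bw)
vowel /= np.abs(vowel).max()                         # normalize amplitude
```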

There is also a rule-based speech synthesis technology that uses recorded segments of natural speech. Since compilation methods are still the most commonly used, let us say a few more words about them.

Depending on how large the "excerpts" of speech used for synthesis are, the following types of synthesis are distinguished:

  • microsegment (micro-wave);
  • allophonic;
  • diphonic;
  • semi-syllable;
  • syllabic;
  • synthesis from units of arbitrary size.

The allophone and diphone methods are the most commonly used. For the diphone method of speech synthesis, the basic elements are all possible two-phoneme combinations, and for the allophone method, combinations of left and right contexts (an allophone is a variant of a phoneme determined by its specific phonetic environment). In this case, different types of contexts are grouped into classes according to their degree of acoustic proximity.
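As a small illustration, the sketch below shows how a phoneme sequence decomposes into diphone units (phoneme-to-phoneme transitions) that would then be looked up in the recorded inventory; the phoneme labels are simplified and purely illustrative.

```python
def to_diphones(phonemes):
    """Split a phoneme sequence into diphones, including silence at the edges."""
    padded = ["sil"] + phonemes + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

print(to_diphones(["h", "e", "l", "o"]))
# -> ['sil-h', 'h-e', 'e-l', 'l-o', 'o-sil']
```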

The advantage of such systems is that they can synthesize speech from text that is not known in advance; the disadvantage is that the quality of the synthesized speech does not compare with the quality of natural speech (distortions may occur at the boundaries where elements are stitched together). It is also very difficult to control the intonation characteristics of speech, since the characteristics of individual words can change depending on the context or type of phrase.

However, this is all in theory. In practice, at the current stage of development and despite active progress in this area, developers of speech synthesis technology still face certain difficulties, mainly related to the artificiality of synthesized speech, its lack of emotional coloring and its low noise immunity.

The fact is that any synthesized speech is, as a rule, perceived by a person with difficulty. This is because the human brain fills in the gaps in the synthesized speech, using additional resources to do so, and a person can normally perceive synthesized speech for only about 20 minutes.

The perception of speech is also affected by its emotional coloring, which synthesized speech lacks. It is worth noting that some algorithms can to some extent imitate the emotional coloring of speech by changing the duration of phonemes, the pauses and the timbre modulation, but their results are still far from ideal.

As for the third problem, low noise immunity: experiments show that any extraneous noise, even the smallest, interferes with the perception of synthesized speech. This is again because, in order to process synthesized speech, the human brain engages additional centers that are not used when perceiving natural speech.

At the end of this article, I would like to give some examples of existing speech synthesizers.

Everyone knows the so-called "readers", programs for more convenient reading of text from the monitor. Many of them use speech synthesis engines to read text, for example Balabolka and Govorilka.

In order for such programs to voice texts, you must also install the SAPI (Speech API) library and voice engines. The two most common versions of the Speech API are SAPI4 and SAPI5; both libraries can coexist on the same computer. Windows XP, Windows Vista and Windows 7 already come with the SAPI5 libraries installed.
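For example, the installed SAPI voices can be driven from Python through the pyttsx3 library (assuming pyttsx3 and at least one voice engine are installed); the sketch below lists the available voices and speaks a short phrase.

```python
import pyttsx3

engine = pyttsx3.init()                  # picks the platform driver (SAPI5 on Windows)
for voice in engine.getProperty("voices"):
    print(voice.id)                      # list the installed voice engines
engine.setProperty("rate", 150)          # speaking rate, words per minute
engine.say("Hello, this text is spoken by an installed SAPI voice.")
engine.runAndWait()
```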

In addition to readers, screen readers are common. Examples of such programs are:

VIRGO 4. The program was created for comfortable work with Windows by blind and visually impaired users. It allows you to select the information that will be spoken and the information that will be shown on a Braille display. For visually impaired users, the Galileo screen-magnification system is provided.

Cobra 9.1 also makes Windows easier to use for blind and visually impaired users. This program can output information from the computer monitor using speech or a Braille display and has a screen-magnification function.

Today, speech synthesizers used in desktop computer systems or mobile devices no longer seem unusual. Technology has stepped far ahead and made it possible to reproduce the human voice. How it all works, where it is applied, which speech synthesizer is the best and what potential problems the user may encounter are discussed below.

What are speech synthesizers and where are they used?

Speech synthesizers are special programs, consisting of several modules, that translate text typed on the keyboard into ordinary human speech in the form of sound.

It would be naive to believe that the accompanying libraries contain absolutely all words or possible phrases recorded in studios by real people. That is simply physically impossible. Besides, such phrase libraries would be so large that it would be impossible to install them even on modern high-capacity hard drives, not to mention mobile devices.

For this purpose, a technology called Text-to-Speech (text-to-speech conversion) was developed.

Speech synthesizers are most widely used in several areas: independent study of foreign languages (programs often support 50 languages or more), where you need to hear the correct pronunciation of a word; listening to books instead of reading them; creating speech and vocal parts in music; use by people with disabilities; reading out search results as spoken words and phrases; and so on.

Varieties of programs

Depending on the area of application, all programs can be divided into two main types: standard ones that directly convert text to speech, and speech or vocal modules used in music applications.

For a fuller picture, we will consider both classes, but the emphasis will be on speech synthesizers used for their direct purpose.

Pros and Cons of Simple Speech Applications

As for the advantages and disadvantages of programs of this type, let's first consider the disadvantages.

First of all, you need to understand clearly that a computer is still a computer: at the current stage of development it can synthesize human speech only very approximately. In the simplest programs there are often problems with the placement of stresses in words and with reduced sound quality, and on mobile devices with increased power consumption and sometimes unauthorized downloading of speech modules.

But there are also plenty of advantages, because for many people audio information is perceived much better than visual information. The ease of perception is obvious.

How to use a speech synthesizer?

Now a few words about the basic principles of using programs of this type. A speech synthesizer of any type can be installed without any problems. On desktop systems a standard installer is used, where the main task is to select the supported language modules. For mobile devices, the installation file can be downloaded from an official store or repository such as Google Play or the App Store, after which the application is installed automatically.

As a rule, at the first start no settings need to be made except setting the default language. True, sometimes the program may offer to select the sound quality (in the standard version, which is used everywhere, the sampling rate is 44,100 Hz, the bit depth is 16 bits and the bit rate is 128 kbps). On mobile devices these figures are lower. In any case, a certain voice is taken as the basis, and by using a standard pronunciation template and applying filters and equalizers, the sound of exactly that timbre is achieved.

In use, you can choose between several options: manual input of text, voicing of existing text from a file, or integration into other applications (for example, web browsers) with voicing of search results or reading of text content on web pages. It is enough to choose the desired action, the language and the voice in which it will all be pronounced. Many programs offer several voices, both male and female. The start button is usually used to begin playback.

As for how to turn off a speech synthesizer, there may be several options. In the simplest case, the playback stop button in the program itself is used. In the case of browser integration, deactivation is performed in the extension settings or by completely removing the plugin. With mobile devices, however, despite an apparently immediate shutdown, there may be problems, which are discussed separately below.

In music programs, the settings and text entry are much more complicated. For example, the FL Studio application has its own speech module in which you can slightly change the settings for pitch, playback speed and so on. The character "_" is used to place stress before a syllable. But even such a synthesizer is only suitable for creating robotic voices.

Yamaha's Vocaloid package, on the other hand, belongs to the professional class. Text-to-Speech technology is implemented here to the fullest extent. In the settings, in addition to the standard parameters, you can set articulation and glissando, use libraries with vocals from professional performers, compose words and phrases and fit them to notes, and much more. It is not surprising that a package with just one vocal library takes up about 4 GB or more in the installation distribution, and two or three times as much after unpacking.

Speech synthesizers with Russian voices: a brief overview of the most popular

But let us return to the simplest applications and consider the most popular of them.

RHVoice - according to most experts, the best speech synthesizer, a Russian development. Three voices are available in the standard version (Alexander, Irina, Elena). The settings are simple, and the application itself can be used both as a standalone SAPI5-compatible program and as a screen-reader module.

Acapela is quite an interesting application whose main feature is almost perfect voicing of text in more than 30 languages of the world. In the regular version, however, only one voice is available (Alena).

Vocalizer is a powerful application with the female voice Milena. This program is very often used in call centers. There are many settings for stress, volume, reading speed and installing additional dictionaries. Its main distinction is that the speech engine can be built into programs such as Cool Reader, Moon+ Reader Pro or Full Screen Caller ID.

Festival is a powerful speech synthesis and recognition utility created for Linux and Mac OS X systems. The application comes with open source code and, in addition to the standard language packs, even has support for Finnish and Hindi.

eSpeak is a speech application that supports over 50 languages. Its main disadvantage is that files with synthesized speech can be saved only in the WAV format, which takes up a lot of space. But the program is cross-platform and can be used even on mobile systems.
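Since eSpeak also ships as a command-line tool, it can be driven from a script. A minimal Python sketch (assuming the espeak binary is on the PATH) that writes Russian speech to a WAV file might look like this:

```python
import subprocess

# -v selects the voice/language, -w writes the synthesized speech to a WAV file.
subprocess.run(
    ["espeak", "-v", "ru", "-w", "output.wav", "Пример синтеза речи"],
    check=True,
)
```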

Problems with the speech synthesizer in Google Android

When installing the "native" speech synthesizer from Google, users constantly complain that it spontaneously starts downloading additional language modules, which can not only take a fairly long time but also consumes traffic.

Getting rid of this on Android is quite simple. Open the settings menu, go to the language and voice input section, select voice search and tap the cross (disable) next to the offline speech recognition option. Additionally, it is recommended to clear the application cache and restart the device. Sometimes it may also be necessary to turn off notifications in the application itself.

What is the result?

Summing up, we can say that in most cases the simplest programs are enough, and RHVoice is the leader in all ratings. But musicians who want to achieve a natural-sounding voice, so that the difference between live vocals and computer synthesis cannot be heard, should prefer programs like Vocaloid, especially since many additional voice libraries are released for them, and their settings offer so many possibilities that primitive applications do not even come close.

Speech synthesizer programs are becoming a larger part of our lives every year. They allow us to learn foreign languages more thoroughly, convert texts into a convenient audio format, work inside various utility programs, and much more. And when we need to reproduce some text online in audio format, many of us turn to the various services and speech synthesis programs that can transform the text we need. In this article I will talk about the online versions of such products: what an online speech synthesizer is, which online speech synthesis services exist and how to use them.

The best online speech synthesizers

Initially, speech synthesizers were developed so that visually impaired people could have text reproduced by a computer voice. But gradually their advantages were appreciated by a mass audience, and now almost anyone can download a speech synthesizer to a PC or use the alternatives built into some operating systems.

So which online speech synthesizer should you choose? Below I list a number of services that allow you to convert text to speech online.

Ivona is a great synthesizer

The voice engines of this online service are of very high quality, with a good phonetic base; they sound quite natural, and the "metallic" computer voice is felt here much less often than in competing services.

The Ivona service supports many languages; in the Russian version there is a male voice (Maxim) and a female voice (Tatyana).

  1. To use the speech synthesizer, go to this resource; on the left there is a window into which you need to paste the text to be read.
  2. Paste the text, click on the button with the person's name, select the language (Russian) and the pronunciation option (female or male), and click the "Play" button.

Unfortunately, the free functionality of the site is limited to 250 characters per sentence and is intended more for demonstrating the capabilities of the service than for serious work with text. Greater capabilities are available only for a fee.

https://youtu.be/TIbx4pxX6Gk

Acapela - a speech synthesis service

The company, which sells its voice engines for various technical solutions, invites you to use the Acapela speech synthesizer online. Although the prosody of this service is not as good as Ivona's, the quality of pronunciation here is also very good. The Acapela resource supports about 100 voices in 34 languages.

  1. To use the resource, open the service and select Russian in the window on the left (Select a language - Russian).
  2. Paste the desired text below and click the "Listen" button.

The maximum text size for audio reading is 300 characters.

Fromtexttospeech - online service

You can also use the fromtexttospeech service to convert text to speech online. It works by converting text into an MP3 audio file, which you can then download to your computer. The service supports converting up to 50,000 characters of text, which is quite a lot.

  1. To work with the fromtexttospeech service, open it and, in the "Select Language" option, choose "Russian" (there is only one voice here - Valentina).
  2. In the large window, enter (or paste) the text you need voiced, then click the "Create Audio File" button.
  3. The text will be processed; you can then listen to the result and download it to your PC.
  4. To do this, right-click on "Download audio file" and select "Save target as" from the menu that appears.

Google Translate can also be used

The well-known Google Translate online service has a built-in text-to-speech function, and the amount of text read here can be quite large.

  1. To work with it, open the service.
  2. Select Russian in the window on the left, paste your text, and click the speaker button below ("Listen").

The playback quality is at a fairly tolerable level, but no more.

Text-to-speech - speech synthesizer online

Another resource that performs speech synthesis of acceptable quality. The free functionality is limited to 1,000 characters of input.

  1. To work with the service, go to the site and, in the window on the right next to the "Language" option, select Russian.
  2. In the window, type (or paste from an external source) the required text, and then click the "Say It" button on the right.
  3. A link to the pronunciation of the specified text can also be placed in your e-mail or on a web page by clicking the "Yes" button just below.

Alternative PC software for text-to-speech

There are also speech synthesis programs such as TextSpeechPro, AudioBookMaker, eSpeak, Voice Reader 15, VOICE and a number of others that can convert text to speech. They need to be downloaded and installed on your computer, and the functionality and capabilities of these products usually somewhat exceed those of the online services considered here. A detailed description of them deserves a separate, extensive article.

Conclusion

So which online speech synthesizer should you choose? In most of them the free features are significantly limited, and in terms of sound quality the Ivona service leaves its competitors behind. If you are interested in quickly turning your text into an audio file, use the "fromtexttospeech" resource: it gives a good-quality result in a fairly short time.


