Experiment seeks to convert the human voice into that of a dog, for audiovisual purposes


Voice conversion is an audio processing technique in which a source speaker's speech waveform is transformed into a different waveform that has the characteristics of a target speaker while retaining the linguistic content.

Building on this idea, a team of scientists in Japan set out to develop a system that converts recordings of the human voice so that they sound as if they were spoken by a dog.

“Dogs talking like humans,” thanks to voice conversion

A joint study by researchers from the Faculty of Information Science and Engineering at Ritsumeikan University and the Communication Science Laboratories of NTT Corporation, both in Japan, set out to investigate the possibility of “making dogs talk” through voice conversion.

To get a feel for the principle behind this model, consider the vocoder in electronic music. Bands such as Kraftwerk and Daft Punk used it to create a “robotic voice” effect, processing their singers' natural voices with the electronic tone emitted by a synthesizer. The result blends the two inputs: it has the timbre of the electronic sound but preserves the articulation of the original vocal.
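The vocoder analogy can be made concrete with a toy channel vocoder. The sketch below is only an illustration of that classic effect, not the method used in the study: the function names, the 8 kHz sample rate, and the four band centers are all assumptions chosen for brevity. It splits the voice (the modulator) and the synthesizer tone (the carrier) into matching frequency bands, tracks how loud the voice is in each band, and uses that loudness to shape the carrier.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this toy example


def bandpass(signal, center_hz, q=4.0, sample_rate=SAMPLE_RATE):
    """Second-order band-pass filter (biquad, unity gain at the center frequency)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha  # the b1 coefficient is zero for this filter type
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, smoothing=0.01):
    """Rectify the signal and smooth it to follow its loudness over time."""
    env, level = [], 0.0
    for x in signal:
        level += smoothing * (abs(x) - level)
        env.append(level)
    return env


def vocode(modulator, carrier, band_centers_hz=(300, 600, 1200, 2400)):
    """Impose the modulator's per-band loudness on the carrier."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for center in band_centers_hz:
        voice_env = envelope(bandpass(modulator[:n], center))
        carrier_band = bandpass(carrier[:n], center)
        for i in range(n):
            out[i] += voice_env[i] * carrier_band[i]
    return out


# Stand-in signals: a 600 Hz tone that stops halfway plays the role of the
# voice, and a 110 Hz sawtooth plays the role of the synthesizer.
n = SAMPLE_RATE
voice = [math.sin(2 * math.pi * 600 * i / SAMPLE_RATE) if i < n // 2 else 0.0
         for i in range(n)]
synth = [2.0 * ((110 * i / SAMPLE_RATE) % 1.0) - 1.0 for i in range(n)]
robot_voice = vocode(voice, synth)
```

The output carries the sawtooth's timbre but falls silent when the stand-in voice does, which is the sense in which the vocalization is preserved.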

The paper documenting the study presents a mechanism that uses vocal samples from a dog to process a fragment of human speech so that it sounds like the animal's voice. The intended use is as an audiovisual resource, for example for dubbing dogs in movies or video games.

Diagram summarizing how the proposed voice converter works. Source: arxiv.org

Voice processing in this case is not just a “filter” that combines the two inputs to produce a result. A system reduced to that alone could yield very crude results. Instead, the processing system includes a “real or fake discriminator,” which evaluates how believable the generated audio is, based on factors such as how closely the output resembles the sound of a dog, the sound quality, and whether the generated audio is clear enough for its content to be understood.
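The idea of a “real or fake discriminator” can be illustrated with the standard adversarial training losses. This is a minimal sketch under the assumption of a conventional binary cross-entropy objective. The article does not state the exact loss used in the study, and the function names and scores below are hypothetical. The discriminator assigns each audio clip a score (a logit), and it is penalized when it rates real dog recordings as fake or generated clips as real. The generator, in turn, is penalized when its output fails to fool the discriminator.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator_loss(real_logits, fake_logits, eps=1e-12):
    """Binary cross-entropy: real clips are labeled 1, generated clips 0."""
    real_terms = [-math.log(sigmoid(z) + eps) for z in real_logits]
    fake_terms = [-math.log(1.0 - sigmoid(z) + eps) for z in fake_logits]
    return (sum(real_terms) + sum(fake_terms)) / (len(real_terms) + len(fake_terms))


def generator_loss(fake_logits, eps=1e-12):
    """Low when the discriminator scores generated clips as real."""
    return sum(-math.log(sigmoid(z) + eps) for z in fake_logits) / len(fake_logits)


# Hypothetical scores: positive means "sounds like a real dog recording".
real_scores = [2.1, 1.4, 3.0]   # discriminator outputs for real clips
fake_scores = [-1.7, 0.3, -2.2]  # discriminator outputs for converted clips

d_loss = discriminator_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
```

Training alternates between lowering the discriminator's loss and lowering the generator's loss, which pushes the converted audio toward sounding more like genuine recordings than a fixed filter could.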

The first tests, carried out with several conversion methods, gave mostly positive results for the first two criteria. The greatest challenge lies in clarity, where not even the minimum expected threshold was reached.

If the experiments continue, the goal of “making a dog talk,” which has not yet been fully achieved, could be realized, bringing a new level of fantasy to film and video games.