Audiobox is Meta’s new artificial intelligence model, capable of generating sounds from text and of cloning human voices.
It is no secret that artificial intelligence is increasingly used in many areas, nor that generative models such as DALL-E 3 or ChatGPT offer a growing range of possibilities, with the latter even available natively on Samsung mobile phones.
Now Meta, the company that owns Facebook, WhatsApp and Instagram, has launched a new AI model, in this case dedicated to voice cloning and audio generation. It also comes with several conditions of use intended to prevent it from being used inappropriately.
This model, called Audiobox, is now available on the company’s website and promises great fidelity to reality when generating voices and sound effects. To do this, it can combine voice samples and text prompts provided by the user.
This is Audiobox
This artificial intelligence has its origins in Voicebox, the earlier model that the company showed this past summer, and now arrives in an improved form, with a new name and more possibilities. In fact, as the company states, it is not a single model but a family of models.
To create it, more than 160,000 hours of speech were used, along with music and sound samples, so that the model can extract and interpret data. Its use can be separated into two main functions: cloning other people’s voices and generating sound effects. For both, you can use a human recording as a model or create a voice.
Both uses can be combined, with the user supplying both a voice input and a description of what they want the AI to do. The results are quite close to reality, although they are not exactly the same as a human voice. It also allows you to generate artificial voices using only a description. Everything can be combined on different audio channels in the same recording, just as in an editing program.
However, it should be kept in mind that the company has published this tool for research purposes, which allows it to collect more data than it could if the tool were offered commercially. In fact, there are some places in the United States where it cannot be accessed because local laws prohibit this type of data collection, as VentureBeat reports. It should also be noted that, at the moment, it is not an open source application, although the company says it will be in the future.