October 9, 2022

Artificial intelligence is being put to work in a growing number of sectors. One of them is the arts, where the technology powers tools that generate images from text, such as DALL-E and Stable Diffusion, and greatly simplify the traditional creative process.

Artificial intelligence has also enabled tools that generate video from text.

But the relationship between artificial intelligence and multimedia does not end there: a tool that could be described as the DALL-E of sound was recently announced.

It is called AudioGen, an audio generator that works from textual descriptions.

A team of researchers from Meta and the Hebrew University of Jerusalem explained that AudioGen is built on a generative autoregressive model, which interprets the user's text prompt to generate the final audio.

We present “AudioGen: Textually Guided Audio Generation”! AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio). 📖 Paper: https://t.co/XKctRaShN1

🎵 Samples: https://t.co/e7vWmOUfva

💻 Code & models – soon! (1/n) pic.twitter.com/UiJaA627bv — Felix Kreuk (@FelixKreuk) September 30, 2022

On his Twitter account, researcher Felix Kreuk shared a tweet featuring audio generated with AudioGen. It includes sounds such as a person whistling while the wind blows and a person speaking while birds sing and dogs bark, among other sounds generated from the text prompts given to AudioGen.

The researchers behind AudioGen said the tool was designed to overcome the difficulties that can arise in audio generation. As a result, AudioGen can recognize different types of sounds and separate them acoustically.

This means that for audio in which two people are talking at the same time, AudioGen would be able to extract each speaker's audio separately, a useful feature for producing accurate audio samples.

To train the tool, the team said it used ten datasets of audio with matching labels.

The project is still under development, so the public will have to wait for access, although the team says the AudioGen code and other details will soon be available on its GitHub profile.

The researchers also said they will continue working on AudioGen to improve its capabilities.