Text-to-image generators seem to be here to stay. If we want “a painting of a cute black kitten in a cyberpunk-style city” and don’t want to pick up pencils or brushes, we can skip part of the traditional creative process and ask DALL-E, Midjourney, or Stable Diffusion to do the work for us.
This is somewhat surprising, considering that it was unimaginable just a few years ago. Advances in artificial intelligence, however, keep accelerating. A few months ago the first text-to-video generators began to appear, and now it’s time to welcome AudioGen, an audio generator: a “DALL-E” for sound.
AI surprises us again
AudioGen is an artificial intelligence program that generates sounds from textual descriptions. As explained by the researchers from Meta and the Hebrew University of Jerusalem, who are responsible for the project, an autoregressive generative model is used to interpret requests in natural language and generate audio samples from scratch.
We present “AudioGen: Textually Guided Audio Generation”!
AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio).
📖 Paper: https://t.co/XKctRaShN1
🎵 Samples: https://t.co/e7vWmOUfva
💻 Code & models – soon!
— Felix Kreuk (@FelixKreuk) September 30, 2022
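To give a rough idea of what “autoregressive” means here, the following toy sketch generates a sequence one step at a time, with each new element depending on the text prompt and on everything produced so far. It is only an illustration and not the researchers’ code: it assumes audio is represented as a sequence of discrete tokens, and the names `VOCAB_SIZE`, `toy_next_token_weights`, and `generate` are hypothetical. The scoring function is a deterministic stand-in for a trained model.

```python
import random

VOCAB_SIZE = 8  # hypothetical number of distinct audio tokens


def toy_next_token_weights(prompt, history):
    """Hypothetical stand-in for a trained model.

    Returns one positive weight per candidate token, computed from the
    text prompt and the most recently generated token.
    """
    base = sum(ord(c) for c in prompt)
    last = history[-1] if history else 0
    return [1 + (base + 3 * last + 5 * t) % 7 for t in range(VOCAB_SIZE)]


def generate(prompt, length, seed=0):
    """Autoregressive loop: sample a token, append it, repeat."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        weights = toy_next_token_weights(prompt, tokens)
        # Sample the next token in proportion to its weight.
        next_token = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(generate("someone whistling while the wind blows", 16, seed=1))
```

In a real system, a neural network would supply the weights, and the resulting tokens would still have to be decoded into an audible waveform; both steps are left out of this sketch.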
Let’s look at some examples of AudioGen in action. As we can hear in the tweet shared by researcher Felix Kreuk, the program has been able to generate sounds matching “someone whistling while the wind blows”, “a man speaks while birds sing and dogs bark”, and “sirens and a humming engine approach and pass”, among other requests written in natural language.
According to the researchers, the model tackles difficult audio problems. It can, for example, distinguish between different types of sounds and separate them acoustically, such as two people talking at the same time. That ability is essential for generating a wide variety of accurate audio samples.
We don’t know specifically which datasets were used, but members of the project say they trained the model “using ten audio datasets and matching labels.” Keep in mind that many AI models are trained on sets or subsets of data that contain copyrighted works, which has sparked debate over copyright.
It should be noted that the project is still being developed behind closed doors. However, the researchers intend to make it available to the public, and they say they will soon publish the AudioGen code and other technical details on their GitHub profile. They also explain that they will continue working to improve the program’s capabilities. We will have to wait to find out whether it will be as widely available as the image generators.
Images | Pawel Czerwinsky