MusicGen: a new way to generate music from text prompts and existing melodies


In the world of artificial intelligence, advances in music generation have been increasingly impressive.

A new model called MusicGen, developed by Meta, is gaining attention for how easily it can create music from text prompts and existing melodies.

MusicGen, Meta’s proposal to generate music with AI

Like other current systems, MusicGen is based on a Transformer model. Just as a language model predicts the next tokens in a sentence, MusicGen predicts the next segment of a piece of music, which means it can generate new pieces of music from text prompts.

Meta's researchers use the EnCodec audio tokenizer to break audio data down into discrete tokens. MusicGen is a single-stage model that processes these tokens in parallel, making it a fast and efficient option for generating music.
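EnCodec compresses audio into several parallel streams of discrete tokens using residual vector quantization. The toy Python sketch below uses random, untrained codebooks purely for illustration (the real model learns its codebooks from data and operates on encoded audio frames), but it shows the basic data flow: each codebook quantizes the residual left over by the previous one, yielding one token stream per codebook.

```python
import numpy as np

# Toy illustration (NOT Meta's actual EnCodec): residual vector
# quantization, the idea behind EnCodec's parallel token streams.
# Each codebook quantizes what the previous one left over, producing
# several parallel streams of discrete tokens per audio frame.

def make_codebook(rng, num_codes, dim):
    # In EnCodec these entries are learned; here they are random.
    return rng.normal(size=(num_codes, dim))

def quantize(frames, codebooks):
    """Return one integer token stream per codebook plus the reconstruction."""
    residual = frames.copy()
    streams = []
    recon = np.zeros_like(frames)
    for cb in codebooks:
        # Pick the nearest codebook entry for each frame's residual.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        tokens = dists.argmin(axis=1)          # one token per frame
        streams.append(tokens)
        chosen = cb[tokens]
        recon += chosen                        # add this stage's contribution
        residual -= chosen                     # next codebook sees the leftover
    return streams, recon

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))              # 50 "audio frames", 8-dim features
codebooks = [make_codebook(rng, 1024, 8) for _ in range(4)]  # 4 parallel codebooks
streams, recon = quantize(frames, codebooks)
print(len(streams), streams[0].shape)          # -> 4 (50,)
```

A single-stage model like MusicGen can then predict all of these parallel token streams together, instead of running a separate model per stream.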

To train MusicGen, the team used a dataset of 20,000 hours of licensed music. This dataset included 10,000 high-quality music tracks, as well as music data from Shutterstock and Pond5. This approach to training with a wide variety of sources contributes to the diversity and quality of the compositions generated by MusicGen.

MusicGen’s unique ability to handle text and music

One of MusicGen’s distinctive features is its ability to process both text prompts and existing melodies. The text provides the basic style, which is then aligned with the melody in the audio file. For example, if a text prompt describing an 80s pop track is combined with the melody of Bach’s famous “Toccata and Fugue in D Minor”, MusicGen can generate a new piece of music based on those inputs. This and similar examples can be found in the tool’s Hugging Face demo, where users can experiment with MusicGen’s music generation capabilities using their own prompts and audio samples.

It is important to note that MusicGen does not allow precise control over how closely the output follows the melody. Although the text sets the basic style, it is not reflected exactly in the generated output. Even so, it serves as a rough guide for the generation and delivers interesting results.

Compared to other popular models such as Riffusion, Mousai, MusicLM, and Noise2Music, MusicGen shows superior performance on both objective and subjective metrics. The music it generates matches the text prompts better and is more plausible as a composition. Overall, based on performance metrics presented by The Decoder, it ranks above Google’s MusicLM.

In addition to the Hugging Face demo mentioned above, Meta has released the MusicGen code and models on GitHub: the code is open source, while the pretrained model weights are distributed under a non-commercial license.
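For readers who want to try the released code, below is a minimal sketch of what text-conditioned generation looks like with Meta's audiocraft package, based on its published API. The prompt and output filename are illustrative, and actually running it requires downloading the model weights, ideally on a GPU.

```python
# Sketch of text-to-music generation with Meta's audiocraft package
# (pip install audiocraft). Downloads model weights on first run.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate

# Plain text-to-music generation from a descriptive prompt.
wav = model.generate(["an upbeat 80s pop track with synth leads"])

# The 'melody' checkpoint also accepts a reference waveform, so a
# prompt can be combined with an existing melody via
# MusicGen.get_pretrained("facebook/musicgen-melody") and its
# generate_with_chroma method.

for idx, one_wav in enumerate(wav):
    # Writes e.g. 0.wav with loudness normalization.
    audio_write(f"{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```

This mirrors the workflow of the Hugging Face demo, just scripted locally.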

Brian Adam
Professional Blogger, V logger, traveler and explorer of new horizons.