Google has introduced MusicLM, a text-to-music AI that can generate songs up to five minutes long. The researchers have released a paper detailing their work and findings, along with examples of AI-generated songs that match the text prompts given. The team behind MusicLM claims that the model “outperforms previous systems both in audio quality and adherence to the text description”.
The Challenges of AI Music Generation
Earlier text-to-music systems have been held back by two problems: paired audio and text data is scarce, and a short text caption rarely captures the full intent of a music track.
MusicLM: A Hierarchical Sequence-to-Sequence Model
MusicLM is a hierarchical sequence-to-sequence model for music generation: it uses machine learning to generate sequences at several levels of a song, from overall structure to melody down to individual sounds.
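To make the “hierarchical” idea concrete, here is a minimal Python sketch of a staged text-to-music pipeline: a coarse pass for structure, a middle pass for melody, and a fine pass for acoustic tokens that a neural codec would decode into audio. The stage names, token counts, and toy functions are assumptions for illustration only; this is not Google’s implementation.

```python
# Illustrative sketch of a hierarchical text-to-music pipeline (not Google's code).
# Stage names, token counts, and the toy "models" below are assumptions.
import hashlib
import random


def text_to_structure_tokens(prompt: str, n: int = 8) -> list[int]:
    """Coarse stage: map the text prompt to tokens describing song structure."""
    seed = int(hashlib.sha256(prompt.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.randrange(1024) for _ in range(n)]


def structure_to_melody_tokens(structure: list[int], per_token: int = 4) -> list[int]:
    """Middle stage: expand each structure token into finer melody tokens."""
    rng = random.Random(sum(structure))
    return [rng.randrange(4096) for _ in range(len(structure) * per_token)]


def melody_to_audio_tokens(melody: list[int], per_token: int = 8) -> list[int]:
    """Fine stage: expand melody tokens into low-level acoustic tokens
    that a neural audio codec would turn into a waveform."""
    rng = random.Random(sum(melody))
    return [rng.randrange(8192) for _ in range(len(melody) * per_token)]


if __name__ == "__main__":
    prompt = "a calming violin melody backed by a distorted guitar riff"
    structure = text_to_structure_tokens(prompt)
    melody = structure_to_melody_tokens(structure)
    audio = melody_to_audio_tokens(melody)
    print(f"{len(structure)} structure tokens -> {len(melody)} melody tokens -> {len(audio)} acoustic tokens")
```

In MusicLM the stages are learned sequence-to-sequence models rather than the toy functions above; the sketch only shows the coarse-to-fine ordering the paper describes.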
The model was trained on a large dataset of unlabeled music together with a music caption dataset of over 5,500 examples written by musicians, which has been made publicly available to support future research. In addition to text input, MusicLM also accepts audio input, such as whistling or humming, to inform the song’s melody; the AI then “renders the style described by the text prompt”.
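This melody-plus-text workflow can be pictured as two conditioning signals combined before generation: one derived from the hummed or whistled audio, one from the text prompt. The sketch below assumes hypothetical embedding functions and a simple additive merge; MusicLM’s actual interface is not described in this article, so every name here is illustrative.

```python
# Illustrative sketch of melody-plus-text conditioning (not MusicLM's actual API).
# The embedding functions and the way signals are combined are assumptions.
import hashlib
import random


def embed_text(prompt: str, dim: int = 16) -> list[float]:
    """Toy text encoder: derive a deterministic pseudo-embedding from the prompt."""
    rng = random.Random(int(hashlib.sha256(prompt.encode()).hexdigest(), 16))
    return [rng.uniform(-1, 1) for _ in range(dim)]


def embed_melody(hummed_audio: list[float], dim: int = 16) -> list[float]:
    """Toy melody encoder: reduce a hummed or whistled signal to a pseudo-embedding."""
    rng = random.Random(int(sum(abs(x) for x in hummed_audio) * 1e6))
    return [rng.uniform(-1, 1) for _ in range(dim)]


def condition(text_emb: list[float], melody_emb: list[float]) -> list[float]:
    """Combine both signals: the melody fixes the tune, the text fixes the style."""
    return [t + m for t, m in zip(text_emb, melody_emb)]


if __name__ == "__main__":
    humming = [random.uniform(-1, 1) for _ in range(22050)]  # one second of placeholder audio
    cond = condition(embed_text("an upbeat 80s synth-pop track"), embed_melody(humming))
    print(f"conditioning vector of length {len(cond)}")
```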
Despite the promising results, the paper’s authors acknowledge the risk of misuse if generated songs do not differ enough from the source material used to train the model. Even so, Google’s MusicLM is a significant step forward in AI music generation, directly addressing the scarcity of paired audio-text data and the difficulty of capturing a track’s intent in a caption that limited previous systems.