Background

Meta Open Sources AudioCraft, A Generative AI That Can Create Music From Text Prompts

The AI world was once rather dull and quiet, and rarely made ripples that affected industries other than its own.

That changed when OpenAI introduced ChatGPT, the generative AI that awed the tech world and the internet.

Since then, it has been an arms race, as more companies compete with one another and more generative AI products reach the market.

Meta is one of those competitors, and one of the largest.

While it entered the competition somewhat late, it came fully prepared.

In this case, Meta has both LLaMA and LLaMA 2.

Whereas the former was created for researchers, the latter is meant to disrupt the market that ChatGPT and others occupy, because Meta has made it open source.

This time, Meta has something new. Mark Zuckerberg, founder and CEO of Meta, calls it 'AudioCraft', and as Meta explains in a website post, it is essentially a specialized generative AI for audio:

"Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or a small business owner adding a soundtrack to their latest video ad on Instagram with ease. That’s the promise of AudioCraft — our latest AI tool that generates high-quality, realistic audio and music from text."

AudioCraft consists of three models: MusicGen, AudioGen and EnCodec.

According to Meta, MusicGen was trained on specifically licensed music and is capable of generating music from text prompts, while AudioGen was trained on public sound effects and is capable of generating audio from text prompts.

Meta is open-sourcing these AI models, so researchers and practitioners can use them to train their own models with their own datasets for the first time.

This, in turn, should help advance "the field of AI-generated audio and music."

"Today, we’re excited to release an improved version of our EnCodec decoder, which allows higher quality music generation with fewer artifacts. We’re also releasing our pre-trained AudioGen models, which let you generate environmental sounds and sound effects like a dog barking, cars honking, or footsteps on a wooden floor. And lastly, we’re sharing all of the AudioCraft model weights and code."

In other words, Meta is making its specialized generative AI for audio available to all.

The world has seen lots of advancements in the field of AI, especially around generative AI products for images, videos, and text. But according to Meta, audio is lagging because it's highly complicated and not very open.

Not many people are able to readily experiment with it.

Making things more difficult, music is arguably the most challenging type of audio to generate.

Meta describes the AudioCraft family as AI models capable of producing high-quality audio with long-term consistency, adding that "they’re easy to use."

This is because Meta simplified the overall design of generative models for audio compared to prior work in the field. What's more, AudioCraft works for music, sound, compression, and generation, all in the same place.
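
To give a rough sense of what generating music from a text prompt might look like in practice, here is a minimal Python sketch. It assumes the open-sourced audiocraft package is installed and that pre-trained weights are available under the name "facebook/musicgen-small"; the helper prompt_to_stem and the wrapper generate_music are hypothetical names introduced only for illustration, and none of this is taken from Meta's announcement.

```python
import re


def prompt_to_stem(prompt: str, max_len: int = 40) -> str:
    """Turn a text prompt into a safe file stem (hypothetical helper)."""
    stem = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return stem[:max_len] or "clip"


def generate_music(prompts, duration_s=8, model_name="facebook/musicgen-small"):
    """Generate one clip per text prompt and write each one as a WAV file.

    Assumption: the audiocraft package is installed and the named
    checkpoint can be downloaded on first use.
    """
    # Imported lazily so the helper above works without audiocraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    prompts = list(prompts)
    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_s)  # clip length in seconds

    # One waveform per prompt, shaped [batch, channels, samples].
    wavs = model.generate(prompts)

    paths = []
    for prompt, wav in zip(prompts, wavs):
        stem = prompt_to_stem(prompt)
        # audio_write appends the ".wav" suffix to the stem by default.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        paths.append(stem + ".wav")
    return paths
```

Calling generate_music(["upbeat acoustic folk with hand claps"]) would, under these assumptions, download the model on first run and write a short clip to the working directory. The smallest checkpoint is used here only to keep the sketch light.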

Published: 02/08/2023