🥳 Excited to share our latest work: "Diff-A-Riff"! 🥁
A Latent Diffusion Model that generates instrumental accompaniments for any musical input, specifically tailored for music producers! It's faster and lighter than previous models and produces superior audio quality. Control it via text or audio references. 48 kHz sample rate, (pseudo) stereo, ~3 GB memory, and just 6 seconds to generate 90 seconds of music. Trained on a single GPU.
📜 arxiv.org/pdf/2406.08384
🎶 sonycslparis.github.io/diffa…
🎸 "Diff-A-Riff" adapts to any musical input, following the artist's unique style.
🎛️ Optional controls: text prompts, audio references, an interpolation slider, pseudo-stereo width, and loop intensity.
🎚️ It produces state-of-the-art audio quality that human raters could not distinguish from real data, and it operates at unprecedented speed.
🧠 "Diff-A-Riff" is smaller and more efficient than previous models thanks to its Consistency Autoencoder, making it accessible and practical for various applications.
Big shoutout to my outer space colleagues: Javier Nistal, the Machine in "machine learning" 🚄, Marco Pasini, the neural net whisperer 🤫, Cyran Aouameur, the troubleshootah 🛠️, Maarten Grachten, aka MaartenGPT 🤖.
#Teamwork
#AI #MusicTech #Innovation
@latentspaces @marco_ppasini @cyranaouameur @SonyCSLMusic