🚀 We're excited to announce our latest work: "Discrete Audio Tokens: More Than a Survey!"
It presents a comprehensive survey and benchmark of audio tokenizers across speech, music, and general audio.
preprint: arxiv.org/pdf/2506.10274
website: poonehmousavi.github.io/date…
Proud of this work published a few days ago at SLT 2024 on continual learning for end-to-end ASR. Turns out changing the CL paradigm to parallel training on different tasks and merging these experts can reduce the forgetting rate to as low as 0.4%! poonehmousavi.github.io/asse…
Join us for the Conversational AI Reading Group! 📚 We meet every Thursday, 11 AM-12 PM EST, to discuss the latest advancements in conversational AI, multimodal models, and speech processing. Everyone is welcome! More info: poonehmousavi.github.io/rg.h… & follow us on Twitter: @convAI2024
SpeechBrain version 1.0.2 is now out! My personal contribution is a clean adapters interface that allows custom adapters or integration with PEFT layers, your choice. You can see the tutorial here:
speechbrain.readthedocs.io/e…
We just released v1.0.1 of SpeechBrain with some cool updates to Whisper integration: various tasks supported, fine-tuning fixes, performance improvements, and more!
📢 I'll be presenting our paper "How Should We Extract Discrete Audio Tokens from Self-Supervised Models?" at InterSpeech! 🎙️
Meet us at the Speech Processing Using Discrete Speech Units, Oral Session on Sep 3, 16:20.
🔗 Paper: arxiv.org/abs/2406.10735 #INTERSPEECH2024
L-MAC is a post-hoc explanation method that helps us hear why an audio classifier makes its decisions. It takes the hidden representations of a pretrained classifier and feeds them to a decoder to predict saliency maps.
We will have XAI-SA: Explainable Machine Learning for Speech and Audio, next week at ICASSP 2024. The date is April 15.
You can sign up here to receive more information:
forms.gle/VPBP3Mojq3EwqwU77
Workshop website for the schedule:
xai-sa-workshop.github.io/
For a deep, thoughtful discussion of proliferation, regulation, and why open source is the better—and safer—path to take with AI, I HIGHLY recommend this piece by @jeremyphoward. I learned much from it and encourage others to as well: fast.ai/posts/2023-11-07-dis…
How in the world Effective Altruists went from supporting data-driven charitable impact like mosquito nets to organizing protests against open source is beyond me. This is no longer altruism; this is ideological posturing that hurts, not helps, society. spectrum.ieee.org/meta-ai
Fresh paper out #EMNLP2023
LLMs excel in zero-shot text-to-SQL but still benefit greatly from in-domain demonstrations. This work is driven by two questions: (1) What are the key factors within in-domain examples? (2) Can we harness these benefits without in-domain annotations?
Happy to share that the last work of my PhD has been accepted to #EMNLP2023 Findings. Many thanks to my advisor @EricFos. We propose a new framework to select text-to-SQL demonstration examples from out-of-domain data and synthetic in-domain data. The paper will be released next week!
Excited to present at the SpeechBrain online summit on Monday 28th Aug
I'll be joined by @shinjiw_at_cmu, @functiontelechym, Daniel Povey, and Zhaoheng Ni for a panel discussion on open-source speech
It's not too late to register if you haven't already: speechbrain.github.io/sb_sum…
Excited to share the news about a new internship opportunity at @Mila_Quebec, where you can contribute to @SpeechBrain1!
We're seeking students skilled in RNN-T. If you have relevant experience, please apply at speechbrainproject@gmail.com.
#ICASSP2023 #DeepLearning #AI