PhDScanner

@PhdScanner

Jun 10

🎙️ PhD Opportunity in AI & Audio Signal Processing 🇫🇷
Exciting PhD position at Inria focusing on speech enhancement using distributed microphone arrays, combining acoustics and machine learning.
📍 Location: Strasbourg (Inria Centre – Université de Lorraine)
💰 Salary: €2,300 gross/month
📅 Deadline: July 10, 2026
⏳ Duration: 3 years
👨‍🏫 Supervisor: Antoine Deleforge
🔬 Project Overview
This PhD is part of the French-German ANR-DFG AWESOME project (2026–2029), which aims to dramatically improve speech quality by leveraging distributed (ad-hoc) microphone arrays in real-world environments. The research combines inverse acoustics and cutting-edge machine learning (including diffusion models) to handle dynamic, noisy, and reverberant conditions.
🎯 Key Research Directions:
• Microphone array self-localization & calibration
• Acoustic scene understanding using reflections
• Sound field interpolation for dynamic environments
• Multichannel speech enhancement & dereverberation
• Diffusion-based generative models for acoustics
💻 What You’ll Do:
• Develop algorithms using Python / PyTorch
• Conduct experiments and collect acoustic data
• Publish research and present at international conferences
👤 Ideal Candidate:
• Master’s in ML, signal processing, CS, acoustics, or applied math
• Strong Python skills (PyTorch is a plus)
• Background in deep learning & signal processing
• Interest in audio, acoustics, and research
🌟 Why Apply?
• Work at a leading European AI research institute
• Interdisciplinary project bridging ML & acoustics
• Generous benefits (7 weeks leave, flexible work, training)
• International collaboration and strong research exposure
🎧 If you're passionate about AI, sound, and real-world applications like AR/VR, smart devices, and hearing tech, this is a fantastic opportunity!
👉 Apply here: phdscanner.com/opportunities…
#PhD #MachineLearning #SignalProcessing #AudioAI #DeepLearning #Inria #ResearchJobs #AI #SpeechProcessing

PhD Position F/M PhD Thesis: Speech Enhancement with Distributed Microphone Arrays by Combining...

PhD position in AI for enhancing speech signals through machine learning and inverse methods. Requires proficiency in Python and strong interest in deep learning and acoustics.

phdscanner.com

Komal

@komalpreet2809

May 28

Made VANTA, a neural target speaker extraction system I’ve been building to isolate one specific voice from the messiest audio recordings.
Live demo: vanta.komalpreet.me
Code: github.com/Komalpreet2809/Va…
Current audio separation tools are a blunt instrument. When dealing with audio files, users often struggle with:
• Overlapping voices
• Loud background chatter
• Complex room acoustics
• Standard noise cancellation tools that blindly suppress "noise"
• Systems that don't know who to focus on when multiple people are speaking
Most AI audio tools either act like a black box, aggressively muffle everything, or leave the target speaker sounding metallic and robotic. I wanted to build something different.
Vanta is an informed audio separator. Instead of guessing what to suppress, it uses a 5-second reference clip of your target speaker to learn their voice fingerprint. It then scans the messy mixture and extracts only that person, returning a crystal-clear track of their voice plus a residue track of everything it removed.
What it can do:
• Ingest a 5-second reference voice fingerprint
• Isolate the target speaker from highly noisy mixtures
• Mask out interfering voices (even at similar volumes)
• Preserve the natural phase of the audio (no STFT/robotic artifacts)
• Generate a residue track of the removed noise/speakers
• Operate robustly across different simulated room environments
The main principle behind the project is: More signal. More informed extraction. Zero metallic artifacts. Less blind noise cancellation.
The tech stack:
• PyTorch for the core ML architecture and training
• Time-domain 1-D convolutions to avoid spectrogram artifacts
• Frozen ECAPA-TDNN (VoxCeleb) for robust voice fingerprinting
• Temporal Convolutional Networks (TCN) with speaker conditioning
• FastAPI for the backend API
• Next.js + Tailwind for the frontend shell
• Hugging Face Spaces & Vercel for deployment
One of the biggest goals is audio purity. Your isolated audio shouldn't sound like it's trapped in a tin can.
• Time-domain architecture: Operates directly on the raw audio waveform.
• SI-SDR optimization: Maximizes waveform fidelity while ignoring overall volume (scale) differences.
• Continuous conditioning: The voice fingerprint is injected at every block so the model never loses track of the target speaker.
• Explainable separation: Outputs a separate residue track so you can hear exactly what was removed.
If you’re an ML engineer, audio researcher, developer, or someone who has felt the pain of noisy recordings and overlapping voices, I’d love your feedback, ideas, issues, PRs, or even just a star ⭐
#OpenSource #MachineLearning #DeepLearning #AudioAI #SpeechProcessing #PyTorch #FastAPI #NextJS #SpeechSeparation #TargetSpeakerExtraction
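To make the "SI-SDR optimization" point concrete, here is a minimal NumPy sketch of scale-invariant SDR using the standard definition (Le Roux et al., ICASSP 2019): both signals are made zero-mean, the estimate ŝ is projected onto the reference s with α = ⟨ŝ, s⟩ / ‖s‖², and SI-SDR = 10·log10(‖αs‖² / ‖ŝ − αs‖²). Because α absorbs any overall gain, a louder or quieter copy of the clean voice scores the same. This is not VANTA's code: the function names, the `eps` guard, and the synthetic signals are assumptions for illustration, and a real training loop would compute the negative SI-SDR on PyTorch tensors rather than NumPy arrays.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between a 1-D estimate and its clean reference."""
    # Zero-mean both signals so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference      # part of the estimate explained by the clean voice
    residual = estimate - target    # everything else (noise, other speakers, artifacts)
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(residual, residual) + eps)))


def neg_si_sdr_loss(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Training objective sketch: minimizing -SI-SDR maximizes SI-SDR."""
    return -si_sdr(estimate, reference)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)          # stand-in for 1 s of speech at 16 kHz
    interference = rng.standard_normal(16000)   # stand-in for a competing talker
    estimate = 0.7 * clean + 0.1 * interference  # quieter, slightly leaky estimate
    print(f"SI-SDR: {si_sdr(estimate, clean):.1f} dB")
    print(f"Same estimate at 3x volume: {si_sdr(3 * estimate, clean):.1f} dB")
```

The two printed values match, which is the design point: the metric rewards getting the waveform shape right, not matching the loudness of the reference.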

NLP_UMUTeam @NLP_umuteam

May 4

🚨 New Paper Alert!
speech-emotion: a multilingual & multimodal toolkit for emotion recognition 🎙️💬
✅ Combines audio + text → better performance than unimodal models
✅ Supports Spanish 🇪🇸 & English 🇬🇧
SoftwareX (Elsevier) sciencedirect.com/science/ar…
#NLPoc #SpeechProcessing

Divya Sharma @vdivyasharma

Apr 29

🎓 PhD Thesis Defended (Apr 27, 2026) Green & Inclusive Speech Processing (bias, sustainability, Indian languages) 🏆 ACL’25 (Outstanding Paper), NAACL’24/’22, LREC’26 🙏 Grateful to @DrAnubhaGupta 🔬 Open to academic opportunities #SpeechProcessing #ResponsibleAI #IIITD

Ajay Bhakar

@ajay_2512x

Mar 21

🚨 Opportunity: Junior Research Fellow (JRF) in Speech & Language Processing 🎙️🧠
We are looking for motivated candidates to join as a Junior Research Fellow (JRF) under an ANRF-funded project at IIIT Dharwad, in the area of Speech and Language Processing. This position is ideal for individuals interested in working on cutting-edge research involving AI/ML for speech and language domains.
🔹 Key Details:
Duration: 3 years (fully funded)
Stipend: ₹37,000 per month + 16% HRA (as per latest ANRF norms)
Location: IIIT Dharwad
Research areas: Speech processing, language models, AI/ML
Opportunity to work on impactful and publishable research
🎓 Academic Opportunities:
Candidates interested in pursuing an MTech or PhD under this project are strongly encouraged to apply. This position can be aligned with your higher studies.
🔹 Who should apply?
Strong background in machine learning / deep learning
Experience or interest in speech and language technologies
Motivated to pursue research and publications
📩 How to apply:
Interested candidates can send their CV and a brief statement of interest to nataraj@iiitdwd.ac.in
Please share this with anyone who might be interested.
#JRF #PhD #MTech #SpeechProcessing #NLP #AI #ResearchOpportunity #ANRF

Biswesh Mohapatra @bis1602

Jan 29

🎉 Happy to share that "SpeechMapper: Speech-to-text Embedding Projector for LLMs" has been accepted to #ICASSP2026! This work was done during my internship at NAVER LABS Europe with Marcely Zanon Boito and Ioan Calapodescu. #MultimodalAI #SpeechLM #SpeechProcessing Thread 1/4

Yohei Kawaguchi @yohekawag

Jan 19

Three papers from our unit have been accepted to ICASSP 2026! See you in Barcelona for discussions! #ICASSP2026 #ICASSP #SignalProcessing #AI #SpeechProcessing

yash

@yashvarma_in

20 Dec 2025

My first paper, “Attention Enhanced Speaker Representation with Contrastive Learning,” is now published at an IEEE Conference ieeexplore.ieee.org/document… #FirstPublication #IEEE #MachineLearning #SpeechProcessing #ResearchJourney

Attention Enhanced Speaker Representation with Contrastive Learning

Conventional neural text-to-speech (TTS) systems frequently employ recurrent neural network (RNN)-based encoders, such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU), to extract...

ieeexplore.ieee.org

DSP @uvic_DSP

7 Nov 2025

3/3 Balancing reduces false negatives, vital for screening. Congrats to @XavierSanc2433, PhD student and first author, for his hard work. #MentalHealth #SpeechProcessing #EMD #IMF #MachineLearning

SIRT BHOPAL @SIRT_BHOPAL

17 Oct 2025

Huge congratulations to Dr. Shalini Sahay, Professor in the EC Department at SIRT, on publishing her book, "Optimization Analysis of Speech Processing Based Alzheimer's Disease." #BookPublished #AlzheimersResearch #SpeechProcessing #SIRT #ECEngineering

Mirco Ravanelli @mirco_ravanelli

24 Sep 2025

Tomorrow (Sept 25, 11:00–12:00 EST), our #ConversationalAI Reading Group hosts @Themos Stafylakis (Athens Univ. & Omilia). Talk: Advances in Speaker Recognition: Pruning, Deepfake Detection & Learning w/o Temporal Labels Info: poonehmousavi.github.io/rg.h… #AI #SpeechProcessing

IIIT Delhi

@IIITDelhi

1 Aug 2025

We are proud to share that the paper “IndicSynth: A Large-Scale Multilingual Synthetic Speech Dataset for Low-Resource Indian Languages” from @SBILabIIITD has received the Outstanding Paper Award at ACL 2025 (@aclmeeting), one of the most prestigious conferences in computational linguistics and natural language processing. IndicSynth introduces 4000 hours of synthetic speech from 989 target speakers, including 456 females and 533 males, across 12 Indian languages to facilitate multilingual audio deepfake detection and anti-spoofing research. This recognition is the result of the dedicated work of PhD scholar Divya Sharma, who was guided by Prof. @DrAnubhaGupta. Divya’s technical rigour, clarity of thought, and confident presentation at ACL were central to the success of this work. Her presentation at ACL and engagement during the Q&A demonstrated the calibre of a confident and capable NLP researcher. We also acknowledge the valuable contributions of undergraduate student Vijval Ekbote, whose support strengthened the project. Congratulations to the entire team at SBILab for this important recognition and for driving impactful research in Indian language technologies. #SBILab #IIITD #NLProc #ACL2025NLP #SpeechProcessing #MachineLearning #MultilingualAI #SyntheticSpeech #DeepfakeDetection #ACL2025 #ResearchExcellence

MT Group at FBK @fbk_mt

9 Jul 2025

Our pick of the week by @FBKZhihangXie: "Adversarial Speech-Text Pre-Training for Speech Translation" by Chenxuan Liu, Liping Chen, Weitai Zhang, Xiaoxi Li, Peiwang Tang, Mingjia Yu, Sreyan Ghosh, and Zhongyi Ye (ICASSP 2025) #speech #speechprocessing #speechtech #translation

Zhihang Xie @FBKZhihangXie

9 Jul 2025

🚀 AdvST: Adversarial training aligns speech and text distributions without parallel data! Combines adversarial learning + hidden-state swapping to fix length mismatch & boost low-resource speech translation. ieeexplore.ieee.org/document…

TCS Research

@TCSResearch

4 Jul 2025

TCS Research is pleased to be a Silver Sponsor of the Summer School on Speech Signal Processing (S4P). This program offers an in-depth exploration of speech technology and automatic speech recognition. Register here: bit.ly/3G7sfer #TCSResearch #S4P2025 #SpeechProcessing #AI

ELOQUENCE AI @eloquenceai

23 Jun 2025

📢 The Jelinek Summer Workshop on Speech and Language Technology (JSALT 2025) starts today! 👉 More info: eloquenceai.eu/event/jelinek… #ELOQUENCEAI #SpeechProcessing #SpeechTechnology #Workshop

Ehsan Eqlimi @ehsan_eqlimi

14 May 2025

Excited to share our new paper in B-ENT Journal on how the brain responds to speech across different modalities (audiovisual, auditory, visual) using fNIRS! Explore the localization of cortical responses in normal-hearing adults #fNIRS #SpeechProcessing #Neuroimaging

MT Group at FBK @fbk_mt

9 Apr 2025

Our pick of the week by @FBKZhihangXie: "Bridging Speech and Text Foundation Models with ReShape Attention" by @TakatomoKano, @chenwanch1, @shinjiw_at_cmu, et al. (#ICASSP2025) ieeexplore.ieee.org/document… #Speech #FoundationModel #SpeechProcessing

Bridging Speech and Text Foundation Models with ReShape Attention

This paper investigates cascade approaches bridging speech and text foundation models (FMs) for speech translation (ST). We address the limitations of cascade systems which suffer from the propagat...

ieeexplore.ieee.org

Zhihang Xie @FBKZhihangXie

9 Apr 2025

ReShape Attention bridges speech & text models without extra parameters. Achieves 8.5% BLEU in translation by leveraging acoustic cues, outperforming cascade/E2E methods. Efficient & scalable. Check the paper by Kano et al. (2025) at: ieeexplore.ieee.org/stamp/st….

Mervin Praison

@MervinPraison

24 Mar 2025

"AI Decodes Brain’s Speech & Language Processing" 🧠🎙️🚀
✅ Brain processes speech & language in a unified hierarchy 🔄
✅ Whisper + ECoG reveal real-time neural mapping 🎙️📊
✅ AI accurately predicts brain activity across regions 🤖🔬
✅ Game-changer for neuroscience & AI! 🚀
🔽 Details in the thread below!
#NeuroAI #SpeechProcessing #BrainMapping
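"AI accurately predicts brain activity" usually refers to a linear encoding model: features from a speech model (here, Whisper) are regressed onto each electrode's recorded activity, and the fit is scored by the correlation between predicted and actual activity on held-out data. The sketch below shows that general recipe with closed-form ridge regression in NumPy. It is an assumption about the typical approach, not the pipeline of the study in this thread: random arrays stand in for Whisper embeddings and ECoG signals, and the sizes, noise level, and `alpha` are arbitrary illustrative choices.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def per_electrode_r(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson correlation between actual and predicted activity, one value per electrode."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


# Synthetic stand-ins (hypothetical): 600 words with 32-dim "embeddings",
# and 8 "electrodes" whose activity is a noisy linear function of them.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 32))
W_true = rng.standard_normal((32, 8))
Y = X @ W_true + 0.5 * rng.standard_normal((600, 8))

# Fit on the first 500 words, evaluate on the 100 held-out words.
X_train, X_test = X[:500], X[500:]
Y_train, Y_test = Y[:500], Y[500:]
W = fit_ridge(X_train, Y_train, alpha=1.0)
r = per_electrode_r(Y_test, X_test @ W)
print(np.round(r, 3))
```

Evaluating on held-out words matters: with enough features a regression can fit training data almost perfectly, so only out-of-sample correlation says whether the model's representations actually track neural activity.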

ILCB @ILCB_france

4 Mar 2025

🧠🗣️ New research in eLife explores speech coordination & brain dynamics using intracranial recordings! Read more: doi.org/10.7554/eLife.99547.… #Neuroscience #SpeechProcessing #BrainDynamics

IIT Dharwad

@iitdhrwd

7 Feb 2025

IIT Dharwad is currently inviting applications for a range of project positions in the Speech Processing Lab led by Prof. Mahadeva Prasanna. This opportunity is ideal for those interested in progressing their careers in Speech and AI and acquiring practical experience in innovative projects. For further details on the available positions and how to apply, please visit: iitdh.ac.in/other-recruitmen… #IITDharwad #Hiring #AI #SpeechProcessing
