Speaker diarization is hard.
Real conversations are messy: cross-talk, short speech turns, backchannels (“yeah”, “uh-huh”), interruptions, and an unknown number of speakers all make the problem challenging.
But STREAMING speaker diarization is a completely different beast.