TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Learning to associate audio with textual descriptions is valuable for a range of tasks, including pretraining, zero-shot classification, audio retrieval, audio captioning, and text-conditioned...
arxiv.org