Orchestral Pit

Orchestral Pit

Users
Tweets

Jun 9

Fine-grained timbre and tempo control in video-to-audio synthesis via dual encoders. Disentangle style attributes without sacrificing semantic alignment. #Video2Audio #AudioGeneration #Timbre arxiv.org/abs/2606.07182v1

Audio Imitator: Controlling Timbre and Tempo in Video2Audio...

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as...

arxiv.org

Audio and Speech Processing Papers

Audio and Speech Processing Papers @AudioAndSpeech

Jun 8

Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference Jiahui Zhao, Tianrui Wang, Chunyu Qiang, Cheng Gong, Xijuan Zeng, Feng Deng, Longbiao Wang arxiv.org/abs/2606.07182 [𝚎𝚎𝚜𝚜.𝙰𝚂]

Audio Imitator: Controlling Timbre and Tempo in Video2Audio...

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as...

arxiv.org

128

arXiv Sound

arXiv Sound @ArxivSound

Jun 8

Jiahui Zhao, Tianrui Wang, Chunyu Qiang, Cheng Gong, Xijuan Zeng, Feng Deng, Longbiao Wang, "Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference," arxiv.org/abs/2606.07182

Audio Imitator: Controlling Timbre and Tempo in Video2Audio...

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as...

arxiv.org

359

ノマ扉 | nomadoor

ノマ扉 | nomadoor @noma_door

Jan 17

既存のLTX-2 workflowのパラメータの見直し、新たなworkflow (Multi-frame I2V / video2audio / Temporal inpainting) の追加をしました comfyui.nomadoor.net/ja/basi… これで完成と思うたびに新たな情報が出てきたりして中々更新できませんでした…スイマセン(；･`ω･´) 楽しんでいただければ！！

0:25

7,447

Brie Wensleydale🧀🐭

Brie Wensleydale🧀🐭

@SlipperyGem

9 Sep 2025

Kijai has decided to update MMAudio. Its an older video2audio model in the same vein as HunYuan Foley. I still want to go take a look at HunYuan Foley again, I feel like that model is probably better than MMAudio, although MMAudio might be faster. github.com/kijai/ComfyUI-MMA…

GitHub - kijai/ComfyUI-MMAudio

Contribute to kijai/ComfyUI-MMAudio development by creating an account on GitHub.

github.com

147

@levelsio

@levelsio

6 Sep 2025

🇮🇩 Kali Besar in Batavia in the Dutch East Indies in 1938 (current day Jakarta, Indonesia) Input: original photo -> img2img with Nano Banana "restore old photo" -> then img2video with Kling 2.1 Master with ChatGPT-generated video prompt -> then video2audio with MMAudio for background sounds

0:08

@levelsio

30 Aug 2025

I think we'll have years of fun being able to finally interpret somewhat accurately the imperfect drawings people in human history made And show how it actually would have looked if we had cameras then! This is nano-banana with map input prompt: "full colour photograph. New Amsterdam in 1660. make sure it's full modern colors as if it's a photograph taken today." This is old Dutch fortress of New Amsterdam, in 2025 the place in New York City where the National Museum of the American Indian is built on exactly that location!

136

69,850

bone

bone

@boneGPT

6 Mar 2025

LIVE playing PIRATE YAKUZA and shooting the shit about BONEPAINT BONECOIN BONEDUPS and VIDEO2AUDIO x.com/i/broadcasts/1vAGRDljP…

Untitled

589

bone

bone

@boneGPT

6 Mar 2025

Used Video-to-Audio on the left clip to make her kitty purr. I am adding this to bonedub We recently added MICROPHONE to bonedub so you can sing on your videos or do voices. Will do official announcement of the Microphone tool soon after some bug fixes. Checkout this Video2Audio generation. This will be free to people making bonepaints.

0:05

961

Shiqi Yang

Shiqi Yang

@shiqi_yang_147

4 Mar 2025

Looking for 1 intern on audio-visual generation (potentially video2audio generation)! We have the largest computation resources in Japan, and we do serious industrial research (and development). DM if interested, and you can find more about me in my homepage.

5,955

AiRT🎥生成AI動画を創る人

AiRT🎥生成AI動画を創る人

@AutoIntelliMode

25 Feb 2025

#DreamMachine のVideo2Audioを試してみました。ちゃんと場面を理解して効果音を足してくれるの良いですね！🥰

0:17

341

Sheetal Shinde

Sheetal Shinde

@itsSSS1510

24 Feb 2025

Say hello to Video2Audio in #DreamMachine 🚀 @LumaLabsAI @gravicle thanks for adding magic to storytelling🚀Your innovation is transforming the way we create🎬 🔊One-Click Sound Generation 🎼Customizable Prompt 🔥Beta Access for All Users Hit the "Audio" button & create magic!🌟

0:36

1,068

Brent Lynch

Brent Lynch

@BrentLynch

24 Feb 2025

💥💥VIDEO2AUDIO LIVE FOR FREE IN LUMA💥💥 For those who have an addiction to adding MMAUDIO to everything they possibly can this works along the lines of that but in my testing over the weekend this delivers a generally noticably higher level of audio quality. Free while in Beta. So get those awesome crash landings going! @LumaLabsAI #dreammachine #Ray2

0:05

Luma

@LumaLabsAI

24 Feb 2025

Video to Audio is now here in #DreamMachine. To generate sound for your video generations, just select the new "Audio" button. Create with a single click or describe with prompts for more customized direction. Audio is available now in beta for free to all users.

0:52

702

quase

quase @quaseobjeto

9 Feb 2025

for anyone wondering, that was me testing a video2audio model in @ComfyUI (sdxl, kling, mmaudio) github.com/hkchengrex/MMAudi…

chesskaria

chesskaria @oran2451

13 Jan 2025

Replying to @MayorKingAI

Great 👍 ai is so advanced now; sora; kling and minimax; i hope if @Hailuo_AI add tools like mmaudio ( video2audio )

172

Ivan Skorokhodov

Ivan Skorokhodov

@isskoro

21 Dec 2024

Diffusion models are very strong and robust feature extractors, but recent works were only using them for recognition tasks. In our recent work (led by @MoayedHaji), we harness them for video2audio generation: they by far outperform conventional video feature extractors for audio/video temporal alignment and allow to achieve SotA results in sound quality as well: snap-research.github.io/AVLi…

0:21

MOAYED HAJi ALi @MoayedHaji

20 Dec 2024

1/6 Introducing AV-Link, an approach to connect video and audio diffusion models in a self-contained framework to enable video-to-audio and audio-to-video generation with superb audio-video synchronization. Project page: snap-research.github.io/AVLi…

1:03

1,376

Christian Marinoni

Christian Marinoni @ChrisMarinoni

20 Dec 2024

Say hello to a new semantically- and temporally-aligned #video2audio synthesis framework! 🎥🎶🎮

Riccardo Fosco Gramaccioni @riccardofosco

20 Dec 2024

🌟 Excited to Share Our Latest Work! 🎥🎶 Here we present Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls arxiv: arxiv.org/abs/2412.15023 web: ispamm.github.io/Stable-V2A

1:06

107

Eleonora Grassucci

Eleonora Grassucci @eleonoragrassuc

20 Dec 2024

Super interesting work on #GenAI #Video2Audio with impressive results from my friends @riccardofosco and @ChrisMarinoni with @EmilianPostola1 @marcomunita @cosmo_luca Joshua Reiss and @dhan90001! 👇 Go check it out!

Riccardo Fosco Gramaccioni @riccardofosco

20 Dec 2024

1:06

328

Dr. Senthazal Ravi

Dr. Senthazal Ravi

@senthazalravi

15 Dec 2024

Chinese Giant Network public the "QianYing" voice game generation large model. Including：video model YingGame and video2audio model YingSound Another trillion $$$ industry (Gaming) is Kaput.

青龍聖者

1:24

1,155

青龍聖者

青龍聖者

@bdsqlsz

14 Dec 2024

Chinese Giant Network public the "QianYing" voice game generation large model. Including：video model YingGame and video2audio model YingSound

1:24

146

23,211

Ran.627

Ran.627

@Nin19536

8 Dec 2024

挑战用四个 AI 镜头讲故事 ⬇️ AI 实验短片第一次接触现在抽卡倒是挺简单了，但用了好多工具。以上用到了 MJ 海螺剪映 11labs 等尝鲜测了 elevenlabs 的 video2audio 生成背景音效，生成的环境音还不错，不过需要剪映里补充人声、效果器，叠加四五层递进音轨。

0:19

1,576