Filter
Exclude
Time range
-
Near
🔍 Deep Dive: Why CEMTM Redefines Multimodal Topic Modeling At EMNLP 2025 in Suzhou, I’ll be presenting CEMTM (Contextual Embedding-based Multimodal Topic Modeling) — a model that rethinks how we discover topics in multimodal documents by moving entirely into the contextual embedding space. Unlike classical or contextualized topic models such as CWTM, which rely on Dirichlet priors and discrete sampling, CEMTM operates with continuous variational inference, enabling both semantic precision and computational efficiency. Here’s what makes it stand out: 🚀 Key Contributions 1. Multimodal Topic Learning CEMTM unifies text, image, and structural data under a shared embedding space. Topics are no longer word distributions—they are semantic clusters of contextual embeddings that span across modalities. 2. Contextual Embedding Alignment Each token (word, visual patch, or table element) is attracted to its topic vector in the embedding space, replacing Dirichlet sparsity with differentiable optimization. This enforces semantic cohesion within topics. 3. Cross-Modal Coherence Regularization A novel coherence term maximizes cosine similarity among top tokens of each topic—even across modalities—so that text and visual components that convey the same concept naturally align. 4. Variational Efficiency Without Dirichlet sampling or vocabulary-wide softmax operations, CEMTM achieves up to 3× faster training and 5–10× faster inference, fully leveraging GPU-parallelizable vector operations. 5. State-of-the-Art Topic Quality On multiple multimodal datasets, CEMTM outperforms prior models like CWTM, MMNTM, and ZeroShot-LDA in both coherence and diversity, demonstrating that contextualized multimodal alignment leads to more interpretable and scalable topic discovery. 🧠 The Takeaway CEMTM shows that topic modeling can evolve beyond discrete words and priors. By clustering contextual embeddings directly and optimizing cross-modal coherence, it enables interpretable, efficient, and semantically rich topic discovery across heterogeneous documents. 📍 Presentation: Poster Session — Wednesday, Nov 5 · 16:30–18:00 · Hall C (EMNLP 2025, Suzhou) 📄 Paper: arxiv.org/abs/2509.11465 #EMNLP2025 #MultimodalAI #DeepResearch #TopicModeling #ChartUnderstanding #QuestionAnswering #LLMs #Research
1
5
824
✨ Excited to be presenting three papers at EMNLP 2025 in Suzhou this week! 🇨🇳 I'll be showcasing our recent work on multimodal reasoning, chart understanding, and few-shot data synthesis — exploring how language models can better connect vision, text, and structured information for deeper understanding. 📍 Poster Sessions — Hall C 🧩 CEMTM: Contextual Embedding-based Multimodal Topic Modeling 📅 Wednesday, Nov 5 · 16:30–18:00 > A framework for contextualized topic discovery across multimodal corpora by aligning visual and textual embeddings. 📊 ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement 📅 Wednesday, Nov 5 · 16:30–18:00 > We integrate human gaze supervision to improve LVLM interpretability and reasoning over charts. 🔍 FM²DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering (Findings) 📅 Friday, Nov 7 · 12:30–13:30 > A pipeline for synthesizing multimodal QA data via cross-model knowledge distillation and multihop reasoning. If you’re attending EMNLP, come by and chat! I’d love to connect and discuss multimodal deep research agents, attention interpretability, and data synthesis for reasoning tasks. #EMNLP2025 #MultimodalAI #DeepResearch #TopicModeling #ChartUnderstanding #QuestionAnswering #LLMs #Research
2
10
759
Thanks to my co-authors for the amazing collaboration! @tahmedge @Ahmed_Masry97 @mtnayeem @JotyShafiq @Enamul_Hoque See you at Vienna, Austria!! 🇱🇻 #LLMs #VisionLanguageModels #AI #ChartUnderstanding #OpenSource #MultimodalAI #LLMasJudge #NLP
1
3
235