Topics alone don't reveal the full story. Topic network analysis shows how themes connect across documents. Learn how to build a BERTopic-based topic network and identify hub topics in NetMiner. 👉 netminer.medium.com/topic-ne… #TopicModeling #TextMining #NetworkAnalysis
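A rough illustration of the topic-network idea in the post above: link two topics whenever they appear in the same document, then treat the topic with the highest weighted degree as a hub. This is only a sketch under assumptions, not the NetMiner or BERTopic workflow; the document-topic assignments and the function names `build_topic_network` and `weighted_degree` are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of topic ids assigned to each document.
doc_topics = [
    {0, 1},
    {0, 2},
    {0, 1, 3},
    {2, 3},
    {0, 3},
]

def build_topic_network(doc_topics):
    """Count how often each pair of topics co-occurs in a document."""
    edges = Counter()
    for topics in doc_topics:
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the co-occurrence weights of all edges touching each topic."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

edges = build_topic_network(doc_topics)
degree = weighted_degree(edges)

# The topic with the largest weighted degree is the hub candidate.
hub_topic, hub_score = degree.most_common(1)[0]
print(hub_topic, hub_score)
```

Weighted degree is the simplest hub measure; other centrality measures (betweenness, eigenvector) would rank topics differently.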
Deep Dive: Why CEMTM Redefines Multimodal Topic Modeling
At EMNLP 2025 in Suzhou, I'll be presenting CEMTM (Contextual Embedding-based Multimodal Topic Modeling), a model that rethinks how we discover topics in multimodal documents by moving entirely into the contextual embedding space. Unlike classical or contextualized topic models such as CWTM, which rely on Dirichlet priors and discrete sampling, CEMTM operates with continuous variational inference, enabling both semantic precision and computational efficiency. Here's what makes it stand out:
🚀 Key Contributions
1. Multimodal Topic Learning: CEMTM unifies text, image, and structural data under a shared embedding space. Topics are no longer word distributions; they are semantic clusters of contextual embeddings that span modalities.
2. Contextual Embedding Alignment: Each token (word, visual patch, or table element) is attracted to its topic vector in the embedding space, replacing Dirichlet sparsity with differentiable optimization. This enforces semantic cohesion within topics.
3. Cross-Modal Coherence Regularization: A novel coherence term maximizes cosine similarity among the top tokens of each topic, even across modalities, so that text and visual components that convey the same concept naturally align.
4. Variational Efficiency: Without Dirichlet sampling or vocabulary-wide softmax operations, CEMTM achieves up to 3× faster training and 5–10× faster inference, fully leveraging GPU-parallelizable vector operations.
5. State-of-the-Art Topic Quality: On multiple multimodal datasets, CEMTM outperforms prior models like CWTM, MMNTM, and ZeroShot-LDA in both coherence and diversity, demonstrating that contextualized multimodal alignment leads to more interpretable and scalable topic discovery.
🧠 The Takeaway: CEMTM shows that topic modeling can evolve beyond discrete words and priors. By clustering contextual embeddings directly and optimizing cross-modal coherence, it enables interpretable, efficient, and semantically rich topic discovery across heterogeneous documents.
Presentation: Poster Session, Wednesday, Nov 5 · 16:30–18:00 · Hall C (EMNLP 2025, Suzhou)
📄 Paper: arxiv.org/abs/2509.11465
#EMNLP2025 #MultimodalAI #DeepResearch #TopicModeling #ChartUnderstanding #QuestionAnswering #LLMs #Research
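The coherence idea in point 3 of the post above can be illustrated with a toy calculation: score a topic by the mean pairwise cosine similarity of its top-token embeddings. This is not the CEMTM implementation, and it only computes the score rather than optimizing it; the 3-dimensional vectors and the function names `cosine` and `topic_coherence` are made up for the example.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def topic_coherence(embeddings):
    """Mean cosine similarity over all pairs of top-token embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical top-token embeddings (text or image tokens alike).
tight_topic = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]
loose_topic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(topic_coherence(tight_topic))  # close to 1: tokens point the same way
print(topic_coherence(loose_topic))  # 0.0: tokens are mutually orthogonal
```

A regularizer built on such a score would add its negative to the training loss, so that gradient descent pulls a topic's top tokens closer together.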
✨ Excited to be presenting three papers at EMNLP 2025 in Suzhou this week! 🇨🇳 I'll be showcasing our recent work on multimodal reasoning, chart understanding, and few-shot data synthesis, exploring how language models can better connect vision, text, and structured information for deeper understanding.
Poster Sessions, Hall C
🧩 CEMTM: Contextual Embedding-based Multimodal Topic Modeling
📅 Wednesday, Nov 5 · 16:30–18:00
> A framework for contextualized topic discovery across multimodal corpora by aligning visual and textual embeddings.
📊 ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
📅 Wednesday, Nov 5 · 16:30–18:00
> We integrate human gaze supervision to improve LVLM interpretability and reasoning over charts.
FM²DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering (Findings)
📅 Friday, Nov 7 · 12:30–13:30
> A pipeline for synthesizing multimodal QA data via cross-model knowledge distillation and multihop reasoning.
If you're attending EMNLP, come by and chat! I'd love to connect and discuss multimodal deep research agents, attention interpretability, and data synthesis for reasoning tasks.
#EMNLP2025 #MultimodalAI #DeepResearch #TopicModeling #ChartUnderstanding #QuestionAnswering #LLMs #Research
Have you ever tried to make sense of thousands of documents (long documents with multiple figures in them) and wished for a way to automatically uncover their main themes? 📚🖼️ That's where topic modeling comes in. We're excited to introduce CEMTM, a new state-of-the-art multimodal topic model and the first to handle long multimodal documents at scale! 🚀
CEMTM is a framework for interpretable multimodal topic modeling. Unlike prior models, it:
🔹 Leverages fine-tuned large vision–language models (LVLMs) to unify text and image information into contextual embeddings.
🔹 Introduces a distributional importance network that learns which words and image regions truly matter for topic inference.
🔹 Aligns topics with document-level semantics through a reconstruction objective, ensuring coherence across modalities.
🔹 Produces explicit word–topic and document–topic distributions, preserving interpretability while scaling to long, multimodal documents.
Across six benchmark datasets, CEMTM sets new state-of-the-art results in topic quality and diversity, while also proving useful for downstream tasks like few-shot retrieval and multimodal QA. In short, it shows how multimodal grounding and structured topic modeling can enable better corpus exploration, retrieval, and reasoning.
📄 Paper: arxiv.org/abs/2509.11465
💻 Code: github.com/AmirAbaskohi/CEMT…
A huge thanks to my supervisors, @careninigiusepp and @JotyShafiq from @SFResearch, and to all of my collaborators: @liraymond96 and @ChuyuanLi. Looking forward to sharing this work at EMNLP 2025 in China 🇨🇳. Hope to see you there!
#EMNLP2025 #NLP #MultimodalAI #TopicModeling #AIResearch #NLP #LLM #LargeLanguageModels

Ever tried answering a complex question that requires digging through multiple research papers, combining text, tables, and even figures to find the answer? That's the essence of multimodal multihop question answering (MMQA), and it's critical for real-world tasks like interpreting medical records, analyzing educational documents, or conducting deep research across long multimodal content.
Today, most people turn to large APIs for this, but that's costly and often impractical. A promising alternative is to build smaller expert models fine-tuned on the right data, models that can perform complex multimodal reasoning without requiring massive compute.
Excited to share that our paper FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering has been accepted to EMNLP 2025 Findings! 🎉
FM2DS introduces the first scalable framework for synthesizing high-quality MMQA datasets from long multimodal documents. Our five-stage pipeline automatically generates and validates realistic QA pairs, enabling smaller models to match, and even surpass, those trained on expensive human-labeled data. We also release M2QA-Bench, the first benchmark for MMQA on long documents, to push research forward in this space.
📄 Paper: arxiv.org/abs/2412.07030
💻 Code: github.com/ServiceNow/FM2DS
📊 Benchmark: huggingface.co/datasets/Amir…
Big thanks to @ServiceNowRSRCH and my mentors @gspandana, @careninigiusepp, and @ILaradji for this collaboration. Looking forward to seeing everyone in China 🇨🇳 for EMNLP 2025!
#EMNLP2025 #NLP #MultimodalAI #TopicModeling #AIResearch #NLP #LLM #LargeLanguageModels

@emnlpmeeting / #EMNLP2025 Accepted Paper: CEMTM: Contextual Embedding-based Multimodal Topic Modeling
Paper: bit.ly/3JsZFFy
This work introduces CEMTM, a context-enhanced multimodal topic model that leverages fine-tuned large vision-language models to infer coherent topic structures from documents containing both text and images. The approach uses distributional attention mechanisms to weight token-level contributions and aligns topic representations through reconstruction objectives.
Key contributions:
➡️ Holistic multimodal document encoding using pretrained LVLM embeddings without separate modality encoders
➡️ Distributional attention mechanism for learning token importance and improving semantic alignment
➡️ Reconstruction-based training objective that preserves cross-modal semantics in topic structures
➡️ Strong performance across six benchmarks with average LLM coherence score of 2.61
Results demonstrate significant improvements over unimodal and multimodal baselines, with effectiveness shown in downstream few-shot retrieval tasks and ability to capture visually grounded semantics.
👥 Authors: Amirhossein Abaskohi @AmirAbaskohi, Raymond Li, Chuyuan Li @ChuyuanLi, Shafiq Joty @JotyShafiq, and Giuseppe Carenini @careninigiusepp
#FutureOfAI #EnterpriseAI #NLP #MachineLearning #MultimodalAI #TopicModeling
New Research Published: Impact of COVID-19 on Primary Health Care Research Trends and Suggestions for Better Services Approaches Via… blockchainhealthcaretoday.co… #blockchain #blockchaininhealthcare #Covidresearch #blockchaintech #COVID19, #coronavirus, #primarycare #primaryhealthcare, #topicmodeling
How do #bioethics and #PhilosophyOfMedicine relate? Enter Vilius Dranseika with cool #webScraping, #topicModeling, and #dataViz! #PhilMed was more than a branch of #PhilSci, involving #epistemology, #metaethics, and more. It wasn't clear whether Bioethics is part of PhilMed.
When was the last time you saw someone teach #DigitalHumanities with a chalkboard? 🧑‍🏫 @cnDuKeli of DH Trier explains the machinery behind #TopicModeling during the 2nd day of the #DHSpringSchool
25 Feb 2025
Machine Learning Interview Question 34: What is topic modeling? Discuss its working, applications, and the pros and cons
Answer Link: aiml.com/what-is-topic-model…
Topic modeling has emerged as a highly useful technique in Natural Language Processing (NLP) for deriving meaningful insights from unstructured textual data. Examples of such data include articles, blog posts, customer reviews, emails, and social media posts.
👉 Learn how Topic Modeling works, where it's used, and its advantages and challenges in this article. The article is organized into the following topics:
◾ About Topic Modeling
◾ Algorithms used for Topic Modeling
◾ How Topic Modeling works
◾ Real-world applications of Topic Modeling
◾ Advantages and disadvantages of using Topic Modeling
--
🚀 If you're preparing for Machine Learning interviews, go to AIML.com for top resources and insights
🔗 Link to Top 100 ML Interview Questions: aiml.com/top-100-machine-lea…
AIML.com is the world's largest repository of Machine Learning interview questions and quizzes. (All FREE)
#aiml_com #machinelearning #topicmodeling #machinelearninginterview
6 Jan 2025
I am looking for ways to identify emerging or implicit topics in feedback data, particularly sensitive issues, e.g. bullying or harassment. I am interested in methods that don't rely on predefined labels or keywords. Has anyone tackled something similar? #NLP #AI #TopicModeling
Handling large volumes of user feedback can feel overwhelming. Surveys, reviews, support tickets, social media comments: where do you even start? 🤔 I wrote an article to help teams navigate this challenge with a structured approach. 🧵 #TopicModeling #NMF #MachineLearning
Happy to announce that our #Rstats package ✨topiclabels✨ has been updated on #CRAN 🎉
🤖 Using open #LLMs, our package automatically assigns a topic label to a bag of words.
It works with all popular #TopicModeling packages!
Find out more: 👉 github.com/PetersFritz/topic…
7 Oct 2024
Machine Learning Interview Question 34: What is topic modeling? Discuss its working, applications, and the pros and cons
Answer Link: aiml.com/what-is-topic-model…
Topic modeling has emerged as a highly useful technique in Natural Language Processing (NLP) for deriving meaningful insights from unstructured textual data. Examples of such data include articles, blog posts, customer reviews, emails, and social media posts.
👉 Learn how Topic Modeling works, where it's used, and its advantages and challenges in this article. The article is organized into the following topics:
◾ About Topic Modeling
◾ Algorithms used for Topic Modeling
◾ How Topic Modeling works
◾ Real-world applications of Topic Modeling
◾ Advantages and disadvantages of using Topic Modeling
--
🚀 If you're preparing for Machine Learning interviews, head to AIML.com for top resources and insights
🔗 Link to Top 100 ML Interview Questions: aiml.com/top-100-machine-lea…
AIML.com is the world's largest repository of Machine Learning interview questions and quizzes. (All FREE)
#aiml_com #machinelearning #topicmodeling #machinelearninginterview
#LSPPDay49 Continued my exploration of Topic Modeling today. I delved into its types: LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), and learned about their differences. #NLP #TopicModeling #60DaysOfLearning2024 #LearningWithLeapfrog @lftechnology
We are at @ICSSIConference! Ross Potter and Ann Beynon from @Clarivate spoke on using #TopicModeling to investigate the impact of academic research, while Anand Desai spoke on the impact of AI in #R&D evaluation with Frances Carter-Johnson from @NSF. icssi.org
Attending 2 conferences this week: presenting a poster at #DHNB2024 in person in Iceland (where a volcano just erupted) and a talk at the #xPhi2024 virtually, both on using #LLMs in #DH humanities, zero-shot text classification & Lenin detection & why we can forget topicmodeling