From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models

1. MiSI-Bench, a new benchmark framework, evaluates how well Vision-Language Models (VLMs) understand and reason about the spatial relationships of microscopic entities such as molecules. This ability is crucial for scientific discovery in fields such as structural biology and drug design.

2. MiSI-Bench comprises over 163,000 question-answer pairs and 587,000 images derived from around 4,000 molecular structures. Its nine tasks range from basic spatial transformations to complex relational identifications, providing a comprehensive assessment of microscopic spatial intelligence.

3. Current state-of-the-art VLMs perform significantly below human level on this benchmark. A fine-tuned 7B model, however, shows substantial promise and even surpasses humans on some spatial transformation tasks, pointing to the untapped potential of VLMs for microscopic spatial reasoning.

4. The research highlights the need to integrate explicit domain knowledge into VLMs to improve their performance on scientifically grounded tasks such as hydrogen bond recognition. This suggests that combining domain expertise with VLMs is essential for progress toward scientific AGI.

5. The datasets are available at huggingface.co/datasets/zong…, giving researchers a resource to further explore and enhance the microscopic spatial intelligence of VLMs.

📜Paper: arxiv.org/abs/2512.10867v1

#MicroscopicSpatialIntelligence #VisionLanguageModels #Benchmarking #MolecularStructures #ScientificDiscovery