This is such an amazing use case of Molmo2 and MolmoWeb! 💕
It shows once again that vision is crucial to a wide range of applications, and it’s rewarding to see the community build on top of these models’ strong perceptual capabilities for such real-world applications:
“Works chose MolmoWeb because it's purpose-built for visual pointing on web pages…PointCheck then sends the same region to Molmo with a direct query about what's on the screen…The result is that PointCheck can confirm focus indicators are visually present on screen–not just defined somewhere in a stylesheet.”
I also remember @rockpang6 mentioning failure cases of MolmoWeb on some “difficult” websites. Upon a closer look, it turned out that some of these websites are not just inaccessible to MolmoWeb bc they’re OOD; they actually appear inaccessible to us human users too (e.g., having to click on a very tiny part of a UI element). So it’s a very cool full-circle moment to see Works actually build out this accessibility checker with MolmoWeb! 🫡
Brendan Works is a product manager focused on paratransit services in Seattle. See how he built PointCheck, a website accessibility checker powered by our open Molmo, MolmoWeb, & Olmo 3 models. 👇