I think most people underestimate frontier vision-capable LLMs. They will change how we extract and retrieve information across multiple modalities, and the impact will be felt across the industry. Simon is great at highlighting these capabilities (and limitations).
For example, when you upload a PDF to Gemini AI Studio, it extracts screenshots of all the pages and stuffs them into the prompt. Converting everything to screenshots is a clever move: it mirrors how we take in information through our eyes.
Scraping data by taking a screenshot of a dashboard and then running it through a vision model is both slightly absurd and potentially quite a robust way of accessing data that's otherwise impossible to extract!
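To make the screenshot-to-prompt idea concrete, here is a rough sketch of the plumbing around the model call. Everything in it is an assumption for illustration: `build_vision_prompt` and `parse_model_reply` are hypothetical helpers, and the list-of-parts shape is a generic stand-in, not the request schema of Gemini or any other provider. Rendering the pages or grabbing the dashboard screenshot, and actually sending the request, are left out.

```python
import base64
import json


def build_vision_prompt(page_images, question):
    """Pack screenshots (PNG bytes) plus a text question into one list of parts.

    The part layout is hypothetical; adapt it to whatever your model API expects.
    """
    parts = []
    for number, png_bytes in enumerate(page_images, start=1):
        # Base64-encode the raw image bytes so they can travel inside JSON.
        encoded = base64.b64encode(png_bytes).decode("ascii")
        parts.append({
            "type": "image",
            "page": number,
            "data_uri": f"data:image/png;base64,{encoded}",
        })
    # The instruction goes last, after all the page images.
    parts.append({"type": "text", "text": question})
    return parts


def parse_model_reply(reply_text):
    """Parse the model's reply as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(reply_text)
    except json.JSONDecodeError:
        return None
```

Asking for JSON and refusing anything that does not parse is one cheap guard for the "slightly absurd" part: a vision model can misread a chart, so it helps to treat the reply as untrusted input and validate it before using the numbers.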