generated multi-modal embeddings of my bookmarks using @GoogleDeepMind’s new Gemini Embedding 2, allowing low-dim spatialisation using UMAP and high-dim similarity querying via different media types.
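a minimal sketch of the similarity-querying half, assuming the embeddings are already computed and sitting in memory. the vectors below are tiny made-up stand-ins (real embeddings have far more dimensions), and `cosine` / `top_k` are hypothetical helper names, not part of any Gemini API. because every media type lands in the same vector space, one query vector can rank all bookmarks regardless of where it came from.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, index, k=3):
    """Return the k (key, score) pairs most similar to the query vector."""
    scored = [(cosine(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]

# Toy 3-dimensional vectors standing in for real embeddings (assumption).
index = {
    "bookmark:photo-of-cat": [0.9, 0.1, 0.0],
    "bookmark:cat-article": [0.8, 0.2, 0.1],
    "bookmark:tax-form.pdf": [0.0, 0.1, 0.9],
}

# The query could be an embedded image, text snippet, or audio clip.
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)
```

brute-force scoring like this is fine for a personal bookmark collection; a much larger index would probably want an approximate nearest-neighbour library instead.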
A nano-banana canvas experiment. It generates from a source image plus whichever prompts overlap, then caches the results. Like adding semantic layers to an image.
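one way the caching could work, as a rough sketch rather than what the experiment actually does: key each result on the source image bytes plus the set of prompts that overlap. `cache_key`, `GenerationCache`, and the `generate` callback are all hypothetical names, and treating prompt order as irrelevant is an assumption.

```python
import hashlib

def cache_key(image_bytes, prompts):
    """Stable key for a (source image, overlapping prompts) combination."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(image_bytes).digest())
    # Assumption: the same prompts in a different order mean the same result.
    for prompt in sorted(set(prompts)):
        h.update(b"\x00")  # separator so "ab" + "c" differs from "a" + "bc"
        h.update(prompt.encode("utf-8"))
    return h.hexdigest()

class GenerationCache:
    """Memoises an expensive generate(image_bytes, prompts) call."""

    def __init__(self, generate):
        self._generate = generate  # hypothetical model call supplied by the caller
        self._store = {}
        self.misses = 0

    def get(self, image_bytes, prompts):
        key = cache_key(image_bytes, prompts)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(image_bytes, sorted(set(prompts)))
        return self._store[key]

# Stand-in generator; a real one would call an image model.
def fake_generate(image_bytes, prompts):
    return f"{len(image_bytes)} bytes + {' / '.join(prompts)}"

cache = GenerationCache(fake_generate)
first = cache.get(b"source-image", ["neon", "rain"])
second = cache.get(b"source-image", ["rain", "neon"])  # same overlap, served from cache
```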
spoke with @tylerangert about his ideas here a couple of months back — can’t wait to see them come to life! very excited to play with true creative tools for software development.
love this work so much! feels like the next step on from apple’s fantastic widget & complication work — data candy, sticker apps, chewable compute. stick them to your carplay dashboard, leave them scattered around your visionOS space like marbles.
hey! we've built a new AI playground over at @cloudflare to demonstrate what can be achieved by chaining multi-modal models together. think — audio → text → image → text, or composing multiple LLMs together. here's a quick demo video!
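to make the chaining idea concrete, here's a toy sketch where each stage is just a function and the output of one feeds the next. the three stages are hypothetical stubs, not real models and not the playground's API; `chain` is an illustrative helper name.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]

def chain(*stages: Stage) -> Stage:
    """Compose stages left to right: the output of one is the input of the next."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run

# Hypothetical stand-ins for real models (assumptions for illustration only).
def transcribe(audio: bytes) -> str:
    # audio -> text: pretend the bytes are already the spoken words
    return audio.decode("utf-8")

def draw(text: str) -> dict:
    # text -> image: return a placeholder describing the image
    return {"kind": "image", "prompt": text}

def caption(image: dict) -> str:
    # image -> text: describe the placeholder image
    return f"an image of {image['prompt']}"

pipeline = chain(transcribe, draw, caption)
result = pipeline(b"a red bicycle")
```

keeping every stage a plain callable means swapping one model for another, or inserting an extra LLM step, is just a change to the argument list.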
looks like visionOS 2 has enterprise APIs for specifying object tracking detection rate and for increasing performance headroom, along with a new API for requesting hand anchor data at a given timestamp (developer.apple.com/document…). hopefully this leads to >30 Hz hand tracking data!