We’re interested in AI systems that can collaborate in real time, rather than relying solely on artificial turn boundaries.
For audio, this feels natural: listen, speak, interrupt, update.
For video, we think an important version of this is visual proactivity — models that respond when something happens visually:
“Tell me when I start slouching.”
“Count my pushups.”
“Say stop when the person stops doing X.”
Tessa's quality of life has improved a lot with some nagging.