As an ex-Viv (w/ Siri team) eng, let me help ease everyone's future trauma as well with the Fundamentals of Assisted Intelligence.
Make no mistake, OpenAI is building a new kind of computer, beyond just an LLM as middleware / frontend. Key parts they'll need to pull it off:
Persistent User Preferences:
- The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
- This is the "wow" moment where computers stop being scary and start feeling truly helpful.
- We did this in 2016 on Viv (youtu.be/Rblb3sptgpQ) when our AI knew what you liked for each and every service you used via Viv and mixed that in with context like what kind of flowers you told us your mom liked.
- This will need to include access to your personal information to infer preference as well.
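To make the preference idea concrete, here's a minimal sketch in Python. Everything in it (`PreferenceStore`, `remember`, `note`, `resolve`, the sample keys) is hypothetical and made up for illustration; it is not how Viv or OpenAI store preferences. It only shows the basic move: per-service preferences and personal context get mixed into a request, with whatever the user said explicitly winning.

```python
from dataclasses import dataclass, field


@dataclass
class PreferenceStore:
    """Hypothetical per-user store, for illustration only."""
    by_service: dict[str, dict[str, str]] = field(default_factory=dict)
    context: dict[str, str] = field(default_factory=dict)

    def remember(self, service: str, key: str, value: str) -> None:
        # A preference learned for one specific service.
        self.by_service.setdefault(service, {})[key] = value

    def note(self, fact: str, value: str) -> None:
        # Personal context the user shared at some point.
        self.context[fact] = value

    def resolve(self, service: str, request: dict[str, str]) -> dict[str, str]:
        # Precedence: explicit request > service preference > general context.
        merged = dict(self.context)
        merged.update(self.by_service.get(service, {}))
        merged.update(request)
        return merged


prefs = PreferenceStore()
prefs.remember("flowers", "delivery", "same-day")
prefs.note("mom_favorite_flower", "peonies")
order = prefs.resolve("flowers", {"recipient": "mom"})
print(order)
```

A real system would also have to infer these values from personal data rather than being told them, which is the hard part this sketch skips.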
External, Real-time Data:
- Maybe 50% of the utility of an LLM comes from the base training and RLHF fine-tuning; the rest comes from extending its available data with external sources.
- Zapier, Airbyte and others will help, but expect deep integration with 3rd party apps / data pipelines.
- "Chat w/ PDF" is a tiny, tiny part of this. If you're only building that, think much bigger.
Actual Computing on Virtual Machines:
- Context windows are limiting, so AI providers will continue benefiting from running tasks directly on a Python or Node/Deno virtual env so the model can consume huge amounts of data, just like a computer can today.
- Today these are short-lived envs used by Data Analyst / Julius, but over time they'll become a new type of Dropbox where your data is persisted long term for additional processing or cross-file inference / insights.
Agent Task / Flow Planning:
- Planning can't function without intent. Understanding intent has always been a holy grail, and LLMs finally helped us unlock what we spent years approximating at Viv with NLP tricks.
- Once intent is accurate, planning can start. Creating an agent planner is incredibly nuanced and will take significant integration with user preferences, 3rd party data sets, knowledge of compute capabilities, etc.
- The bulk of the real magic of Viv was the dynamic planner / mixer that would pull all these data and APIs together and generate both a workflow AND dynamic UI on top of them for a normal consumer to execute.
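As a rough illustration of "once intent is accurate, planning can start", here's a toy planner in Python. To be clear, this is plain forward chaining that I'm swapping in for illustration, not Viv's dynamic planner, and the `Tool` type and the tool names are all hypothetical. Each tool declares what it needs and what it produces, and the planner keeps running whatever is runnable until the goal shows up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    needs: frozenset[str]  # facts required before the tool can run
    produces: str          # fact made available after it runs


def plan(goal: str, known: set[str], tools: list[Tool]) -> list[str]:
    """Forward-chain from known facts until `goal` can be produced."""
    have = set(known)
    steps: list[str] = []
    progress = True
    while goal not in have and progress:
        progress = False
        for tool in tools:
            if tool.produces not in have and tool.needs <= have:
                have.add(tool.produces)
                steps.append(tool.name)
                progress = True
                if goal in have:
                    break
    if goal not in have:
        raise ValueError(f"no plan reaches {goal!r}")
    return steps


tools = [
    Tool("book_delivery", frozenset({"bouquet", "address"}), "order"),
    Tool("lookup_address", frozenset({"recipient"}), "address"),
    Tool("pick_bouquet", frozenset({"recipient", "flower_preference"}), "bouquet"),
]
print(plan("order", {"recipient", "flower_preference"}, tools))
# ['lookup_address', 'pick_bouquet', 'book_delivery']
```

Note what's missing: this runs any tool that happens to be runnable (even ones the goal doesn't need), and it knows nothing about user preferences, cost, or UI. That gap is exactly the nuance described above.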
An App Store of Experts:
- Apple initially made the mistake of building a closed app store; then realized they could monetize a cornucopia of creativity if they opened it.
- Regardless of OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll rescope it and enable a long tail of specialized assistants.
- Builders will be able to compose multiple tools together into workflows that can specialize.
- And AIs over time will be able to auto-compose these tools together as well, learning from the builders that came before them.
Persistent, Contextual Memory:
- Embeddings are helpful, but they are missing fundamental parts like context switching, conversational centroids, summarization, enrichment, etc.
- Most of the cost of LLMs today comes from prompts, but as history and persistence are embedded and the inference cached, this will unlock the ability to have long term memory with pointers to critical subjects, topics, feelings, tone, etc.
- Core memory is just the beginning. We still need all the rich information our minds conjure when we think about a past sunset, a breakup, a scientific understanding, or sensitive context for people we interact with.
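Here's a small Python sketch of what "memory with pointers to topics and tone" could look like at its most basic. The `Memory` and `MemoryIndex` names and the sample entries are hypothetical, and there are no embeddings here at all; it's just an index from topic to past entries so that recall is a lookup instead of re-sending the whole history in a prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    topics: frozenset[str]
    tone: str
    turn: int  # position in the conversation history


@dataclass
class MemoryIndex:
    """Hypothetical topic-pointer index, for illustration only."""
    entries: list[Memory] = field(default_factory=list)
    by_topic: dict[str, list[int]] = field(default_factory=dict)

    def add(self, text: str, topics: set[str], tone: str) -> None:
        idx = len(self.entries)
        self.entries.append(Memory(text, frozenset(topics), tone, turn=idx))
        for topic in topics:
            self.by_topic.setdefault(topic, []).append(idx)

    def recall(self, topic: str, limit: int = 3) -> list[Memory]:
        # Return up to `limit` entries for the topic, most recent first.
        if limit <= 0:
            return []
        ids = self.by_topic.get(topic, [])[-limit:]
        return [self.entries[i] for i in reversed(ids)]


mem = MemoryIndex()
mem.add("Mom loves peonies", {"mom", "flowers"}, tone="warm")
mem.add("Trip to Hawaii planned", {"travel"}, tone="excited")
mem.add("Mom's birthday is coming up", {"mom"}, tone="neutral")
print([m.text for m in mem.recall("mom")])
# ["Mom's birthday is coming up", 'Mom loves peonies']
```

Context switching, summarization, and enrichment would all sit on top of something like this; none of them are attempted here.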
Long Polling Tasks:
- "Agent" is a loaded word, but part of the intent is to have tasks that can be scheduled and self-completing regardless of the time horizon required.
- E.g. "Let me know when flights from Montréal to Hawaii are less than $500"
- This will require coordination of compute across API providers, as well as virtual envs in the cloud.
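The flight example boils down to a poll-until-condition loop. Here's a minimal Python sketch, with the caveat that `watch_price`, its parameters, and the fake price feed are all hypothetical; a real version would call an actual fares API and would persist the task somewhere durable instead of holding a process open.

```python
import time
from typing import Callable, Optional


def watch_price(
    fetch_price: Callable[[], float],
    threshold: float,
    interval_s: float = 3600.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the price drops below `threshold`.

    Returns the matching price, or None if `max_checks` runs out first.
    Always performs at least one check.
    """
    checks = 0
    while True:
        price = fetch_price()
        checks += 1
        if price < threshold:
            return price
        if max_checks is not None and checks >= max_checks:
            return None
        sleep(interval_s)


# Stand-in for a real fares API: three successive made-up quotes.
quotes = iter([640.0, 580.0, 495.0])
found = watch_price(lambda: next(quotes), threshold=500.0, sleep=lambda s: None)
print(found)  # 495.0
```

The `sleep` parameter is injected so the loop can be exercised without actually waiting an hour between checks.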
Dynamic UI:
- Chat is not the final, end-all interface. There's a reason apps have affordances like buttons, date pickers, images. They simplify and clarify.
- AI will be a copilot, but to be a copilot it'll need to adjust to what works best for a given user. The future is personalized as optimizations require it, so UI will be dynamic.
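One simple way to picture dynamic UI: the assistant knows which pieces of information it still needs, and picks an affordance for each instead of asking in free text. A minimal Python sketch follows; the widget names, slot types, and the four-choice cutoff are all assumptions made up for this example.

```python
from typing import Optional

# Hypothetical mapping from slot type to widget name.
WIDGETS = {"date": "date_picker", "bool": "toggle", "image": "image_picker"}


def pick_widget(slot_type: str, choices: Optional[list[str]] = None) -> str:
    # A short list of options reads best as buttons, a long one as a dropdown.
    if choices:
        return "buttons" if len(choices) <= 4 else "dropdown"
    # Fall back to free text when nothing more specific fits.
    return WIDGETS.get(slot_type, "text_input")


def render_form(missing_slots: dict[str, dict]) -> list[dict]:
    """Turn the slots the assistant still needs into a list of widgets."""
    return [
        {"slot": name, "widget": pick_widget(spec["type"], spec.get("choices"))}
        for name, spec in missing_slots.items()
    ]


form = render_form({
    "depart": {"type": "date"},
    "cabin": {"type": "enum", "choices": ["economy", "business"]},
    "notes": {"type": "text"},
})
print(form)
```

Personalization would mean this mapping differs per user, which the sketch does not attempt.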
API & Tool Composition:
- Expect AIs to generate custom "apps" in the future where we can build our own workflows and compose together APIs, without waiting for a big startup to do so.
- Fewer apps and startups will be needed to generate frontends, and AI will be better at composing an array of tools and APIs together coupled with a gas fee / tax.
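A composed "app" can be as small as a list of steps run in order, with a fee tallied per call. This Python sketch is purely illustrative: `Step`, `run_workflow`, the integer "credits" fee, and the fake flight search are all hypothetical stand-ins, not any real provider's API or pricing.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]
    fee: int = 0  # hypothetical per-call charge, in made-up credits


def run_workflow(steps: list[Step], payload: Any) -> tuple[Any, int]:
    """Pipe `payload` through each step in order and total the fees."""
    total_fee = 0
    for step in steps:
        payload = step.run(payload)
        total_fee += step.fee
    return payload, total_fee


steps = [
    # Stand-in for a third-party search API returning made-up offers.
    Step("search", lambda q: [{"route": q, "price": 640}, {"route": q, "price": 495}], fee=10),
    Step("cheapest", lambda offers: min(offers, key=lambda o: o["price"])),
    Step("format", lambda o: f"{o['route']}: ${o['price']}", fee=2),
]
result, fee = run_workflow(steps, "YUL-HNL")
print(result, fee)  # YUL-HNL: $495 12
```

The interesting future version is the one where the AI assembles the `steps` list itself instead of a builder writing it by hand.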
Assistant-to-Assistant Interaction:
- There will be countless assistants in the future, with each assisting humans and other assistants towards some greater intent.
- Alongside this, assistants will need to learn to interface across text, APIs, file systems, and other modalities used both by agents / startups and humans as integration flows deeper into our world.
Plugin / Tool Stores:
- Specialized assistants can only be made possible by composing tools, APIs, prompts, data, preferences, and much more.
- The current plugin store is super early days, so expect much more work to come, and expect many of those plugins to be rolled in-house as they become more mission critical.
And this is just a 10-minute brain dump; much, much more is needed behind the scenes including internet search and scraping, community (for intent, building, RLHF, etc.), dynamic API generators and connectors, gas fees, tool builders, ingestion via glasses / earbuds / etc. If you think it's too late to be in AI, just know the above is about 25% of what it'll actually take, with much more to come as we iterate and get even more creative.
We're in the early days of building parts of this at @FastlaneAI but with a different understanding: OpenAI will never be the best at everything. So we want to let you use the best AIs in the world, regardless of who builds them (that could be you!). Come join the fun!