▪️ AI x Design @SCTYinc 💙 Building @givecareapp for caregivers

Joined September 2008
Current local LLMs:
1- DeepSeek-V4-Flash IQ2XXS imatrix | GGUF @antirez
2- Gemma-4-26b-a4b-it-4bit | MLX mlx-community
3- Qwen3.6-35B-A3B-UD-MLX-4bit | MLX @UnslothAI
Wiki LLM x Nokbox
“Rails-before-trains is your deliberate stance, but a rail with no train and no timetable is anticipation, and anticipation is the polite name for overengineering.” Touché
yup👇🏼
Jun 10
we did something similar on cloudflare. we have these internal apps that use cf primitives like workers, sqlite, and r2, and they're all fronted by cloudflare access, which requires SSO. 100% vibed by opencode
Added a voice co-pilot to ExcaliDash self-hosted whiteboard. Speak → gpt-realtime-2 → it calls drawing tools → the canvas updates live, persists, and syncs to everyone on the board. Tools defined once via WebMCP drive both voice and any MCP client.
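A rough sketch of the "tools defined once" idea: one registry of drawing tools that can be exported as function-call specs for a voice model and also dispatched directly by another client. This does not use the real WebMCP, gpt-realtime, or ExcaliDash APIs; `Tool`, `Canvas`, `build_tools`, `as_function_specs`, and `dispatch` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    parameters: Dict[str, Any]      # JSON-Schema-style argument description
    handler: Callable[..., Any]     # function that actually mutates the canvas


class Canvas:
    """Stand-in for shared whiteboard state (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self.shapes: List[Dict[str, Any]] = []

    def add_rect(self, x: int, y: int, w: int, h: int) -> Dict[str, Any]:
        shape = {"type": "rect", "x": x, "y": y, "w": w, "h": h}
        self.shapes.append(shape)
        return shape


def build_tools(canvas: Canvas) -> Dict[str, Tool]:
    """Define each tool exactly once, bound to the canvas it edits."""
    int_prop = {"type": "integer"}
    tools = [
        Tool(
            name="draw_rect",
            description="Draw a rectangle on the canvas",
            parameters={
                "type": "object",
                "properties": {"x": int_prop, "y": int_prop, "w": int_prop, "h": int_prop},
                "required": ["x", "y", "w", "h"],
            },
            handler=canvas.add_rect,
        )
    ]
    return {tool.name: tool for tool in tools}


def as_function_specs(tools: Dict[str, Tool]) -> List[Dict[str, Any]]:
    """Export the registry as generic function-call specs (assumed shape)."""
    return [
        {"name": t.name, "description": t.description, "parameters": t.parameters}
        for t in tools.values()
    ]


def dispatch(tools: Dict[str, Tool], name: str, arguments: Dict[str, Any]) -> Any:
    """Run a tool call, whichever front end (voice or other client) produced it."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name].handler(**arguments)
```

Both front ends would read the same registry: the voice side hands `as_function_specs(...)` to the model, and every resulting tool call, from any client, goes through `dispatch(...)`.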
Using this copilot in a workshop with a client in a bit.
STT-TTS: gpt-realtime-2
TTC (text to canvas): gpt-mini-5.4
New iOS app for when you want to read your agents' output. I use it primarily for my (@karpathy) Wiki LLMs on two different machines via Tailscale.
May 28
Helm — read the Markdown and HTML on all your machines from your phone, beautifully rendered, over Tailscale via SSH. No syncing. No copies. No server. Your notes stay where they live; Helm just reads them. github.com/SCTY-Inc/helm-ios
Congrats @pvncher!
23 Mar 2025
Happy Lifetime access @RepoPrompt user. My favorite use case is bundling and working across repos for a single project: Next.js 14, React, TypeScript, Python, FastAPI, Tailwind CSS, shadcn/ui, App Router (SSR), Dynamic Imports, Supabase, Stripe, Twilio (SMS, Verify), Azure OpenAI (GPT-4o), mem0, Qdrant (Vector Storage), Verdict (Medical Guardrails), Helicone (API Monitoring), Azure Static Web Apps, GitHub Actions (CI/CD), Cloudflare (DNS, CDN)
⏲️ Codex
“The loop is the product.” Yes!
OpenMed Agent Claude Opus 4.7 just ran a 14-step special-pathogen ED workup on a synthetic VHF case. Live CDC WHO PubMed retrieval. Evidence-weighted differential. Clinician signature required before any artifacts finalized. The loop is the product.
Glad my detour into AI Safety (for caregivers) was a focus of the morning panel in DC today.
Today @amadad joined the @DoleFoundation National Convening to discuss the future of caregiving, AI, and innovation. His message: AI for military and veteran caregivers must protect trust, recognize risk, and make human support easier to reach. #NationalConvening #CaregivingAI
Loving the updates in Codex App with connections (for VPS). Not connected via mobile (ChatGPT) over Tailscale yet.
Finally opened Martin Venezky’s What I Know About Photography (2019)
Time to move from Convex to Cloudflare?
INBOUND → POLICY → CONTEXT → MODEL → JUDGMENT → COMMIT → OUTBOUND
Great paper to complement Viv’s tight post arxiv.org/pdf/2604.21003

Strong Opinions, Loosely Held on Agent Harness Engineering:
1. You can outperform any default harness model (including codex & claude code) on pretty much any Task by engineering the harness around it. Using the exact same model, curate prompts, tools, skills, hooks for that Task. This harness optimization process is becoming much more agent driven with humans reviewing and curating evals/rewards to hill climb on. “Just say what you want”.
2. A “general purpose” agent/harness doesn’t really exist, it’s a tradeoff between time spent on customizing the agent and performance (cost, latency, accuracy) on a Task. I don’t exactly follow what a general purpose means tbh. Who decides what’s general and what’s not?
3. But if the “general purpose” agent/harness existed, it would look like a good coding agent
4. Building a Task specific harness will most likely converge to good prompt & tool design (probably packaged up as a Skill) as models become smarter and better at in-context learning
5. Evals are a moat and thus data to produce evals is a moat. Especially true for vertical agent companies. This is because agents can fit to most Eval sets today. If Evals measurably encode all the good behavior your agent needs to do, then this signal can be hill climbed to improve your agent
6. Frontier closed models are far too expensive for the large majority of tasks the world needs to do. As teams start mapping costs to ROI, Open Model Harness Engineering will take off even more. It is almost always worth the investment to at least try to get a potential 20x cost reduction
7. A large chunk of design decisions around Task decomposition and context engineering exist solely because our usable context window is 50-100k. Agents that become excellent at breaking down tasks, applying compaction appropriately, and orchestrating subagents as sub-task workers will be the most delightful products to do real work.
8. We’re entering an Age of Unbundled (& Rebundled) Agents where Subagents exposed as Tools do a ton of domain specific work on behalf of an orchestrator agent. The Harness becomes a box that gets populated with the exact set of tools, skills, and subagents needed to solve that task or sub-task. Examples include WarpGrep (search), Chroma Context-1 (search), Nemotron 3 Omni (small multimodal), etc. Bespoke agents that rock at narrow tasks orchestrated as tools. This also applies to software as tools that are used by agents via Skills like Remotion or Blender. Different harnesses bundle together the tooling needed to complete that narrow task.
End of opinions, these may change by the time this tweet goes out or may double down and expand on these in an article
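One way to picture "subagents exposed as tools" and "the harness as a box" is the minimal sketch below. It is not any real framework's API: `Harness`, `subagent_as_tool`, and `fake_search_run` are hypothetical names, and the model loop is replaced by a placeholder function so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A tool is modeled here as a function from a text request to a text result.
ToolFn = Callable[[str], str]


@dataclass
class Harness:
    """A 'box' holding the prompt and tools chosen for one task (hypothetical)."""
    task: str
    system_prompt: str
    tools: Dict[str, ToolFn] = field(default_factory=dict)

    def add_tool(self, name: str, fn: ToolFn) -> None:
        self.tools[name] = fn

    def call(self, name: str, request: str) -> str:
        if name not in self.tools:
            raise KeyError(f"tool not in this harness: {name}")
        return self.tools[name](request)


def subagent_as_tool(sub: Harness, run: Callable[[Harness, str], str]) -> ToolFn:
    """Wrap a narrower harness so an orchestrator sees it as a plain tool."""
    def tool(request: str) -> str:
        return run(sub, request)
    return tool


def fake_search_run(sub: Harness, request: str) -> str:
    # Placeholder for a real model loop running inside the sub-harness.
    return f"[{sub.task}] results for: {request}"


# The orchestrator's box is populated only with what its task needs.
search_agent = Harness(task="search", system_prompt="Find relevant code.")
orchestrator = Harness(task="fix-bug", system_prompt="Plan, delegate, verify.")
orchestrator.add_tool("search", subagent_as_tool(search_agent, fake_search_run))
```

Swapping the placeholder `run` function for a different model or prompt changes the subagent without touching the orchestrator, which is the unbundling the post describes.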