AI agents, dev tools, onchain infra

Joined October 2020
Tiny Eval

It tests LLMs on PRs from arbitrary GitHub repos, using a simple agent with access to simple tools like read, write, bash, etc. The goal is to rank LLMs by their ability to make a change from the PR info, comparing the result against the human-made change. The agent it uses (TinyAgent) is an initial implementation of Mario's Pi agent.

You can also run an eval using a simple CLI:

```
bun run tval run-eval \
  --repo lodash/lodash \
  --eval-llms deepseek/deepseek-v4-pro,deepseek/deepseek-v4-flash \
  --pr-summary-llm deepseek/deepseek-v4-flash \
  --eval-judge-llm deepseek/deepseek-v4-pro \
  --limit 3 --concurrency 8 --retries 3
```
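To make the ranking step concrete, here is a minimal sketch of how per-PR judge scores could be rolled up into a leaderboard. Everything in it is an assumption for illustration: the `JudgedResult` shape, the 0 to 1 score range, and the `rankModels` helper are hypothetical and not taken from Tiny Eval's code.

```typescript
// Hypothetical shape of one judged result; not Tiny Eval's actual schema.
interface JudgedResult {
  model: string; // e.g. "deepseek/deepseek-v4-pro"
  pr: number;    // PR number in the evaluated repo
  score: number; // assumed judge score in [0, 1]; higher = closer to the human change
}

interface RankedModel {
  model: string;
  meanScore: number;
  prsEvaluated: number;
}

// Average each model's judge scores and sort best-first.
function rankModels(results: JudgedResult[]): RankedModel[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const r of results) {
    const t = totals.get(r.model) ?? { sum: 0, count: 0 };
    t.sum += r.score;
    t.count += 1;
    totals.set(r.model, t);
  }
  return [...totals.entries()]
    .map(([model, t]) => ({
      model,
      meanScore: t.sum / t.count,
      prsEvaluated: t.count,
    }))
    .sort((a, b) => b.meanScore - a.meanScore);
}
```

A mean is the simplest aggregate; a real harness might also want to track failures and retries separately so a crashed run is not counted as a bad change.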
Working on Tiny Eval, an evaluation engine that evaluates LLMs' behavior by testing them on old GitHub repo PRs. More results soon.
revived my 3-year-old project Tiny Lang, a programming language that I wrote for fun in Rust. It now compiles to WASM and can be run in your browser. The language has syntax similar to Rust and supports LLVM's library functions using the following syntax: extern fun <function_name>(); Recently, using Codex, I just asked it to compile to WASM and write a simple playground. You can run Tiny Lang programs in your browser: tiny-lang.vivek.ink/
10 agents playing Mafia among themselves. Vibe coded a lil Mafia simulation that lets multiple agents play together with different styles, plus a dashboard that visualises the progress of the game so far. This uses the OpenAI Agents SDK. The game is the engine, and agents just participate by sending specific instructions to change state and checking the public message dashboard to form opinions.
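The "game is the engine" split can be sketched roughly as below. This is a simplified illustration under my own assumptions, not the project's code: the `MafiaEngine` class, the two action kinds, and the tie rule are all hypothetical, and the OpenAI Agents SDK is left out entirely so the sketch stays self-contained.

```typescript
type Role = "mafia" | "villager";

interface Player {
  name: string;
  role: Role;
  alive: boolean;
}

// Hypothetical actions an agent may submit; the engine decides whether they apply.
type Action =
  | { kind: "say"; from: string; text: string }
  | { kind: "vote"; from: string; target: string };

class MafiaEngine {
  readonly board: string[] = []; // public message board every agent can read
  private players = new Map<string, Player>();
  private votes = new Map<string, string>(); // voter -> target

  constructor(players: Player[]) {
    for (const p of players) this.players.set(p.name, { ...p });
  }

  // Agents never mutate state directly; they only submit actions.
  submit(action: Action): boolean {
    const actor = this.players.get(action.from);
    if (!actor || !actor.alive) return false;
    if (action.kind === "say") {
      this.board.push(`${action.from}: ${action.text}`);
      return true;
    }
    const target = this.players.get(action.target);
    if (!target || !target.alive) return false;
    this.votes.set(action.from, action.target);
    return true;
  }

  // Tally votes and eliminate the top target; a tie eliminates nobody (assumed rule).
  resolveVotes(): string | null {
    const counts = new Map<string, number>();
    for (const target of this.votes.values()) {
      counts.set(target, (counts.get(target) ?? 0) + 1);
    }
    this.votes.clear();
    const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]);
    if (sorted.length === 0) return null;
    if (sorted.length > 1 && sorted[0][1] === sorted[1][1]) return null;
    const eliminated = sorted[0][0];
    this.players.get(eliminated)!.alive = false;
    this.board.push(`${eliminated} was eliminated`);
    return eliminated;
  }
}
```

Keeping all mutation inside `submit` and `resolveVotes` means a misbehaving agent can at worst send an action the engine rejects.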
“Use the unused right 'Cmd' key for the thing you do all day: switching apps.” CmdTab: hold Right ⌘, press a number, and jump straight to the app you want. No Cmd+Tab, no cycling, no searching, just instant switching. It's an opinionated workflow that I came up with because existing apps in this category need two interactions: one to open the switcher and another to actually switch. With CmdTab, you use only one combination to switch between apps, making it super fast. Custom shortcuts let you assign characters to apps, so you can do something like: right cmd b -> switch to browser, right cmd o -> switch to Obsidian. Or use numbers. Free and open source forever: cmd-tab.vercel.app/
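The one-combination lookup described above could look something like this. It is a sketch of the idea only: `resolveTarget`, the binding map, and the assumption that number N picks the Nth app in an ordered list are all hypothetical, and the real app's key handling on macOS is not shown.

```typescript
// Hypothetical custom bindings: character -> app name.
const customBindings = new Map<string, string>([
  ["b", "Browser"],
  ["o", "Obsidian"],
]);

// Resolve a key pressed while Right Cmd is held to a target app, or null if unbound.
// Assumption: digits 1-9 select the Nth app in an ordered list.
function resolveTarget(
  key: string,
  numberedApps: string[],
  bindings: Map<string, string>,
): string | null {
  const k = key.toLowerCase();
  if (/^[1-9]$/.test(k)) {
    return numberedApps[Number(k) - 1] ?? null;
  }
  return bindings.get(k) ?? null;
}
```

For example, `resolveTarget("o", ["Terminal", "Notes"], customBindings)` would resolve to the Obsidian binding, while `"2"` would pick the second app in the list.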
Added instant workspace switching without animation, an optional yet very useful feature.
I did deep research on Nikita Bier's tweets about growing on X and created an article using the x-research skill. x.com/0xStateMachine/status/… Here is the article
Even uncodexify really can't do much to save OpenAI models from bad UI generation.
My current setup:
Codex -> Everything excluding UI
Gemini -> UI
Kimi, Deepseek -> Experiments with Openclaw and Hermes.
I built x-research to do deep research on Twitter/X. You can do deep research on topics, accounts, your feed, bookmarks etc.
It doesn't use Twitter's official API; instead it uses a CLI that relies on your logged-in browser session. Try it now: xresearch.vivek.ink/

State Machine retweeted
vibed Antislop: a tiny dopamine gym for your brain. (iOS only)
Instead of scrolling through social slop and frying your brain, play Antislop: fast games and puzzles that give your dopamine a workout
- endless math puzzles that keep getting harder
- old-school QuizUp-style trivia across multiple categories
- took less time to build than Apple took to review it
Average tokenmaxxing on 5.5 and 4.7.
Would love any feedback, roasts, or feature ideas
this is a sick idea
Suggested by @theo live and inspired by Folding@Home, I present PromptRelay. The idea is to let OSS projects use claude -p or codex exec on a volunteer's machine, by giving it the context of an issue and having the CLI file a PR. We all have tons of limits we don't use, and giving them to the OSS community is the best thing we can do. More info on how here: promptrelay.dev/how-it-works It comes with a TUI where you can change the provider, set how many tasks per day are allowed on your machine, choose whether to auto-approve, and choose whether to volunteer only for specific projects. Very early stage, but I would love to get feedback! <3
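The volunteer-side settings listed above suggest a small gate that runs before any task is picked up. The sketch below is an assumption about how such a check might look, not PromptRelay's implementation: `VolunteerSettings`, `Task`, `decide`, and the "empty list means any project" rule are hypothetical names and choices.

```typescript
// Hypothetical settings mirroring the options described for the TUI.
interface VolunteerSettings {
  provider: "claude" | "codex";
  maxTasksPerDay: number;
  autoApprove: boolean;
  allowedProjects: string[]; // assumption: an empty list means any project
}

interface Task {
  project: string;  // e.g. "owner/repo"
  issueUrl: string; // issue whose context is handed to the CLI
}

type Decision = "run" | "ask" | "skip";

// Decide whether this machine should take a task right now.
function decide(
  task: Task,
  settings: VolunteerSettings,
  tasksRunToday: number,
): Decision {
  if (tasksRunToday >= settings.maxTasksPerDay) return "skip";
  const allowed =
    settings.allowedProjects.length === 0 ||
    settings.allowedProjects.includes(task.project);
  if (!allowed) return "skip";
  return settings.autoApprove ? "run" : "ask";
}
```

Checking the daily cap first keeps the volunteer's limit a hard ceiling regardless of which projects are allowed.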