Joined October 2016
Pinned Tweet
27 Nov 2025
If @uxwizz was so loved by customers 5-10 years ago, imagine how good it is today after years of extra work :)
Just added multi-query search for AiBenchy, so you can easily filter the leaderboard and search for multiple models at once:
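One way a multi-query filter like this could work, as a minimal sketch and not the actual AiBenchy code: the comma separator, the case-insensitive substring matching, and the function names `parse_queries` and `filter_models` are all assumptions made for illustration.

```python
def parse_queries(raw: str) -> list[str]:
    """Split a raw search string into lowercase terms (comma-separated by assumption)."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]


def filter_models(names: list[str], raw_query: str) -> list[str]:
    """Keep every model name that contains at least one of the search terms."""
    terms = parse_queries(raw_query)
    if not terms:
        # An empty search shows the full leaderboard.
        return list(names)
    return [name for name in names if any(term in name.lower() for term in terms)]
```

Matching on "any term" rather than "all terms" is what lets a single search box compare several models side by side.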
I am not sure GLM 5.2 is that much better compared to GLM 5. GLM 5 was also doing really well. Interesting that the GLM 5.2 (none) version seems to be worse than GLM 5 (none)
I am comparing it with GLM 5 instead of GLM 5.1, because the two models were performing similarly, with GLM 5 actually doing better in most cases.
My repo is getting a bit too big, maybe I should stop saving tested model results in JSON files...

The advantages of JSON files:
- they are automatically versioned in .git
- easily searchable both in the editor and by LLMs
- can immediately see what changed in git diff
- human readable, easy to understand and portable (can be transferred and used in other apps with a simple copy-paste)

Disadvantages:
- not efficient to run queries on (e.g. find the highest value for key XYZ across files; but in my case I do a local pre-processing of all the values I need, in a summary JSON file)
- slows down git and git diff a lot once you have large files or many files
- LLMs can sometimes over-read files and have the context filled with too much unrelated data

A solution could be moving to a database. SQLite would be the closest choice to the current setup; the only differences are that:
- versioning would be lost (unless intentionally saving old versions of files)
- LLMs would need a database query tool, so they can find data
- LLMs would need a way to find what changed since the last snapshot

For now, I will stick with JSON files (I changed the structure to make sure JSON files don't get too big), mostly because of the transparency they provide in seeing exactly what changed (while the app is still in development, it's nice to see if any code execution has unintended effects or not).
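The local pre-processing step mentioned above could look roughly like this. It is only a sketch under assumptions: one JSON object per result file, a flat numeric key, and the hypothetical names `build_summary` and `write_summary`; the real file layout is not shown in the post.

```python
import json
from pathlib import Path


def build_summary(results_dir: Path, key: str) -> dict:
    """Scan every *.json file once and record the highest numeric value of `key`."""
    best_file, best_value = None, None
    for path in sorted(results_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue  # skip files that are not a single JSON object
        value = data.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if best_value is None or value > best_value:
                best_file, best_value = path.name, value
    return {"key": key, "best_file": best_file, "best_value": best_value}


def write_summary(results_dir: Path, key: str, out_path: Path) -> dict:
    """Write the summary with stable formatting so git diffs stay small and readable."""
    summary = build_summary(results_dir, key)
    out_path.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
    return summary
```

Writing the summary with `sort_keys=True` and a fixed indent keeps the output deterministic, so a rerun that changes nothing produces no git diff.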
LLMs be like (GLM 5.2)
Jun 16
Added model cost filtering. Useful to find the best model for a specific price range.
Jun 11
Made two extra sales these past few days, with the exact same user journey:
- started trial around the end of May
- trial ended at the start of June
- converted immediately on trial end

I was actually considering removing the trial lately, but now I see that almost 100% of the last 5-10 customers started with the trial. I think it makes a lot of sense for self-hosted software, where the fear of how complex the installation might be is one of the main pushbacks compared to a cloud alternative.
Jun 10
Even though the UI is far from perfect, I haven't found any tool as good as @uxwizz to see the entire user journey in one screen and instantly understand what happened. Someone purchased yesterday, and I was curious if my recent website changes (localization) had any impact. I simply used Codex to automatically translate my website into 8 languages (I chose the languages based on which countries had the most conversions so far). The translations are far from perfect, but it does seem to reduce checkout friction: I made 2 sales since the change, with zero extra marketing.
Jun 10
Claude Fable 5 made the best hamster so far. Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
Jun 10
Opus 4.8:
Jun 10
Opus 4.7:
Jun 10
I tried benchmarking this model on @OpenRouter but requests keep timing out, couldn't even complete the first test...
Jun 10
Another change I made that seemed to impact sales was renaming the plans from "Personal/Company/Agency" to "Solo/Growth & scale/Agency". People were confused before, asking if they could use "Personal" for their business.
My fav one is from Gemini 3.5 Flash, which added a ladder for the hamster to get on the table:
Grok be like:
Some of them are... special
Added a new AI Benchy feature! I call it "Showcases". Instead of just displaying numbers, I ask models to create various visual artifacts (SVG images, UI elements, small interactive demos, etc.). The first showcase: "SVG of a hamster playing table tennis". I am currently running it for all models. This is Qwen 3.7 Plus: