Joined March 2009
I’ve been training a model called Sinai. The reason I started is simple: I want to tackle hallucinations at the model behavior level. Not by making the model bigger and hoping it becomes honest. Not by putting retrieval around a chatbot and pretending the problem is solved. Sinai is being trained to recognize when evidence is actually enough to answer, and when the correct move is to refuse.
I just finished the first Sinai-EI eval run on the current model. Early results:
100% abstention recall on insufficient evidence cases.
80 to 90% direct lookup accuracy.
Strong evidence selection in covered domains.
Multi-hop synthesis and conflict detection are starting to show up.
Right now I’m verifying claim-level support before release, so unsupported claims can be caught before they reach the user. That is the part I care about most. I don’t want another model that sounds confident while making things up. I want Sinai to know where the evidence ends. A fluent wrong answer is worse than a correct refusal.
Stay tuned :D
You can't say shit like this and not expect it to crash
Toronto's towering temporary FIFA bleachers perfectly safe, builder says, especially on game day nationalpost.com/news/canada…
Yousef | Developer | e/acc retweeted
Jun 13
Anthropic
Yousef | Developer | e/acc retweeted
none of this happens if they called it opus 5 and didn’t engage in the day 0 propaganda
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
Honestly after trying a couple of dyson products, dyson should make a PC case.
Jun 11
🚨 Dyson has now made and tested a hair clip that doesn't fall out of your hair
Bruh
‼️ UPDATE: It just doesn't stop: Almost 900 Arch Linux packages infected now. lists.archlinux.org/archives…
Yousef | Developer | e/acc retweeted
😭😭😭
My timelines for some reason
Yousef | Developer | e/acc retweeted
unbelievable timeline we live in
NEW: Tom Brady launches “Good Nut” — an organic coconut water drink.
Yousef | Developer | e/acc retweeted
this week has been awesome, can you imagine how fun it'd be if all of these were bundled together in some sort of electronic entertainment expo lol thatd be cute i think
What a week for video games!
That is Scrappy
First look at Scooby Doo in the live-action ‘SCOOBY DOO’ series. Coming soon to Netflix.
Does @x know that it's currently impossible to create new accounts?
Had to confirm because I was very confused #ff7 #ff7r
Yousef | Developer | e/acc retweeted
Well done @RoyalMail - took a mere 19 years to deliver this magazine. Inconvenience? Well the kids have now left home ….
Yousef | Developer | e/acc retweeted
GRANDMA IS FREE !!!!!!!
20 Apr 2024
not letting my grandma out until code veronica get a remake
YES FINAL FANTASY VII ON PC AT LAUNCH FINALLY 😭😭😭 #SummerGameFest
Game journalists are sweating rn hearing about cuphead #SummerGamefest
YES CODE VERONICA THE BEST RESIDENT EVIL GAME #SummerGamefest
Yousef | Developer | e/acc retweeted
Jun 3
Years ago on reddit some lady was asking in a thread what a strange 32 byte packet was on her router. I said 32 bytes is like, a couple zeroes and ones, it was probably a ping. Well she didn't like that answer one bit. She said she was in contact with law enforcement and they were doing nothing. She said her neighbors take shifts watching her, everything she does and she was worried for her daughter. I said listen lady, I'm sorry to break it to you but it sounds like you have schizophrenia. You're having paranoid delusions and you need help. Forgot about it. Cut to years later. I pick up a phone call at my business. Hello is this Bone from reddit? I'm calling you to let you know I've called the FBI. They know what you're doing. She begged me to leave her alone. Once I replied to her paranoid delusion, I became part of it. She thought I was some ringleader, that I had been singlehandedly destroying her life stalking her. At this point I was in a panic. This is my business. How the hell did you find this number? Lady, I don't want to be mean to you in an acute mental crisis but you can't be calling me. Seek help. I hang up. Years later on Twitter. "Hello Bone? Is this the Bone who's been stalking me for years?" Blocked. The phone rings again. It's her. She's accusing me of all sorts of things, that I've been stealing her checks and working with the local judges to destroy her family. I said BITCH if you don't leave me alone I will deploy my full fucking satellite army on you. I will microwave you through your fucking walls. I will get your neighbors, the ones I've been paying all these years, to follow you everywhere. I'll hack every damn wifi in your house. I can see through your walls. We can hear your thoughts. I read every piece of mail you ever got, me and my friends, the government. We've all had enough of your shit and we're about to wrap up this whole operation so if you know what's good for you, you'll never call this number again or contact me anywhere.
Ever again. I could hear the color drain from her face over the phone. Please don't, she says. I just want you to stop. She sounded terrified. I say if you stop calling this number I'll call it off. You decide your fate. I hung up. She never called again.
Had a coworker reach out to me from a job I had back in 2018. I said hello how is everything. They have schizophrenia. Live back with their parents. They accused me of stalking their LinkedIn and spreading gossip about them. Really sad call. Be glad you have your mind
Yousef | Developer | e/acc retweeted
I'm sure you're all wondering why I've gathered you here today.
A desktop RTX Spark chip that I can use with a custom built pc would completely heal me. Also cure cancer probably.