Filter
Exclude
Time range
-
Near
Replying to @jp54362
These benchmarks don’t paint accurate picture. 5.5 is borderline retarded compared to fable in real world task and intelligence. It just gets it.
11
Madhav Jha retweeted
"Finally, our interventions designed for validation loss generalize to downstream benchmarks, achieving a 9% improvement for pre-training evals and a 17.5Γ— data efficiency improvement over continued pre-training on math mid-training data. Our results show that simple algorithmic improvements can enable significantly more data-efficient pre-training in a compute-rich future."
1
1
23
1,238
Replying to @Sauers_
nvidia benchmarks say that le chaton quantised can deliver a whopping 100,000 concurrent le chatons per GB300 rack @nvidia !!!
2
21
A symbol of ambition, development, and modern Indiaβ€”Jewar is setting new benchmarks for infrastructure excellence. #Jewar_ki_Udaan
1
Part 15 of my AI safety series is about biorisk evals. Labs run biological knowledge benchmarks, publish the scores, and tie them to ASL triggers. The numbers keep going up. Frontier models are matching or beating expert baselines on public tests. Epoch asks; do those scores actually tell us whether someone could develop a bioweapon, or are we measuring something easier that saturates while the hard parts stay untested? Knowledge on a multiple-choice test is not the same as months in a lab with reagents, failed cultures, and supply chains. I wrote about benchmark saturation, what ASL evidence can and cannot support, and why β€œpassed the bio eval” is not the same as β€œwe understand the risk.” thabheloduve.com/blog/bioris… #aisafety

Part 14 of my AI safety series is about multi-agent risk. We spent years asking whether one model is safe in isolation. Deployment is not just one model but copilots talking to copilots, buyer agents negotiating with seller agents, coding assistants chained across repos. Each node can pass its own safety review and the system can still deadlock, escalate, or collude in ways nobody trained for. I wrote about Hammond et al.’s taxonomy: miscoordination, conflict, collusion, and why alignment does not compose automatically. thabheloduve.com/blog/multi-… #aisafety
1
3
Replying to @hampsonw
Hope it helps! Feels like it’s impossible to find what fits in a given gpu, so I’m trying to make that easier. 320 hours of benchmarks and counting!
2
hence what you get is overfitting to thise benchmarks. moreover... who says those benchmarks actually capture the type of reasoning needed for recursive self improvement
4
no. i reject those benchmarks as valid. there is a fundamental flaw in them. basically they are FIXED. hence once fixed you will get models being trained to fit those benchmarks.
2
5
Sagar Patnaik retweeted
We spend hours discussing AI models, benchmarks, and valuations. Meanwhile, one of the biggest questions remains: Who owns the interface between you and your AI? The answer to that question may end up being more important than the model itself. Privacy matters. Local interfaces matter. Subnet 95 matter. Let that sink in. $TAO #SN95
Guess what? local ai for everyone (: we did it local.ai
2
3
12
1,131
Zack Knight 😎 retweeted
noting that its products are not only meeting global benchmarks but are also being used at the highest level of international sport, including the FIFA World Cup.
2
1
75
MiddleOut MiddleOut is the product name for this compression-first video streaming pipeline. It builds standards-compatible HLS variants, benchmarks them against competitor-style baselines, and writes evidence reports instead of vague claims. The current implementation package and backwards-compatible CLI are still namedratstream while the project is being promoted into the MiddleOut repo. it’s almost done πŸ₯Ή
7
Replying to @saurabh_shah2
tbh the irony of benchmarks that can't keep up with the models they're testing is peak 2026
4
Replying to @AKirtesh
I tested fable 5 on the releasing date, the benchmarks were really promising to me, that's just incredible.
39
Replying to @HEI
tbh the bias angle is underrated, most benchmarks just assume competence without checking for shortcut learning
Danmar Pedre Valdes retweeted
Replying to @NetworkChuck
These "open models" don't hold a candle to the frontier models (and don't quote benchmarks at me, have you actually used and tested them all?). I would love for them to, but they don't. Don't go spending thousands of dollars to run a local model that is 60% of a frontier model (trust me that other 40% makes all the difference in the world). So what do you reccomend people do Network Chuck? Go spend thousands of dollars for a MASSIVELY DEGRADED Experience, or just pay a subscription fee? $200/mo = $2400/yr for a top model, and in 12 mo those frontier models will be even better and more expanded. Open source isn't catching up anytime soon, and when it finally DOES reach a "Opus" level quality, you will need $20k to buy the hardware to run it. You influencers pushing this stuff do no good for the ai space.
1
1
123
Replying to @signulll
But model quality still falls far short on countless ordinary tasks. Benchmarks are misleading. The map is not the territory.
3
tbh the fact that it works across all three benchmarks without task-specific tuning is what makes this actually impressive
AI is not only disrupting what investors buy, but how diversification asa whole used to work. With SpaceX now, likely OpenAI and Anthropic next, the big ETF funds become forced buyers once AI names enter benchmarks. The risk is not the AI bubble. The risk is that every portfolio becomes the same AI trade.
19
More on the Iran Deal: "The Deal with the Islamic Republic of Iran is now complete. Congratulations to all!" Trump said Sunday on Truth Social, adding that he had agreed to end the U.S. naval blockade on Iranian ports in exchange for Iran's reopening of the Strait of Hormuz and a continued halt to fighting. The two sides plan to keep discussing the limits to Iran's nuclear program that Trump has sought. They're scheduled to sign the agreement on Friday. The timing of the announcement on Sunday allowed Trump to celebrate the deal on his 80th birthday. U.S. officials said the deal was in jeopardy earlier in the day after Israel launched an air strike in the southern suburbs of Beirut, Lebanon's capital. As Iran prepared a retaliatory strike, Trump publicly rebuked Israeli Prime Minister Benjamin Netanyahu for endangering the talks. "After the Israelis struck Beirut, we were very worried," Vice President JD Vance said in an interview with Fox News after the deal was announced. "We saw a lot of evidence that the Iranians were going to launch a large number of missiles at the Israelis." During what Iran's deputy foreign minister, Kazem Gharibabadi, said were more than 14 hours of talks on Sunday, the Iranians ultimately stood down. They said the U.S. had made last-minute concessions in return, including speeding up the end of the naval blockade. "The naval blockade against Iran will end immediately and completely," Iran's Supreme National Security Council said in a statement, according to the semi-official Fars news agency. Pakistani Prime Minister Shehbaz Sharif said on social media that both sides "have declared the immediate and permanent termination of military operations on all fronts, including in Lebanon," and that an official signing ceremony will be on Friday in Switzerland. Mediators will hold a series of meetings this week, Sharif said. The deal appeared to leave significant areas of disagreement unresolved and subject to further negotiations, especially Iran's nuclear program and the wide range of U.S. sanctions that have been imposed on Tehran. Gharibabadi said his government expected to discuss a full lifting of U.S. sanctions in talks he said would take place over the next 60 days, according to IRNA, a state-supported news agency. Gharibabadi said those negotiations would only begin once Iran can verify that the U.S. has complied with the current agreement, which he said included unfreezing Iranian assets and lifting the naval blockade. U.S. officials, however, said Iranian assets would not be unfrozen until Tehran had demonstrated compliance with the agreement. Neither side released the text of what they had agreed to. Trump said the Strait of Hormuz would be reopened on Friday to remove the mines that Iran has laid, which have obstructed shipping traffic and driven up oil prices for months. Despite the question marks, oil futures prices began to decline on Sunday after the deal was announced. Iranian leaders made clear that they believed there was much still to discuss. Iranian state-run media suggested Tehran still intended to exert oversight over the Strait of Hormuz in partnership with Oman. If Iran continues to threaten closures, shipping oil from the Persian Gulf would become permanently riskier even if traffic started to approach pre-war levels. Vance said in his Fox interview that the deal marked a major success for the U.S. because the Iranian leadership had agreed that "Iran will never have a nuclear weapon, and not just pursue a nuclear weapon, but procure or try to buy a nuclear weapon as well. That's built into this agreement." Iran has long said that it does not seek a nuclear weapon and made agreements that it would not pursue one. Across multiple administrations, U.S. negotiators have typically focused on finding ways to impose limits on Iran's nuclear enrichment and devising surveillance that would ensure Tehran abides by the limits it has agreed to. Iran's deputy foreign minister suggested Sunday that most of the thorny nuclear details were yet to be agreed. Neither he nor the security council's statement mentioned any concrete commitments about nuclear issues, and the White House did not claim that Tehran had made any commitments beyond the overall promise not to build or buy a weapon. Unfreezing Tehran's assets would be a significant concession, depending on how much of the billions of dollars are made available to Iranian leaders. The seeming differences in scope between the U.S. and Iranian announcements sparked concern from Sen. Lindsey Graham (R-South Carolina), an Iran hawk who has long urged Trump to take an aggressive stance toward Tehran. "I am somewhat concerned that Iran's view of the agreement seems different than what the American negotiating team is claiming," he said on X. Despite the limits of the deal so far, the administration portrayed it in sweeping terms. Vance said that he was hopeful the negotiations could lead to a reshaped Middle East. "If the Iranians comply with this deal, it is going to fundamentally transform the Middle East for the next 50 years," he told Fox. "It's going to mean a lot of prosperity, lower energy prices for the American people." He added that "I know that this has been a hard time for a lot of Americans" because of high energy prices, "but this will be transformative." Vance said he planned to attend the signing ceremony in Switzerland, but that it was possible that Trump would do it himself. The president plans to depart early Monday for three days of meetings in France with the leaders of the Group of Seven major economies. The leaders of Britain, France, Germany and Italy, all of whom will see Trump at the summit, said that they stood ready to assist the effort to remove mines from the Strait of Hormuz and engage in other defensive efforts there. Trump did not consult with European leaders before launching the attack on Iran on Feb. 28, but he blasted them afterward for being slow to aid U.S. efforts or otherwise criticizing his strategy. The war has been the latest strain on transatlantic relations in an era in which Trump has been skeptical of allies. The February attack was launched in close coordination with Israel, and Netanyahu had convinced Trump in the weeks leading up to the war that the time was right to seek to depose Iran's leadership. But Trump has gotten frustrated with Netanyahu in the months since, repeatedly criticizing the Israeli leader's aggressive attitude toward Iran and its proxy in Lebanon, Hezbollah. Trump had said earlier Sunday that the U.S. and Iran were "very close" to agreeing on a deal to end the fighting β€” and urged all parties not to "blow it" after the Israeli attack in Beirut, which Israel said was aimed at a Hezbollah command center. Trump on Truth Social on Sunday said Israel had the right to defend itself but criticized its strikes, saying they "should not have happened." The president, in interviews with two outlets, Fox and Axios, said he spoke with Netanyahu and disparaged his judgment in launching the latest strike, using profanity to describe their tense conversation. "It is so bad β€” I couldn't believe it. An hour before we are supposed to sign the deal," Trump told Axios, underscoring the tension that has become increasingly public as the Iran war has dragged on. "Why did Bibi have to do a fucking attack? I was so pissed off. I let him know. He has no fucking judgment. I let him know that," Trump added. Israel said the attack was in response to Hezbollah's launch of three projectiles toward northern Israel. Iran has repeatedly called on the U.S. to prevent Netanyahu from expanding military operations in Lebanon as part of the negotiations over the extension of the U.S.-Iran ceasefire. Officials from the U.S., Iran and Pakistan had earlier outlined the contours of the expected deal. The initial deal was expected to include an Iranian commitment over the next 60 days to work with the U.S. to dismantle nuclear material in the country that could be used to create a weapon, officials said. In exchange, Iran would eventually receive relief from sanctions and the U.S. blockade, as well as access to billions of dollars in frozen assets β€” but would have to reach certain benchmarks that would be set through further negotiations.
1
1
8
2,255