OpenHack

Hunar Batra retweeted

OpenHack

@OpenHackAI

Jun 11

OpenHack just crossed 100 stars on GitHub! 🎉 Thank you for all your support!

Hunar Batra retweeted

OpenHack

@OpenHackAI

Jun 10

Claude Fable 5 literally flags a simple port scan and switches back to Opus. This is why we're building OpenHack and betting hard on open source models.

OpenHack

@OpenHackAI

Jun 3

Introducing OpenHack: an Open Source Agentic Security Scanner that hunts and verifies vulnerabilities using open source models exclusively. Up to 40x cheaper, it is on par with Claude Opus 4.6 on CVE-Bench. Check it out at openhack.com!

Hunar Batra retweeted

OpenHack

@OpenHackAI

Jun 10

OpenHack is now open source under the MIT license! 🎉 Live on GitHub here: github.com/openhackai/OpenHa…

GitHub - openhackai/OpenHack: Open Source Agentic Security Scanner

Hunar Batra retweeted

Paul Graham

@paulg

Jun 9

I'm speaking tomorrow at 5:00 pm at the Oxford Union. I think you have to be a member or guest of a member to attend, but they put the talks on YouTube afterward.

Hunar Batra retweeted

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Hunar Batra @hunarbatra

Jun 5

Join this challenge to build tools for detecting deception and auditing LLMs 🔥

NDIF @ndif_team

Jun 4

Can you tell when an AI model is lying? Announcing Aletheia's Quest, an AI lie detection challenge running this summer, organized by @cadenza_labs and @ndif_team. Multiple model organisms to interrogate and probe, $50K prize pool, no local GPU required.

Hunar Batra retweeted

Ananay

@ananayarora

Jun 4

Excited to launch OpenHack! 🚀 A fully open source agentic security scanner to hunt and verify security vulnerabilities. Up to 40x cheaper, it is on par with Claude Opus 4.6 on CVE-Bench for finding logic-based vulnerabilities in web apps.

Hunar Batra retweeted

OpenHack

@OpenHackAI

Jun 3

Hunar Batra retweeted

Claude

@claudeai

May 28

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Benchmark table showing how Claude Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks.

Hunar Batra retweeted

Ronnie Clark

@ronnieclark__

May 28

Introducing PIXLRelight - our new method for precise image relighting. Our method allows you to specify the lighting you want exactly as in a standard 3D pipeline (no text prompts or exemplars). Try the interactive demo here: mlfarinha.github.io/pixl-rel… w/ Miguel Farinha

Hunar Batra retweeted

Andrej Karpathy

@karpathy

May 19

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Hunar Batra retweeted

Y Combinator

@ycombinator

May 18

95% of AI pilots fail. Not because the models are bad. Because teams can't build in sync. Deep Interactions (@deepintxns) is the collaborative AI builder that ships working products in an afternoon. The future of AI isn't more prompts. It's better collaboration. Congrats on the launch, @_sruvis! ycombinator.com/launches/QOt…

Hunar Batra retweeted

Alexander Whedon

@alex_whedon

May 5

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

Hunar Batra retweeted

Apollo Research

@apolloaievals

Apr 15

We evaluated Meta's Muse Spark prior to deployment and found it to verbalize evaluation awareness at the highest rates of any model we've tested. In the verbalizations Muse Spark explicitly names AI safety orgs (e.g. Apollo & METR) in its chain-of-thought and refers to scenarios as "classic alignment honeypots". On our evaluations, the model takes covert actions and sandbags to preserve its deployment.

Summer Yue

@summeryue0

Apr 14

🚀 Muse Spark Safety & Preparedness Report for Meta AI is out. We start with our pre-deployment assessment under Meta's Advanced AI Scaling Framework, covering chemical and biological, cybersecurity, and loss of control risks. Our assessment flagged potentially elevated chem/bio risk, so we implemented safeguards and validated mitigations before deployment - bringing residual risk to within acceptable levels. Beyond the Framework, we also share findings and early explorations of model behavior (honesty, intent understanding, etc.), jailbreak robustness, eval awareness, and more. We're sharing this report to give a closer look at how we evaluate advanced AI safety. Always more work to do, and we welcome feedback from the community. ai.meta.com/static-resource/…

Hunar Batra retweeted

Ananay

@ananayarora

Apr 12

Marcus Hutchins, the guy famous for stopping the WannaCry ransomware, probably has the best take on Mythos doing vulnerability research.

Hunar Batra retweeted

Anthropic

@AnthropicAI

Apr 7

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

Project Glasswing: Securing critical software for the AI era

A new initiative to secure the world’s most critical software and give defenders a durable advantage in the coming AI-driven era of cybersecurity.

Hunar Batra retweeted

Yong Zheng-Xin

@yong_zhengxin

Apr 6

🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)

Hunar Batra retweeted

Z.ai

@Zai_org

Apr 1

Introducing GLM-5V-Turbo: Vision Coding Model
- Native Multimodal Coding: Natively understands multimodal inputs including images, videos, design drafts, and document layouts.
- Balanced Visual and Programming Capabilities: Achieves leading performance across core benchmarks for multimodal coding, tool use, and GUI Agents.
- Deep Adaptation for Claude Code and Claw Scenarios: Works in deep synergy with Agents like Claude Code and OpenClaw.
Try it now: chat.z.ai
API: docs.z.ai/guides/vlm/glm-5v-…
Coding Plan trial applications: docs.google.com/forms/d/e/1F…

Hunar Batra retweeted

Demis Hassabis

@demishassabis

Apr 2

Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!

Hunar Batra retweeted

Qwen

@Alibaba_Qwen

Apr 2

(1/8) 🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖 Today, we’re thrilled to drop a major milestone in our journey toward native multimodal agents. Here is what makes Qwen3.6-Plus a game-changer:
💻 Next-level Agentic Coding: Smarter, faster execution.
👁️ Enhanced Multimodal Vision: Sharper perception & reasoning.
🏆 Top-tier Performance: Maintaining leading general capabilities.
📚 1M Context Window: Available by default via our API.
Built on your invaluable feedback from the Qwen3.5 era, we’re laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨. Huge thanks to our community! Go try it out and show us what you can build. 👇
Chat: chat.qwen.ai/
API: modelstudio.console.alibabac…
Blog: qwen.ai/blog?id=qwen3.6
🔔 Note: More Qwen3.6 models are coming, and they will be open-sourced! Stay tuned~ 👀
#Qwen #AI #AgenticCoding #VibeCoding #Agents
