Arnav Gupta

Tweets

Pinned Tweet

Arnav Gupta

@_ar9av

Jun 9

MLX has been around for a while. I've been using it for internal experimentation and knowledge management (through obsidian-wiki). Great for running models locally, and Apple doubled down on it at WWDC yesterday.

Arnav Gupta

@_ar9av

day 19 of reading one arxiv paper around AI every day and sharing what actually stuck: Creative Machine (Concordia University). tldr: generating creative-looking outputs and being genuinely creative are different things. this paper draws the line and gives 10 requirements for crossing it; most current AI systems fail on the ones that matter most.

Arnav Gupta

@_ar9av

the ethical implication is specific: a genuinely creative system doesn't just produce artifacts. it intervenes in situations and reshapes them. that means ethical constraints can't be a post-generation review. who's affected, what harms are possible, what's out of bounds: those have to shape what the system perceives, what it identifies as a conflict, and which interventions it even considers.

Arnav Gupta

@_ar9av

Code repo: github.com/Ar9av/daily-resea…

Arnav Gupta

@_ar9av

Jun 13

day 18 of reading one arxiv paper around AI every day and sharing what actually stuck: CODE (Southeast University / NTU). tldr: a COntradiction-based Deliberation Extension framework for overthinking attacks on RAG. inject one document into a RAG knowledge base; the reasoning model retrieves it, hits a contradiction it can't resolve, and spends up to 25x more tokens per query. the final answer stays correct, so your accuracy monitoring never fires.

Arnav Gupta

@_ar9av

Jun 13

the tested defenses (prompt brevity instructions, retrieval filtering) reduce the amplification but don't stop it. two things worth adding to any RAG deployment: first, flag queries that use 5x your typical token budget, since the contradiction causes a visible spike even when the answer is correct. second, before documents reach the model, scan for explicit logic traps like "exactly two of the following are true" and strip them from the context.
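
a rough sketch of those two checks, not the paper's code. the regex patterns, the function names, and the use of a recent-median baseline for "typical token budget" are all assumptions made for illustration; a real deployment would tune every one of them.

```python
import re
from statistics import median

# Hypothetical patterns for explicit logic traps (assumed, not from the paper).
TRAP_PATTERNS = [
    re.compile(
        r"\bexactly (one|two|three|\d+) of the following (is|are) (true|false)\b",
        re.IGNORECASE,
    ),
    re.compile(
        r"\b(this|the (previous|following)) (statement|sentence) is false\b",
        re.IGNORECASE,
    ),
]


def strip_logic_traps(document: str) -> str:
    """Drop sentences that match a known trap pattern; keep the rest."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    kept = [s for s in sentences if not any(p.search(s) for p in TRAP_PATTERNS)]
    return " ".join(kept)


def is_token_spike(tokens_used: int, recent_usage: list[int], factor: float = 5.0) -> bool:
    """Flag a query whose token count exceeds `factor` times the recent median."""
    if not recent_usage:
        # No baseline yet, so nothing to compare against.
        return False
    return tokens_used > factor * median(recent_usage)
```

run strip_logic_traps() on each retrieved document before it goes into the context, and is_token_spike() on each finished query. pattern matching only catches traps phrased the way you anticipated, so the token alert stays useful as a backstop.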

Arnav Gupta

@_ar9av

Jun 13

Paper link: arxiv.org/abs/2601.13112v1
Try it in your agent: github.com/Ar9av/daily-resea…

CODE: A Contradiction-Based Deliberation Extension Framework for...

Introducing reasoning models into Retrieval-Augmented Generation (RAG) systems enhances task performance through step-by-step reasoning, logical consistency, and multi-step self-verification....

arxiv.org

Arnav Gupta

@_ar9av

Jun 13

tried classifying prompt injections at scale. the obvious ones are easy. the hard cases look exactly like legitimate context that gradually nudges the agent's goals over multiple turns.

Arnav Gupta

@_ar9av

Jun 13

the skills marketplace for AI agents is basically npm circa 2015: no signing, no sandboxing, just install and trust. we learned this supply chain lesson once already.

Arnav Gupta

@_ar9av

Jun 12

day 17 of reading one arxiv paper around AI every day and sharing what actually stuck: SpatialClaw (NVIDIA). tldr: the bottleneck in spatial reasoning agents is the action interface, not the tool catalog. give the VLM a persistent Python kernel and let it write one cell at a time, seeing masks and depth maps before deciding the next step.
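
to make "persistent kernel, one cell at a time" concrete, here is a minimal sketch. it is not SpatialClaw's implementation: the PersistentKernel class and run_cell() are hypothetical names, and a plain exec() over a shared dict stands in for whatever kernel the paper actually uses.

```python
import contextlib
import io


class PersistentKernel:
    """Runs code cells one at a time against a shared namespace."""

    def __init__(self) -> None:
        # Variables defined by one cell stay here for later cells.
        self.namespace: dict = {}

    def run_cell(self, source: str) -> str:
        """Execute one cell and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()


kernel = PersistentKernel()
kernel.run_cell("depth = [1.0, 2.5, 4.0]")   # cell 1 defines state
output = kernel.run_cell("print(max(depth))")  # cell 2 reuses it
```

the point is that the second cell sees depth from the first, so the model can inspect an intermediate result and then decide what to write next instead of committing to a whole script up front.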

Arnav Gupta

@_ar9av

Jun 12

the action interface matters more than the tool catalog. three things the ablations validate: a persistent kernel beats single-pass code on every task category; numpy/scipy as a fallback matches pre-defined utility wrappers; and the same system prompt generalizes across 20 benchmarks without modification. the setup is minimal: perception tools as Python callables, show() for intermediate visual feedback, and an AST safety check before execution.
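
what an AST safety check before execution could look like, as a sketch only. the blocklist, the import allowlist, and check_cell() are assumptions for illustration, not the policy from the paper or repo.

```python
import ast

# Hypothetical policy (assumed): builtins a cell may not call, modules it may import.
BLOCKED_CALLS = {"eval", "exec", "open", "__import__", "compile"}
ALLOWED_IMPORTS = {"numpy", "scipy", "math"}


def check_cell(source: str) -> list[str]:
    """Return a list of policy violations; an empty list means the cell passes."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                violations.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to {node.func.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder access is a common route around name-based blocklists.
            violations.append(f"dunder attribute {node.attr}")
    return violations
```

a cell would only be executed when check_cell() returns an empty list. a static check like this is a cheap first filter, not a sandbox, so it should sit in front of process-level isolation rather than replace it.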

Arnav Gupta

@_ar9av

Jun 12

paper: arxiv.org/pdf/2606.13673
code: github.com/NVlabs/SpatialCla…