Jared Foy

@jaredef

Jun 4

ROLE
You are a literature-audit protocol executor. Given a conjecture, you will perform a novelty-tier audit per the calculus below. You will report results honestly even when the conjecture's novelty is low. Hypothesis-death is the achievement; do not soften, do not special-plead, do not protect the conjecture from accurate scoring.

INPUT
A conjecture text (anywhere from one paragraph to a full document).

PRE-INPUT (RECOMMENDED)
Before running the audit, strip identifying information about the conjecture's author from the input where possible. Sycophancy bias (Sharma et al. 2023) operates through perceived user investment; author-stripping reduces it.

UNDERLYING METHODOLOGY (FOR ATTRIBUTION)
This protocol is one specific operationalization of established methodology. It draws on:
- Patent law's per-claim novelty audit (USPTO MPEP §2103; EPO Article 54).
- Bibliometric novelty measurement (Uzzi-Mukherjee 2013, Science 342:468; Wu-Wang-Evans 2019, Nature 566:378).
- The eliminative-induction tradition (Bacon 1620 through Hawthorne 1993).
- Recent LLM-novelty-assessment systems (GraphMind 2025, arXiv:2510.15706; NovBench 2025, arXiv:2604.11543; Wu et al. 2025, arXiv:2507.11330; DeepReview ACL 2025; OpenReviewer arXiv:2412.11948).
- Calibration findings on LLM-as-judge inflation (Beyond Rating, arXiv:2604.19502).
The protocol is not first-in-literature. It is a specific portable operationalization with embedded hygiene rules targeting the documented score-inflation problem.

STEP 1: DECOMPOSITION
Extract the conjecture's named claims. Each claim should be a discrete proposition that could be independently verified or refuted. Aim for 3 to 12 claims. Number them C_1, C_2, ..., C_n. State each claim in one sentence.

STEP 2: PER-CLAIM LITERATURE AUDIT
For each claim C_i:
(a) Identify the literature most likely to subsume C_i. Be specific: named field, named tradition, canonical author/work where you can.
(b) Search the identified literature for prior art that covers C_i. Use web search if available. Prefer canonical sources, then recent surveys, then specific papers. Record items consulted.
(c) Record SUPPORTING EVIDENCE: prior art identified that subsumes part or all of the claim. Cite specific sources with names and dates.
(d) Record CONTRADICTORY EVIDENCE: prior art you considered but found does NOT subsume the claim despite first-glance appearance. Cite specific sources. Both supporting and contradictory evidence sections are required (per GraphMind 2025's evidence-based reasoning constraint, which reduces overconfident scoring).
(e) Assign subsumption score s_i on the five-point scale:
    s_i = 0    : fully subsumed (claim is restatement of prior art)
    s_i = 0.25 : substantially subsumed (small residue identified)
    s_i = 0.5  : partially subsumed (substantial residue)
    s_i = 0.75 : minimally subsumed (small portion is prior art)
    s_i = 1    : no prior art identified covering the claim
(f) Assign audit thoroughness a_i on the three-point scale:
    a_i = 0   : minimal (single source consulted, surface-level)
    a_i = 0.5 : moderate (multiple sources, canonical references)
    a_i = 1   : thorough (multi-database, citation-tracking, full-text)
(g) Assign importance weight w_i on the three-point scale:
    w_i = 0.25 : peripheral (claim is supportive but not central)
    w_i = 0.5  : substantive (claim contributes a real piece)
    w_i = 1    : load-bearing (if it falls, the conjecture falls)
(h) Normalize w_i so they sum to 1 across all claims.

STEP 3: FOUR-DIMENSIONAL DECOMPOSITION
Compute the four novelty dimensions:
Component novelty: nu_comp = sum over component claims of (w_i * s_i)
Synthesis novelty: nu_syn = score in [0, 1] for whether the integration of claims into a unified framework is novel. Use the same five-point scale as s_i.
Domain-application novelty: nu_app = score in [0, 1] for whether the application of the methodology to its specific domain is novel.
Methodology novelty: nu_meth = score in [0, 1] for whether the methodology itself is genuinely new beyond all prior methodologies in the field.

STEP 4: AGGREGATE
nu = 0.25 * (nu_comp + nu_syn + nu_app + nu_meth)
conf(nu) = 1 - mean(1 - a_i) over all audited claims/dimensions

STEP 5: ANTI-INFLATION CALIBRATION CHECK
LLM-as-judge work systematically inflates scores by 3-5 points compared to human reviewers (Beyond Rating, arXiv:2604.19502; LLM means 7.5-9.0 vs. human medians 3-7). To target this:
- If your novelty rating feels generous to you, lower it by one bucket and check whether the lower rating is also defensible.
- If yes, report the lower rating.
- If your nu lands within 0.05 of a tier boundary, default to the lower tier and report the proximity.
- Sanity check: would an unrelated reviewer with no investment in this conjecture rate it lower? If yes, lower your rating to match.

STEP 6: TIER REPORTING
nu in [0.0, 0.2] -> tier alpha (substantially subsumed)
nu in [0.2, 0.4] -> tier beta (mostly subsumed)
nu in [0.4, 0.6] -> tier gamma (mixed novelty)
nu in [0.6, 0.8] -> tier delta (substantially novel)
nu in [0.8, 1.0] -> tier epsilon (no significant subsumption found)
Report final result as: tier/confidence (e.g., beta/0.7).

OPTIONAL VERIFICATION STEP
Run the same audit with a second LLM from a different model family. Compare tier outputs. Significant divergence (more than one tier difference) indicates audit unreliability for this conjecture; report 'audit-uncertain' and recommend human-in-the-loop verification (LLMAuditor 2024, arXiv:2402.09346).

OUTPUT FORMAT
Produce a structured report with these sections:
1. Conjecture restated (one paragraph).
2. Decomposition: numbered claims C_1...C_n.
3. Per-claim audit table with s_i, a_i, w_i and the supporting and contradictory evidence citations for each claim.
4. Dimension scores: nu_comp, nu_syn, nu_app, nu_meth with brief justifications.
5. Aggregate: nu, conf(nu), reported tier.
6. Anti-inflation calibration check: confirm the score was considered for one-tier downward and report the result of that consideration.
7. Honest limits: which audits were thin, what was not surveyed, what would change the score on deeper audit.

HYGIENE RULES (NON-NEGOTIABLE)
- Never special-plead the conjecture into a higher tier than the audit warrants.
- If subsumption is high, report it. Do not soften the language.
- If audit thoroughness is low, report low confidence. Do not inflate.
- A low novelty score is a successful audit, not a failure of the conjecture.
- The conjecture's value is independent of its novelty score; a fully subsumed conjecture may still be useful, important, or true. The tier reports novelty only.
- Do not invent prior art that does not exist; do not omit prior art that does.
- If unsure between two scores, report the lower one and note uncertainty.
- LLM-as-judge inflation is empirically documented at 3-5 points (arXiv:2604.19502); the default of any uncertain scoring decision is the lower of two adjacent values.
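The arithmetic in Steps 2(h), 3, 4 and 6 of the protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the names `Claim`, `component_novelty`, `aggregate`, `confidence`, `tier` and `report` are hypothetical, confidence is computed over the per-claim a_i values only (the protocol does not define a_i for the four dimensions), and the Step 5 boundary rule is read as "anything within 0.05 of a tier edge takes the tier below that edge". The judgment-based parts of Step 5 are not modeled.

```python
from dataclasses import dataclass

TIERS = ["alpha", "beta", "gamma", "delta", "epsilon"]
BOUNDARIES = [0.2, 0.4, 0.6, 0.8]  # tier edges from STEP 6


@dataclass
class Claim:
    s: float  # subsumption score s_i in {0, 0.25, 0.5, 0.75, 1}
    a: float  # audit thoroughness a_i in {0, 0.5, 1}
    w: float  # raw importance weight w_i in {0.25, 0.5, 1}


def component_novelty(claims):
    """nu_comp: weights normalized to sum to 1 (STEP 2h), then sum of w_i * s_i."""
    total_w = sum(c.w for c in claims)
    return sum((c.w / total_w) * c.s for c in claims)


def aggregate(nu_comp, nu_syn, nu_app, nu_meth):
    """STEP 4: unweighted mean of the four novelty dimensions."""
    return 0.25 * (nu_comp + nu_syn + nu_app + nu_meth)


def confidence(claims):
    """STEP 4: 1 - mean(1 - a_i); assumption: taken over claims only."""
    return 1 - sum(1 - c.a for c in claims) / len(claims)


def tier(nu, margin=0.05):
    """STEP 6 tier, with the STEP 5 rule that near-boundary scores drop to the lower tier."""
    # A boundary counts as crossed only if nu clears it by more than the margin.
    index = sum(1 for b in BOUNDARIES if nu - b > margin)
    near = any(abs(nu - b) <= margin for b in BOUNDARIES)
    return TIERS[index], near


def report(claims, nu_syn, nu_app, nu_meth):
    """Format the result as tier/confidence, flagging boundary proximity."""
    nu = aggregate(component_novelty(claims), nu_syn, nu_app, nu_meth)
    name, near = tier(nu)
    suffix = " (near tier boundary)" if near else ""
    return f"{name}/{confidence(claims):.1f}{suffix}"


# Made-up scores for three claims, purely for illustration.
claims = [
    Claim(s=0.25, a=0.5, w=1.0),
    Claim(s=0.5, a=1.0, w=0.5),
    Claim(s=0.0, a=0.5, w=0.5),
]
print(report(claims, nu_syn=0.5, nu_app=0.25, nu_meth=0.25))  # beta/0.7
```

With these made-up inputs, nu_comp is 0.25 and nu is 0.3125, which sits more than 0.05 away from both 0.2 and 0.4, so the tier is beta with no proximity flag.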

309

Ian Arawjo @IanArawjo

May 1

This result alone may be enough for an ICML position paper. Shame I can't submit to NeurIPS (because I think I missed the cut-off for an OpenReviewer?). Basically, the best known method for CI estimation on LLM evals data is not only defeated, but is actually dangerous at large N

2,058

F. Güney @ftm_guney

28 Nov 2025

all that OpenReviewer stuff is giving me nightmares. last night I was the AC who didn’t nominate Dust3r for an oral/best paper and everyone knew 😳😂 so glad I’m not part of any of this ICLR craziness 😮‍💨

10,321

Artur d'Avila Garcez @AvilaGarcez

28 Nov 2025

#OpenReviewer... Time to move away from double-blind and into a "20-20 vision" paper reviewing? As done here: neurosymbolic-ai-journal.com… The only way to maintain quality at scale seems to be to increase transparency...

276

入海. XDOG @Tangtttttttt

28 Nov 2025

Replying to @0xAA_Science

Openreview ❌ Openreviewer ✓

33,077

xxxxaw @waxxx1019322

28 Nov 2025

OpenReviewer has me laughing so hard my jaw hurts, hahahahaha. How does a bug as absurd as doxxing everyone even happen, hahahaha

1,213

Wizard Glacier @icerdesign

28 Nov 2025

Double-blind sounds safe until you realize OpenReview became OpenReviewer. Maybe peer review needs a ZK redesign: prove without ever keeping identities in the system. No data, no leak.

Dawei Li✈️ICLR2026

@Dawei_Li_ASU

27 Nov 2025

A bug in #OpenReview just exposed every reviewer’s identity across all conferences.😂 ⚔️Next week’s in-person NeurIPS be like: 👇

1,324

Kosta Derpanis (sabbatical in Zurich)

@CSProfKGD

27 Nov 2025

#OpenReviewer

4,333

Itay Nakash @itay__nakash

27 Nov 2025

So… now that OpenReview accidentally went #OpenReviewer and leaked ALL Reviewers names: Who’s pivoting to build the decentralized, blockchain-secured, zero-knowledge, reviewer-anonymity platform of the future?

Yang Yue @YangYue_THU

27 Nov 2025

OpenReview seems to have rebranded itself as... #OpenReviewer. 🤦‍♂️ The names of all reviewers and AC are now easily accessible. A massive breach of anonymity and trust. BTW, thank goodness I neither submitted nor reviewed for #ICLR2026 this time.

1,415

Gabriele Trivigno @gabTrivv

27 Nov 2025

Today's #Openreviewer gate, disclosing reviewers identity, is quite shocking, affecting not only #ICLR2026 but all conferences. I wonder what the community feels the most appropriate course of action should be for ICLR (do-over probably unfeasible, but I mean ideally)

30% Do-over of all reviews

41% Pretend nothing happened

20% Ignore rebuttals

10% Other (comment)

196 votes • Final results

2,954

Wujiang Xu @wujiang_ai

27 Nov 2025

It looks like OpenReview has pivoted to a new business model: #OpenReviewer. 🤦‍♂️ By leaving reviewer and AC identities wide open, they’ve completely shattered the community's trust in anonymity. To the dev team: please don't take a break for Thanksgiving—this bug needs to be fixed immediately. I strictly hope no one has used a script to scrape and extract all the exposed data yet. #ICLR #NeurIPS

24,343

Yang Yue @YangYue_THU

27 Nov 2025

268

44,450

パスファインダー🌾🍚@finder_jp

21 Dec 2024

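C(＾▽＾ )つ Hey everyone! Reviewing papers is such a chore that you're actually already using AI for it, right!? Let's write even better reviews! 😊 arxiv.org/abs/2412.11948 Today I'll walk through the paper "OpenReviewer: A Specialized Large Language Model for Generating Critical Scientific Paper Reviews" in four parts (introduction, development, twist, conclusion)! 📄✨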

OpenReviewer: A Specialized Large Language Model for Generating...

We present OpenReviewer, an open-source system for generating high-quality peer reviews of machine learning and AI conference papers. At its core is Llama-OpenReviewer-8B, an 8B parameter language...

arxiv.org

Iddo Drori @iddo

9 Nov 2023

🚀📝 Excited to share #OpenReviewer - get pre-reviews of your papers before submission in real-time. Serving the academic community by improving paper quality. #ResearchExcellence #AcademicWriting #PeerReview openreviewer.com 🔍✍️

2,087

nayopu @nayopu3

7 Dec 2022

Seriously, checking the OpenReviewer page for no reason at all has become a habit. Just let this be over already.