I just applied for the PRISM AI Research Fellowship. Thank you to
@shi_weiyan for your post making me aware of it.
I've been building a personal cognition model with my ChayGPT ever since it got persistent memory, and I had it score how well my cognition and experience align with each of the 12 research areas.
Compact fit summary for the 12 research areas:
1. LLMs and Conflicting Information — Fit: 91/100. This is the closest match to my current LQRI research because it studies how models handle conflicting evidence, whether they update properly, and whether their confidence still matches the evidence. My strengths in staged prompt chains, failure pattern spotting, careful output review, and turning messy model behavior into measurable categories line up directly with this project.
2. Applying Epidemiological Methods to AI Harm Monitoring — Fit: 90/100. This fits my systems thinking because it treats AI harms like something that needs exposure tracking, assumptions, evidence quality, and trend monitoring rather than just raw incident counts. My LQRI work already uses scope boundaries, evidence preservation, uncertainty tracking, and failure flags, which maps well to harm determinations and AI governance monitoring.
3. Synchronous Threat Monitoring — Fit: 89/100. This lines up with my verification mindset from IT and AI evaluation: do not trust output just because it sounds right, check the system while it is acting, and look for quiet failure modes. My experience with troubleshooting, LQRI, and skepticism toward LLM-generated code makes me a strong fit for monitoring experiments, reproducibility checks, and failure analysis.
4. AI Preference Drift During Training — Fit: 88/100. This is one of the projects I am most naturally drawn to because it asks whether training changes only capability or also changes measurable choice patterns. My LQRI work is not about preference drift directly, but it shows the same instinct: measure model behavior across structured tests instead of assuming what the model is doing from a few impressive outputs.
5. How AI Labs Redefine Safety — Fit: 88/100. This fits my policy and incentive-analysis side because I already pay attention to how institutions change language under pressure. My strengths in close reading, definition drift, evidence checks, and avoiding claims that go beyond the document fit well with tracking how labs change safety, risk, frontier model, and benchmark language over time.
6. Interpretability for Scientific Causal Reasoning — Fit: 87/100. This fits my interest in whether a model’s explanation is actually tied to its reasoning or just sounds convincing. My LQRI work already tests evidence vs inference, confidence revision, and unsupported claims, which connects well to chain-of-thought faithfulness, prompt contrasts, and failure patterns in scientific reasoning, though the causal inference and interpretability tooling would be a stretch.
7. Trust Calibration in Healthcare AI — Fit: 85/100. This fits my healthcare background and AI safety interests because it asks whether clinicians and patients trust AI outputs at the right level. My CBC claims AI pilot work gives me direct experience thinking about grounded AI, audit risk, human oversight, and healthcare decision support, though this project is more survey/literature-review focused than my strongest model-evaluation interests.
8. Grounding Safe-by-Design AI — Fit: 84/100 for Option B, 64/100 for Option A. Option B fits because it involves prompt engineering, JSON structures, model-generated world models, and testing whether LLM outputs can support safety experiments. Option A is intellectually interesting, but less aligned with my current evidence base because it leans more on formal philosophy, economics, and academic literature review.
9. Interpreting Personalized Reward Model Bases — Fit: 82/100. This fits my interest in value pluralism, hidden preference structures, and auditability, especially the question of what learned reward “bases” actually represent. The core idea is learnable for me, as shown by the weighted-score exercise, but the linear algebra and ML paper density make it a steeper ramp than the top projects.
10. Multilingual Safety Evals — Fit: 74/100. This connects to LQRI because it is about whether safety claims hold outside the original test setting, but my lack of non-English fluency is a major limitation. I could contribute to evaluation design, rubrics, transcript review, and reproducibility, but the team would need language-fluent people for the core translation and cultural validation work.
11. Steering Rule Representations Across Languages — Fit: 72/100. The safety question is interesting, but the work is much more technical than my current strongest evidence: representation engineering, model internals, embedding spaces, math operations on weights, and cross-lingual transfer. I could help with evaluation design and failure categories, but I am not yet a strong match for the core model-internals side.
12. Red-Teaming Protein Foundation Models — Fit: 70/100. This fits my red-team and evaluation instincts, especially adversarial testing and reproducibility, but it has the biggest domain gap. The work leans heavily toward Python, transformer models, protein modeling, biosecurity, and biological plausibility metrics, so I could contribute as a careful evaluator/ramp learner but not as a natural first-choice technical fellow.