Accelerating Drug Repurposing with AI: The Role of Large Language Models in Hypothesis Validation
1.This study evaluates how Large Language Models (LLMs) can help validate drug repurposing hypotheses by analyzing biomedical literature. GPT-4o and DeepSeek stood out as the most reliable models when guided by well-designed prompts.
2.Starting from 21,968 drug-disease associations proposed by a pathway-based method, the authors selected 30 representative cases and tested how well LLMs classify them as viable or non-viable, simulating expert review.
3.They tested 10 different prompt strategies across 4 LLMs: GPT-4o, Claude-3, Gemini-2, and DeepSeek. The most effective strategies involved Chain-of-Thought reasoning, Few-shot examples, and Explicit Reasoning, confirming prior insights in biomedical NLP.
4.Among LLMs, GPT-4o had the best overall accuracy (83%) and precision, while DeepSeek had the best balance of precision (81%) and recall (92%). Claude-3 and Gemini-2 underperformed by comparison.
5.The best prompts (Few-shot, Chain-of-Thought, and Explicit Reasoning) significantly improved F1 scores, with structured prompts outperforming Zero-shot settings. This highlights the importance of careful prompt design.
6.In a second phase, the authors evaluated these top prompts on 30 new pathway-based repurposing cases. GPT-4o again had the best precision, while DeepSeek showed superior recall, which is useful in exploratory settings where false negatives are costly.
7.A third evaluation phase used 10 benchmark drug-disease repurposing cases drawn from established literature. Here, both GPT-4o and DeepSeek achieved high accuracy and F1 scores (0.92), suggesting strong alignment with existing biomedical knowledge.
8.Prompt choice mattered less in benchmark cases, implying LLMs are more confident and consistent with well-documented associations, likely because these associations are well represented in LLM training data.
9.An illustrative example: LLMs identified verapamil as a viable candidate for diabetes mellitus through shared pathways, a prediction supported by the literature. Conversely, paclitaxel was correctly rejected for obesity, despite a shared pathway, due to toxicity concerns.
10.Overall, LLMs showed clear potential to assist in the early stages of drug repurposing by filtering viable candidates and reducing reliance on exhaustive manual review. Human oversight remains critical to manage hallucination and false positives.
11.Limitations include small dataset size, lack of fine-tuning, and citation hallucinations. Future work should explore biomedical fine-tuning and retrieval-augmented generation to boost credibility and reduce spurious claims.
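For readers less familiar with the metrics quoted above (accuracy, precision, recall, F1), here is a minimal sketch of how they could be computed when an LLM's viable/non-viable calls are compared against expert labels. The function name `score`, the `Metrics` container, and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score(expert_labels, model_labels):
    """Compare binary viability labels (True = viable) from experts and a model.

    Hypothetical helper for illustration; not taken from the paper's repository.
    """
    if len(expert_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(expert_labels, model_labels))
    tp = sum(1 for e, m in pairs if e and m)          # viable, called viable
    fp = sum(1 for e, m in pairs if not e and m)      # non-viable, called viable
    fn = sum(1 for e, m in pairs if e and not m)      # viable, missed by the model
    tn = sum(1 for e, m in pairs if not e and not m)  # non-viable, correctly rejected

    # Guard against empty denominators so the sketch never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, invented purely for demonstration.
    expert = [True, True, True, False, False, False]
    model = [True, True, False, True, False, False]
    print(score(expert, model))
```

High recall means few viable candidates are missed (few false negatives), while high precision means few non-viable candidates slip through (few false positives). That is the trade-off behind the GPT-4o versus DeepSeek comparison in the thread.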
💻Code:
github.com/iratxe-zunzunegui…
📜Paper:
biorxiv.org/content/10.1101/…
#DrugRepurposing #LLM #AIinMedicine #PromptEngineering #BiomedicalNLP #GPT4o #DeepSeek #PathwayAnalysis