ECHO-PPI: Trustworthy AI for Evidence-Bundled Detection of Overlapping Protein Modules in Protein–Protein Interaction Networks
1. ECHO-PPI is an evidence-bundled framework for overlapping module detection in PPI networks that explicitly targets curator-facing interpretability: every protein–module assignment is exported with an auditable bundle of topology, semantic, and Gene Ontology (GO) evidence, plus a hierarchical confidence label (core/inner/outer/uncertain).
2. The key design shift is from “module lists” to “assignment-level decision support”: instead of only returning overlapping communities, ECHO-PPI records why a specific protein is placed into a specific module, and whether that membership should be treated as strong (core) or boundary/triage (inner/outer/uncertain).
3. The workflow integrates three evidence channels: (i) weighted PPI topology, (ii) semantic protein profiles (TF–IDF SVD embeddings of text/GO labels; Sentence-BERT is optional but not used in reported benchmarks), and (iii) GO evidence via module-specific GO TF–IDF “functional signatures,” while treating missing GO as missing evidence rather than biological absence.
4. For overlap-aware membership scoring, ECHO-PPI combines a boundary-sensitive topology metric (permanence) with functional dependency (alignment of a protein’s GO terms to a module’s GO TF–IDF signature) using a transparent mixture score M(p, C) = α Perm(p, C) + (1−α) fd(p, C). Overlaps are added only if they improve the best existing assignment by a margin (gain threshold), with a conservative transfer rule for likely misplacements.
5. Candidate generation is broadened beyond a single clustering output: starting from MCL modules, ECHO-PPI adds nucleus-centered ego neighborhoods (1–2 hops), greedy topology–semantic expansions, semantic kNN sets that pass a graph-support filter, and hybrid unions when candidate Jaccard overlap is high; candidates are then scored with penalties for uncertainty and fragmentation.
6. A distinctive component is the deterministic “evidence-potential nucleus” score (BH), inspired by gravity-based representative selection, to prioritize high-support local nuclei using weighted degree, clustering coefficient, k-core, semantic neighborhood coherence, GO richness, and an annotation-sparsity penalty. Importantly, nuclei guide candidate construction and ranking rather than defining the final partition alone.
7. ECHO-PPI adds “recall-safe supplementation” to avoid the common failure mode of naive boundary expansion: it limits growth (≤15% relative size increase, at most two added proteins per module) and requires gated evidence gain, so expansions remain conservative and reviewable.
8. Confidence labeling is hierarchical and evidence-based: core requires both topology and semantic support above thresholds; inner/outer require weaker support; uncertain captures boundary cases. The paper emphasizes these labels as triage metadata (especially core vs non-core), not as calibrated probabilities, and reports how label distributions can shift with preprocessing and embedding normalization.
9. Benchmarking on yeast (Gavin socioaffinity network; plus a Krogan 2006 BioGRID-derived transfer benchmark) shows ECHO-PPI largely preserves the predictive behavior of the MCL overlap seed rather than outperforming the strongest baseline (ClusterONE). On Gavin full-gold, ClusterONE leads (F1 0.270) while ECHO-PPI matches MCL-scale performance (F1 0.162) but uniquely delivers complete required-field evidence-bundle coverage (1.00 vs 0.00 for baselines).
10. The paper’s central claim is therefore complementary to pure F1 optimization: ECHO-PPI makes overlapping module predictions inspectable, confidence-aware, and reproducible. Case studies (e.g., YKR018C, YIL161W) illustrate multi-membership outputs where semantic evidence can dominate topology in some assignments, explicitly signaling “hypotheses for manual review” rather than silently promoting all overlaps to equally strong complex memberships.
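To make points 4 and 7 concrete, here is a minimal Python sketch of the mixture score, the gain-thresholded overlap rule, and the growth cap. It is an illustration under stated assumptions, not the authors' implementation: the function names, the default α = 0.5, and the gain value 0.05 are hypothetical placeholders, and treating the two growth limits (≤15% and at most two proteins) as jointly binding is one possible reading of the summary.

```python
from typing import Dict, List


def membership_score(perm: float, fd: float, alpha: float = 0.5) -> float:
    """Mixture score M(p, C) = alpha * Perm(p, C) + (1 - alpha) * fd(p, C).

    alpha = 0.5 is a placeholder default, not a value reported in the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * perm + (1.0 - alpha) * fd


def accept_overlaps(
    best_existing: float,
    candidate_scores: Dict[str, float],
    gain: float = 0.05,
) -> List[str]:
    """Return the modules whose score beats the best existing assignment
    by at least `gain` (hypothetical threshold value)."""
    return sorted(
        module
        for module, score in candidate_scores.items()
        if score - best_existing >= gain
    )


def max_added_proteins(module_size: int) -> int:
    """Growth cap, assuming both limits apply at once: at most 15% relative
    size increase (rounded down) and at most two added proteins."""
    if module_size < 0:
        raise ValueError("module_size must be non-negative")
    return min(2, (module_size * 15) // 100)


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the paper's benchmarks.
    home = membership_score(perm=0.6, fd=0.4)
    candidates = {
        "C2": membership_score(perm=0.9, fd=0.5),
        "C3": membership_score(perm=0.5, fd=0.54),
    }
    print("accepted overlaps:", accept_overlaps(home, candidates))
    print("cap for a 10-protein module:", max_added_proteins(10))
```

Because the score is a plain convex combination, each assignment can be audited by reading off the two terms separately, which fits the evidence-bundle framing; a module of five proteins gets no supplementation at all under this reading of the cap.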
💻Code:
github.com/MehrdadJalali-AI/…
📜Paper:
arxiv.org/abs/2605.21216
#computationalbiology #bioinformatics #PPI #proteininteractions #networkscience #communitydetection #interpretableAI #trustworthyAI #GeneOntology #reproducibility