BIOBLOBS: Differentiable Graph Partitioning for Protein Representation Learning
1. This paper introduces BIOBLOBS, a novel module for protein representation learning that dynamically partitions proteins into flexible, non-overlapping substructures, termed “blobs”. This approach captures the modular nature of protein function more effectively than traditional methods relying on rigid substructures.
2. BIOBLOBS is fully differentiable and can be integrated into existing protein encoders like GVP-GNN. It uses a neural partitioning layer to create these blobs, which are then quantized into a shared codebook, resulting in discrete, interpretable representations of protein substructures that improve performance on various protein function prediction tasks.
3. The neural partitioner in BIOBLOBS employs a seed-and-expand strategy to form cohesive substructures. It selects seed residues using Gumbel-Softmax sampling and expands around them based on learned thresholds, ensuring that the resulting blobs are connected and functionally relevant.
4. A key innovation is the use of a vector-quantization codebook to map blob embeddings to discrete tokens. This not only captures frequent and functionally relevant substructures across the dataset but also enables the model to learn a compact and reusable vocabulary of protein motifs.
5. The global-blob attention fusion module integrates the quantized blob embeddings with the global protein representation. This allows the model to attend to informative substructures and produces an interpretable importance score distribution over blobs, enhancing the model’s interpretability.
6. Experiments across three protein function prediction benchmarks (Gene Ontology, Enzyme Commission, and Structural Class) demonstrate that BIOBLOBS outperforms strong baselines under both random and structure-based splits. The improvements are particularly significant on the structure split, which reduces similarity leakage.
7. The authors provide a detailed analysis of the model’s partitions, showing that BIOBLOBS consistently identifies well-defined secondary structures and assigns high importance scores to coherent, stable structures. The codebook further maps similar substructures to nearby codes, providing a meaningful vocabulary.
8. BIOBLOBS addresses the challenge of selecting substructures of variable size and topology by introducing a differentiable seed-and-expand procedure. This, coupled with discrete codebook learning, yields more faithful protein representations and a scalable account of structure-function relationships.
💻Code: github.com/OliverLaboratory/…
📜Paper: arxiv.org/abs/2510.01632
#ProteinRepresentationLearning #GraphPartitioning #NeuralPartitioner #ProteinFunctionPrediction #DeepLearning #ComputationalBiology
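The seed-and-expand idea in point 3 can be illustrated with a minimal forward-pass sketch. It rests on several assumptions:

- The protein is a residue contact graph, given as an adjacency list.
- `seed_logits` holds per-residue scores.
- A fixed `affinity(seed, v) >= threshold` test stands in for the paper's learned thresholds.

All names (`partition`, `affinity`, `threshold`) are hypothetical and are not the BIOBLOBS code. Taking the argmax of a Gumbel-Softmax sample mimics the hard forward pass of a straight-through estimator. The relaxed gradient path that makes the real module differentiable is not shown.

```python
import math
import random
from collections import deque


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def partition(adj, seed_logits, affinity, num_blobs, threshold=0.5, tau=1.0, rng=random):
    """Hypothetical seed-and-expand partitioner returning a list of blobs
    (sorted lists of residue indices). Not the authors' implementation."""
    n = len(adj)
    assigned = [False] * n
    blobs = []
    for _ in range(num_blobs):
        if all(assigned):
            break
        # Mask residues that already belong to a blob so blobs never overlap.
        masked = [-math.inf if assigned[i] else s for i, s in enumerate(seed_logits)]
        probs = gumbel_softmax(masked, tau, rng)
        seed = max(range(n), key=probs.__getitem__)  # hard (straight-through style) pick
        blob = [seed]
        assigned[seed] = True
        queue = deque([seed])
        # Expand breadth-first along graph edges, so every blob stays connected.
        while queue:
            cur = queue.popleft()
            for v in adj[cur]:
                if not assigned[v] and affinity(seed, v) >= threshold:
                    assigned[v] = True
                    blob.append(v)
                    queue.append(v)
        blobs.append(sorted(blob))
    return blobs


# Toy example: a 6-residue chain made of two hypothetical "segments".
def same_segment(seed, v):
    return 1.0 if seed // 3 == v // 3 else 0.0


adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
blobs = partition(adj, [5, 0, 0, 0, 0, 5], same_segment, num_blobs=2, rng=random.Random(0))
print(sorted(blobs))  # [[0, 1, 2], [3, 4, 5]]
```

Each residue joins a blob only through a neighbor that is already in that blob, and it is then marked as taken. As a result, every blob is connected and no two blobs overlap. Residues that no seed ever reaches simply stay unassigned in this sketch.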
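Points 4 and 5 can be illustrated together with a small sketch. It assumes plain nearest-neighbour vector quantization and single-query dot-product attention, in which the global protein embedding attends over the quantized blob embeddings. The function names, the 1/√d scaling, and the residual sum that combines the pooled blob vector with the global one are all assumptions for illustration. The paper's actual fusion module may differ. Training details are omitted, such as the straight-through estimator and commitment loss common in VQ-VAE-style training.

```python
import math


def quantize(blob_emb, codebook):
    """Map a blob embedding to its nearest codebook entry (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(blob_emb, code)) for code in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, codebook[idx]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(global_emb, blob_embs, codebook):
    """Hypothetical global-blob attention: the global vector is the query,
    and the quantized blob vectors serve as both keys and values."""
    tokens, quantized = zip(*(quantize(b, codebook) for b in blob_embs))
    d = len(global_emb)
    scores = [sum(g * q for g, q in zip(global_emb, qv)) / math.sqrt(d) for qv in quantized]
    weights = softmax(scores)  # importance distribution over blobs, sums to 1
    pooled = [sum(w * qv[k] for w, qv in zip(weights, quantized)) for k in range(d)]
    fused = [g + p for g, p in zip(global_emb, pooled)]  # residual sum (assumption)
    return list(tokens), weights, fused


codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
blob_embs = [[0.9, 0.1], [0.2, 0.8]]
tokens, weights, fused = fuse([1.0, 0.0], blob_embs, codebook)
print(tokens)  # [0, 1]
```

The returned `weights` play the role of the per-blob importance scores described in point 5. Here the first blob's code aligns with the global vector, so that blob receives the larger weight. The discrete `tokens` are the shared-vocabulary indices from point 4.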
Efficient #RDF #KnowledgeGraph Partitioning Using Querying Workload: We present two #graphPartitioning techniques based on querying workloads leveraging predicate co-occurrences. Results show advantages in query runtimes, timeout queries, distinct sources, and overall rank score. [1/2]
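The post does not describe the two techniques in detail. As a purely hypothetical illustration of what "leveraging predicate co-occurrences" can mean, the sketch below proceeds in three steps:

1. Count how often predicate pairs appear together in the workload's queries.
2. Greedily merge predicates into `k` groups by co-occurrence.
3. Place each RDF triple in the partition that owns its predicate.

The effect is that predicates queried together tend to be stored together. This is not the authors' method; the greedy merge heuristic and all function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(workload):
    """Count how often each unordered predicate pair appears in the same query."""
    counts = Counter()
    for query_preds in workload:
        for a, b in combinations(sorted(set(query_preds)), 2):
            counts[(a, b)] += 1
    return counts


def group_predicates(workload, k):
    """Hypothetical greedy heuristic: repeatedly merge the two groups with the
    highest total co-occurrence until k groups remain."""
    counts = cooccurrence(workload)
    groups = [{p} for p in sorted({p for q in workload for p in q})]

    def weight(g1, g2):
        return sum(counts.get(tuple(sorted((a, b))), 0) for a in g1 for b in g2)

    while len(groups) > k:
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda ij: weight(groups[ij[0]], groups[ij[1]]))
        groups[i] |= groups[j]
        del groups[j]  # j > i, so index i is unaffected
    return groups


def partition_triples(triples, groups):
    """Place each (s, p, o) triple in the partition that owns its predicate.
    Predicates unseen in the workload fall back to partition 0 (assumption)."""
    owner = {p: idx for idx, g in enumerate(groups) for p in g}
    parts = [[] for _ in groups]
    for s, p, o in triples:
        parts[owner.get(p, 0)].append((s, p, o))
    return parts


workload = [  # each query reduced to the predicates it uses
    ["foaf:name", "foaf:knows"],
    ["foaf:name", "foaf:knows"],
    ["dbo:birthPlace", "dbo:country"],
    ["foaf:name", "dbo:birthPlace"],
]
groups = group_predicates(workload, k=2)
print([sorted(g) for g in groups])
# [['dbo:birthPlace', 'dbo:country'], ['foaf:knows', 'foaf:name']]
triples = [(":alice", "foaf:name", '"Alice"'), (":alice", "dbo:birthPlace", ":Paris")]
parts = partition_triples(triples, groups)
```

Grouping purely by predicate keeps the example short. A practical partitioner would also need to balance partition sizes, which this sketch ignores.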
Peter Mucha starting off his community detection workshop. #graphpartitioning #PolNet2019