Science Lead @Theteamatx, former fellow @HiTSatHarvard. Co-developer @IndraSysBio, @PySysBio. PhD Systems Biology.

Joined January 2014
John Bachman retweeted
Replying to @benjamingyori
@benjamingyori and I comment on a new frontier of AI-assisted interpretation of proteomic investigations nature.com/articles/s41592-0…
John Bachman retweeted
Correlation is not causation. In our new perspective, we connect systems biology, causal reasoning, and machine learning to inform future approaches in systems biology and molecular medicine in the wake of current deep learning advances: doi.org/10.1038/s44320-024-0… 🧵👇
John Bachman retweeted
🌟Announcing "Lineax" - our newest #JAX library! For fast linear solves and least squares.
GitHub: github.com/google/lineax
* Fast compile time
* Fast runtime
* Efficient new algorithms (e.g. QR) and existing ones (GMRES, LU, SVD, ...)
* Support for general linear operators🔥
1/
John Bachman retweeted
(1/n) some thoughts about foundation models for single-cell biology upon publication of an interesting paper (geneformer) today in @Nature introducing a foundation model trained on 30M cells with cool applications nature.com/articles/s41586-0…
John Bachman retweeted
Fascinating new preprint on bioRxiv tackles a whale of a question: Whales are huge. So why don’t they get a ton of cancers? biorxiv.org/content/10.1101/…
Crick to Watson: “I must also point out to you, once again, the risks you will run if you publish such a book. The picture which emerges of yourself is not only unfavourable but misleadingly so. Moreover I do not think you realize what others will see in it.”
Re: the history of the DNA structure, see this blistering 1967 letter from Crick to Watson objecting to the draft of Watson’s book on their work. “...it shows such a naive and egotistical view of the subject as to be scarcely credible.” profiles.nlm.nih.gov/spotlig…
John Bachman retweeted
How can we learn to rapidly compose new genetic circuits? In a new essay, I explore recent work from the frontier of ML-driven biodesign 🧬 We're expanding from models of genetic parts, to models of genetic circuits:
John Bachman retweeted
Thought experiment: If we sequenced every butterfly species on earth, would we have enough data for a generative ML model for butterfly genomes? What if we sampled 100 individual genomes from every species? What if we sequenced all the other insects too? What would it take?
John Bachman retweeted
Machine learning competitions are often a good indicator of what techniques actually work well in practice on new datasets. The very comprehensive State of Competitive Machine Learning 2022 report just came out and contained many interesting and surprising insights!
1) As expected, transformers dominate natural language processing (NLP). ALL NLP-related winning solutions used transformers.
2) Convolutional neural networks still dominate computer vision. And EfficientNet is the most popular pretrained architecture for computer vision -- most people finetune pretrained models rather than training from scratch.
3) Almost twice as many winning solutions used k-fold CV instead of a fixed validation set.
4) Kaggle (barely) remains the most popular competition platform.
5) Almost everyone uses Python.
6) Out of 46 winning solutions using deep learning, 44 used PyTorch, and only 2 used TensorFlow.
7) A big surprise for tabular competitions: the reign of XGBoost seems over. While gradient boosting still wins most tabular competitions, LightGBM is now the preferred approach, with CatBoost coming in second. XGBoost is third.
8) Winning solutions of 7 out of the 10 tabular competitions used gradient boosting, 5 out of 10 used deep neural networks (implemented in PyTorch), and most winning solutions were ensemble methods.
Here's a link to the full report: mlcontests.com/state-of-comp…
John Bachman retweeted
As a casual reminder to reviewers and authors: if you are working on a biology task and you use random cross-validation, you are making a mistake. It's truly disheartening to review a paper and see this because you have no idea just how distorted the results are.
John Bachman retweeted
BioGPT-Large was just released by Microsoft 🤩 Trained from scratch on biomedical text, it's the current leader on the PubMedQA benchmark at 81% accuracy (human performance = 78%). It's also freely available on the @huggingface hub to try out (and fine-tune)!
John Bachman retweeted
Wise lessons from Fermi’s way of doing physics.
John Bachman retweeted
M is for Mineral. Congratulations project Mineral on graduating from X to become Alphabet’s newest company: mineral.ai/blog/m-is-for-min…
John Bachman retweeted
You never forget the first time you learn universities’ overhead rates on external grants.
John Bachman retweeted
Using machine-assembled networks for keeping human-curated resources up to date is a promising direction for many structured resources. These networks can also be the basis of causal/mechanistic analysis of high-throughput data.
Excited to officially release the INDRA-assembled networks of GO-term gene sets and automated INDRA-driven extensions to @NCIsysbio's NCI-PID pathways via @NDExProject. Looking forward to continuing this awesome collaboration with @NDExProject!
Very cool approach that uses DepMap drug sensitivity, CRISPR, gene expression and mutation data to characterize the target spectra of many drugs.
Drug target identification is at the heart of drug development, and we’ve been working to change how it’s been done. We present DeepTarget: a new computational tool to characterize a drug’s mechanism of action in-depth beyond its primary target. 🧰🧵👇 biorxiv.org/content/10.1101/…
💯
The first AI-powered drug platforms arrived on the scene a decade ago promising “faster, better, cheaper” — and yet none of them has gotten a novel chemical entity hitting a novel target across the approval finish line.
John Bachman retweeted
This alone is a big deal for text mining and knowledge aggregation: if repositories have useful APIs and provide full, machine-readable text (i.e. not PDF), it can unlock substantial value.
Replying to @petersuber
6/ The peer-reviewed publications must be deposited in "agency-designated" OA repositories and in "formats that allow for machine-readability and enabling broad accessibility through assistive devices."
John Bachman retweeted
🎉 thrilled to hear this!
BREAKING: White House issues new policy that will require, by 2026, all federally-funded research results to be freely available to public without delay, ending longstanding ability of journals to paywall results for up to 1 year. Coverage coming on @ScienceInsider.
John Bachman retweeted
30 Jul 2022
Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks | Molecular Systems Biology embopress.org/doi/full/10.15… #Bioinformatics