ED HumGen DataSci @EliLillyandCo | Past: @InscriptaInc @RegeneronDNA @TempusLabs @Ancestry DNA sci team | @UChicago @Cambridge_Uni | All opinions are my own

Joined November 2007
Mathew Barber retweeted
The 5 biology papers & areas of progress I found most impressive in 2024 (with video explainers for each!): 1/n
2
114
719
139,492
Mathew Barber retweeted
Very excited about this paper. Tweetorial coming soon!
Specificity, length, and luck: How genes are prioritized by rare and common variant association studies biorxiv.org/cgi/content/shor… #biorxiv_genomic
1
15
107
11,954
Mathew Barber retweeted
Our AI for virtual cells article is now published in @CellCellPress with lots of amazing co-authors who gathered with us at @cziscience and @_bunnech, Aviv, @Prof_Lundberg, @jure, and @StephenQuake sciencedirect.com/science/ar…
3
18
105
22,675
Mathew Barber retweeted
Introducing a new method called REMETA from the Regeneron Genetics Center (RGC) @RegeneronDNA for meta-analysis of gene-based tests using summary statistics rgcgithub.github.io/remeta/ This is great work from Tyler Joseph. Pre-print is here: medrxiv.org/content/10.1101/…

2
25
95
10,245
Mathew Barber retweeted
Analysis of sequencing data of 320k individuals (75k cases and 245k controls) shows heterozygous carriers of cystic fibrosis mutations are protected from inflammatory bowel disease. The protection mechanism could be due to an altered gut mucosal barrier that resists penetration by bacterial or other toxins. Prior animal studies have indeed shown that heterozygous state of cystic fibrosis mutation protects against cholera and typhoid bacterial toxins. It's amazing to see validation using human genetics with IBD as outcome. This was one of the ASHG abstracts this year that stood out for me. Nice to see it in preprint. Yu et al. medRxiv (from International IBD consortium) medrxiv.org/content/10.1101/…
2
46
140
25,119
Mathew Barber retweeted
The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All you Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days. Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just input, and softmax becomes elementwise, deleting the normalization). TLDR Attention is awesome and a *major* unlock in neural network architecture design. It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc.
And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends). Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly the story is quite interesting from the perspective of nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau !
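To make the "data-dependent weighted average" description above concrete, here is a minimal sketch for a single query vector. It assumes the scaled dot-product scoring from the Transformer paper, not the additive scoring of the original Bahdanau et al. formulation, and the names `softmax` and `attention` are illustrative, not taken from any library.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query attention (assumed scaled dot-product form).

    query:  list of d floats
    keys:   list of n vectors, each of length d
    values: list of n vectors, each of the same length
    Returns (output, weights), where output is the weighted average
    of the value vectors and weights depend on the query and keys.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the values: the "global pooling" step.
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim)
    ]
    return output, weights
```

With `query = [1.0, 0.0]` and `keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, the first and third keys get equal weight and both outweigh the second, so the output leans toward their values. If all keys are identical, the weights are uniform and the output is the plain mean of the values.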
133
985
6,683
862,484
Mathew Barber retweeted
I recently wrote a primer on UMAP for Nature Reviews Primers. If you are looking for an overview of the method, a getting started primer, or best practices it is a good place to start. rdcu.be/d0YZT

1
44
206
17,953
Mathew Barber retweeted
Our new textbook Theoretical Foundations of Conformal Prediction is out! Conformal prediction is a statistical technique that augments ML systems with uncertainty information for safe deployment. This book lays out the core theory. arxiv.org/abs/2411.11824
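As a rough illustration of how conformal prediction attaches uncertainty to a model's output, here is a sketch of the split conformal variant for regression with absolute-residual scores. This is one common textbook form, not necessarily the book's presentation; it assumes calibration and test points are exchangeable, and the names `conformal_quantile` and `predict_interval` are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Calibration threshold for split conformal prediction.

    scores: nonconformity scores on held-out calibration data,
            e.g. absolute residuals |y - model(x)|
    alpha:  target miscoverage level (0.1 aims for 90% coverage)
    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or
    infinity when the calibration set is too small for that level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def predict_interval(model, x, q):
    """Symmetric interval around the point prediction.

    model: any callable mapping an input to a numeric prediction
    q:     threshold returned by conformal_quantile
    """
    y_hat = model(x)
    return (y_hat - q, y_hat + q)
```

A typical use would compute residuals on a calibration set the model was not trained on, pass them to `conformal_quantile`, and then call `predict_interval` for each new input. The interval width is the same for every input in this simple form.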
4
73
429
41,068
Mathew Barber retweeted
🚨 New Textbook on Conformal Prediction 🚨 arxiv.org/abs/2411.11824 “The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these proof strategies, especially the more recent ones, are scattered among research papers, making it difficult for researchers to understand where to look, which results are important, and how exactly the proofs work. We hope to bridge this gap by curating what we believe to be some of the most important results in the literature and presenting their proofs in a unified language, with illustrations, and with an eye towards pedagogy.” We are looking for feedback — and this is only a draft, with Part 4 coming soon! Please reach out! With Rina Foygel Barber and @stats_stephen
12
97
461
60,766
Mathew Barber retweeted
14 Nov 2024
Today we report Evo, a generalist foundation model for biology, on the cover of @ScienceMagazine. Trained on billions of DNA nucleotides, Evo was designed to capture two key aspects of biology: the multi-modality of the central dogma and the multi-scale nature of evolution.
A new Science study presents “Evo”—a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the way synthetic biology is done. Learn more in this week's issue: bit.ly/3OsmUPr
29
303
1,368
307,882
Mathew Barber retweeted
14 Nov 2024
Maybe everyone already knew about this, but the @AllofUsResearch results browser is up so you can look up your favorite genetic associations (I assume this is the browser @konradjk presented at #ASHG24 that is now ready?). allbyall.researchallofus.org…


2
13
43
2,830
Mathew Barber retweeted
12 Nov 2024
Yesterday was a hard day at 23andMe, and we said goodbye to a number of tremendously talented colleagues. If people have job openings in the genetics space that they'd like me to share with the impacted folks, please do post here.
14
24
95
24,077
Mathew Barber retweeted
"I spent too much time on QC to not have a slide about it" 😂☺️ #ASHG24
1
4
51
2,566
We have an exciting new opportunity to join the data science team at Inscripta! We are working on many exciting projects using deep learning with high-content imaging and gene editing to help shape the future of synthetic biology. Pls RT recruiting.paylocity.com/Rec…

1
186
Mathew Barber retweeted
Data preparation! It's crucial for machine learning, and we all hate it. Tools and techniques to reduce this burden? A quick summary of 10 years of R&D on this, from cheap tricks to LLMs and graph neural networks 1/9
2
19
105
13,752
Mathew Barber retweeted
23 Sep 2024
After testing in our lab, we validated that 5 of those proteins actually show binding affinity to EGFR. That's 5 completely novel proteins that don't exist in nature or scientific literature and have passed the first validation to potentially become new cancer drugs. To put this in context, just a couple of years ago, designing and validating a protein binder for a human receptor was reserved for a few pharma companies with lots of experts and specialized facilities. Today, we show that individuals from around the world were able to contribute meaningful designs within just a one-month timeframe. Let’s take a closer look at these proteins and the designers behind them:
1
3
10
2,574
Mathew Barber retweeted
I'm looking to change my career path and am quite excited about exploring jobs in biotechs in the Bay Area (or remote). I'm very keen to start a career that offers stability so my partner and I can finally live in the same state. Does anyone know of organizations that are hiring?
17
75
269
86,216
Mathew Barber retweeted
Nobel Laureate David Baker on the phone with fellow awardees Demis Hassabis and John Jumper. The trio discussed how their teams have inspired one another and what the future may hold.
7
58
478
26,729
Mathew Barber retweeted
The first episode of @BakerLabPodcast is live. Meet Mohamad Abedi, who went from being a Palestinian refugee to a postdoctoral researcher. His journey proves that science can be a creative and transformative process. Apple: podcasts.apple.com/us/podcas… Spotify: open.spotify.com/show/32pWWp…
4
41
183
50,968
Mathew Barber retweeted
10 Sep 2024
So this plagiarism thing has happened to our lab.. again. This time it's plagiarism of our poseidon syringe pump paper @booeshaghi et al., 2019 in @SciReports: nature.com/articles/s41598-0… Text has been plagiarized, as well as figures copied directly here: ijirset.com/upload/2024/marc… 1/🧵
14
65
258
110,010