In their Specious Art paper, Chari & @lpachter claim that tSNE/UMAP are as arbitrary as a random elephant shape. But are they?
We show in our comment that this is false and throws the tSNE/UMAP baby out with the bathwater!
Details in 🧵 & paper:
biorxiv.org/content/10.1101/…
1/8
It's time to stop making t-SNE & UMAP plots. In a new preprint w/ Tara Chari we show that while they display some correlation with the underlying high-dimensional data, they don't preserve local or global structure & are misleading. They're also arbitrary. 🧵 biorxiv.org/content/10.1101/…
PS: If anyone needs additional reasons to leave this place, here is one: Musk endorsed the German right-nationalist AfD — a party that spreads hate against migrants and constantly lies about climate change.
Unacceptable!
I won't be reading, posting or replying here in the foreseeable future. Find me (and many other science & sci-comm people) on the other site, where skies are still blue! 🦋
I am very skeptical of the AI/ML/computational methods to predict protein-protein interactions. Prove me wrong.
I present a challenge for anyone who claims they can predict protein-protein interactions.
1/n
Another example of what I, @b_mittelstadt & @c_russl termed careless speech. Subtle hallucinations are dangerous & developers are not (yet) liable for them. We argue they should be. See paper: Do LLMs have a legal duty to tell the truth? tinyurl.com/435jba5p tinyurl.com/4wpske2s
We updated our ICLR dataset (see github.com/berenslab/iclr-da…) with blind 2025 submissions to @iclr_conf. Over 10k submissions this year.
I like how this embedding (SBERT tSNE) shows which ML areas are old-school and which ones are currently booming.
Over the past few months, our Interpretability team has put out a number of smaller research updates. Here’s a thread of some of the things we've been up to:
The vocabulary of systems neuroscience may appear daunting to many. Here's a short dictionary of common terms. BTW if you use them in your papers and grants you will have greater success
Why is the @CSU lying?
In short: to distract from real problems, in other words to take you for fools.
So cuddle your pets, push back against this kind of nonsense, and let's solve the problems that bother you in everyday life, not the ones the Union made up 🐶😽❤️
🤖🧠NOW OUT IN PNAS🧠🤖
Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29
In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do
pnas.org/doi/10.1073/pnas.23…
Major updates since the preprint!
1/n
Image description: At the top is the title of the paper: "Embers of autoregression show how large language models are shaped by the problem they are trained to solve". Below on the left is a screenshot of ChatGPT being asked to count how many words are in a list. The correct answer is 29, but it says 30. Next to it is a plot showing ChatGPT's accuracy at counting elements in a list; in general, it does well on multiples of 10 but poorly on other numbers. The explanation offered at the bottom of the image is: In training sets, round numbers are much more common than other numbers.
Datamapplot 0.4 is out now, and has far more powerful and effective interactive plots.
Here is an example of a Data Map of 2.4 million papers on ArXiv, ready to be explored.
BREAKING NEWS
The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
The strange state of current publishing. I think some stats on our recent Cellstates paper provide interesting food for thought. The preprint was put on BioRxiv and within 3 months there were ~8500 abstract views and >2000 PDF downloads.
We also submitted it to PCB on 6-11-2023. 1/n
PSA:
If you don't want your X posts used to train Grok, you now have to explicitly opt out.
Go to x.com/settings/grok_settings and uncheck the box.
If link doesn't work, go to Settings->Privacy and Safety->Grok
ICML 2024 (held in Vienna) registrations vs. registrations *per million inhabitants* by country.
The barplot of registrations on the log scale was shown yesterday during the opening. I took a photo, digitized with WebPlotDigitizer, and normalized per capita.
#ICML2024 @icmlconf
"The Harris campaign has seen a massive influx of donations, often from people who haven't donated before. This is democracy working as intended, folks!"
Super happy to see donations, but "democracy as intended" is more about voting with your vote than your money, honestly.
Excited to be in Vienna at ICML to present my work with @samgreydanus on scaling _down_ deep learning with MNIST-1D!
We show how one can study serious deep learning with n=5000 and d=40.
Paper: openreview.net/forum?id=n9pr…
Code: github.com/greydanus/mnist1d…
Find me at poster session 1.
Few evolutionary psychologists engage with the modern genomic literature. Brendan Zietsch is an exception and he has a new paper out, criticising explanations for heritable variation based on balancing selection. He's pulling no punches.