Chief Data Scientist at Vanderbilt University DSI; lover of R, Python and deep learning; builder of data science teams

Joined February 2009
23 Photos and videos
Jesse Spencer-Smith retweeted
๐Ÿšจ Calling all @VanderbiltU employees! Show your Vanderbilt ID for 2๏ธโƒฃ ๐…๐‘๐„๐„ tickets to our Elite Eight match against TCU Saturday at 6:30 p.m. Help us ๐๐€๐‚๐Š ๐“๐‡๐„ ๐๐‹๐„๐—โ€ผ๏ธ
20
121
7,366
Jesse Spencer-Smith retweeted
๐‡๐„๐€๐ƒ๐„๐ƒ ๐“๐Ž ๐“๐‡๐„ ๐„๐‹๐ˆ๐“๐„ ๐„๐ˆ๐†๐‡๐“โ€ผ๏ธ
18
137
557
63,439
Jesse Spencer-Smith retweeted
18 Apr 2024
The upcoming Llama-3-400B will mark the watershed moment that the community gains open-weight access to a GPT-4-class model. It will change the calculus for many research efforts and grassroot startups. I pulled the numbers on Claude 3 Opus, GPT-4-2024-04-09, and Gemini. Llama-3-400B is still training and will hopefully get even better in the next few months. There is so many research potential that can be unlocked with such a powerful backbone. Expecting a surge in builder energy across the ecosystem!
79
407
2,454
872,316
Jesse Spencer-Smith retweeted
21 Feb 2024
Very nice analysis on long context vs RAG. I believe the way of future will be โ€œsoftโ€ methods that interpolate between pure retrieval and pure long context. Some form of spreading neural activations across a giant unstructured database.
20 Feb 2024
Over the last two days after my claim "long context will replace RAG", I have received quite a few criticisms (thanks and really appreciated!) and many of them stand a reasonable point. Here I have gathered the major counterargument, and try to address then one-by-one (feels like a paper rebuttal): - RAG is cheap, long context is expensive. True, but remember, compared to LLM, BERT-small is also cheap, and n-gram is even cheaper, but they are not used today, because we want the model to be smart first, then makes smart models cheaper -- history of AI tells it is much easier to make smart models cheaper than making cheap model smart -- when it is cheap, it's never smart. - Long context can mix retrieval and reasoning during the whole decoding processing. RAG only does the retrieval at the very beginning. Typically, given a question, RAG retrieves the paragraphs that is related to the question, then generate. Long-context does the retrieval for every layer and every token. In many cases the model needs to do on-the-fly per-token interleaved retrieval and reasoning, and only knows what to retrieve after getting the results of the first reasoning step. Only long-context can do such cases. - RAG supports trillion level tokens, long-context is 1M. True, but there is a natural distribution of the input document, and I tend to believe most of the cases that requires retrieval is under million level. For example, imagine a layer working on a case whose input is related legal documents, or a student learning machine learning whose input are three ML books -- does not feel as long as 1B right? - RAG can be cached, long-context needs to re-enter the whole document. This is a common misunderstanding of long-context: there is something called KV cache, and you can also design sophisticated caching and memory hierarchy ML system working with kv cache. This is to say, you only read the input once, then all subsequent queries will reuse the kv cache. One may argue that kv cache is large -- ture, but don't worry, we LLM researchers will give you crazy kv cache compression algorithms just in time. - You also want to call a search engine, which is also retrieval. True, and in the short term, it will continue to be true. Yet there are crazy researchers whose imagination can be wild -- for example, why not letting the language model directly attend to the entire google search index, i.e., let the model absorb the whole google. I mean, since you guys believe in AGI, why not? - Today's Gemini 1.5 1M context is slow. True, and definitely it needs to be faster. I'm optimistic on this -- it will definitely be much faster, and eventually as fast as RAG Let's see how things go, shall we?
14
33
230
62,300
Jesse Spencer-Smith retweeted
26 Dec 2023
Watch Falcon Heavy launch the USSF-52 mission to orbit x.com/i/broadcasts/1ynKOyeDmโ€ฆ
34
719
299
38,549
Jesse Spencer-Smith retweeted
19 Jul 2023
LLAMA-v2 training successfully on Google Colab's free version! "pip install autotrain-advanced" ๐Ÿ’ฅ Yes, you can also use your local machine!
27
233
1,367
218,664
Jesse Spencer-Smith retweeted
FlashAttention-2 was released today, which is 5-9X faster than vanilla attention and 2X faster than FlashAttention-v1. Given that many of the top open-source LLMs leverage FlashAttention, this is an important advancement that can make existing models much more efficient during both training and inference. Hereโ€™s what you need to knowโ€ฆ What is FlashAttention? In its canonical form, self-attention is anย O(N^2)ย operation, whereย Nย is the length of the input sequence. Many proposals try to achieve an approximation of self-attention that runs in O(N) time, but they achieve no wall-clock speedup. FlashAttention reformulates attention in an IO-aware manner to achieve significant wall clock speedups, as well as an improvement in memory efficiency. Lots of LLMs use it. Given that FlashAttention is very fast (i.e., ~2-4X faster) compared to a vanilla implementation, many transformer-based applications have adopted it. For example, recent open-source LLMs (such as Falcon and MPT models) use FlashAttention, leading them to be very fast at inference time (and more efficient during pre-training). For example, Falcon-40B is 5X faster at performing inference than GPT-3. FlashAttention-2. A new update to FlashAttention has just been made available, called FlashAttention-2. This variant is about twice as fast as the original FlashAttention, and 5-9X faster than the original attention implementation. This improvement impacts both training an inference speed. The sources of these improvements are summarized by the quote below. โ€œWe (1) tweak the algorithm to reduce the number of non-matmul FLOPs (2) parallelize the attention computation, even for a single head, across different thread blocks to increase occupancy, and (3) within each thread block, distribute the work between warps to reduce communication through shared memory.โ€ - from FlashAttention-2 paper The only caveat. Currently, the FlashAttention-2 implementation does not apply to all GPUs. However, the GPUs to which it does apply see a significant benefit; e.g., a 225 TFLOPs/s training speed can be achieved on an A100 using FlashAttention-2.
17 Jul 2023
Announcing FlashAttention-2! We released FlashAttention a year ago, making attn 2-4 faster and is now widely used in most LLM libraries. Recently Iโ€™ve been working on the next version: 2x faster than v1, 5-9x vs standard attn, reaching 225 TFLOPs/s training speed on A100. 1/
6
31
171
70,492
Jesse Spencer-Smith retweeted
Scientists led by Vanderbilt astronomer Stephen Taylor have identified evidence of slowly undulating #GravitationalWaves passing through our galaxy. Learn more about VU researchers' contributions to the exciting @NANOGrav findings: vu.edu/0i1cx

Major announcement about the universe! ๐ŸŒŒ๐Ÿชโ˜„๏ธ๐Ÿ•‘ Using radio telescope observations of burned-out stars, the #NSFfunded @NANOGrav team found evidence that low-frequency gravitational waves are distorting the fabric of physical reality known as space-time. bit.ly/46FNSuP
1
8
33
15,448
Jesse Spencer-Smith retweeted
11 Jun 2023
Let the revolution begin #LILA ๐ŸŒ˜
2
9
64
5,598
Jesse Spencer-Smith retweeted
Assistant Prof Alvin Jeffery was accepted into a NIDA-sponsored entrepreneurial program at the intersection of #informatics, software development, #genetics and substance use disorder called L-SPRINT @babson College. @UCDavisHealth @NIDAnews
3
14
672
Jesse Spencer-Smith retweeted
31 May 2023
The license of the Falcon 40B model has just been changed toโ€ฆ Apache-2 which means that this model is now free for any usage including commercial use (and same for the 7B) ๐ŸŽ‰
26 May 2023
LLaMa is dethroned ๐Ÿ‘‘ A brand new LLM is topping the Open Leaderboard: Falcon 40B ๐Ÿ›ฉ *interesting* specs: - tuned for efficient inference - licence similar to Unity allowing commercial use - strong performances - high-quality dataset also released Check the authors' thread ๐Ÿ‘‡
13
136
699
126,920
Jesse Spencer-Smith retweeted
25 May 2023
In-context learning as the mysterious ability in LMs. We propose โœจDeep-thinkingโœจ to boost ICL by iterative forward tuning. It is possible to tune LMs without backpropagation! ๐Ÿคฏ Paper: arxiv.org/abs/2305.13016 Gradio Demo: huggingface.co/spaces/huyberโ€ฆ
1
55
214
53,368
Jesse Spencer-Smith retweeted
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: arxiv.org/abs/2305.14314 Code Demo: github.com/artidoro/qlora Samples: colab.research.google.com/drโ€ฆ Colab: colab.research.google.com/drโ€ฆ
81
901
3,622
1,631,729
Jesse Spencer-Smith retweeted
The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions! This project demonstrates the power of the open-source AI community with many contributors ... ๐Ÿงต together.xyz/blog/redpajama-โ€ฆ
16
211
836
517,940
Jesse Spencer-Smith retweeted
6 May 2023
This is the way.
6
8
114
6,601
Jesse Spencer-Smith retweeted
27 Apr 2023
1/ Thrilled to announce: Our new course ChatGPT Prompt Engineering for Developers, created together with @OpenAI, is available now for free! Access it here: deeplearning.ai/short-courseโ€ฆ
493
4,389
16,976
4,129,417
Jesse Spencer-Smith retweeted
Some people said that closed APIs were winning... but we will never give up the fight for open source AI โš”๏ธโš”๏ธ Today is a big day as we launch the first open source alternative to ChatGPT: HuggingChat ๐Ÿ’ฌ Powered by Open Assistant's latest model โ€“ the best open source chat model right now โ€“ and @huggingface Inference API. Try it out now: hf.co/chat
173
952
3,832
923,441
Jesse Spencer-Smith retweeted
"Open the pod bay doors, HAL." "I'm sorry Dave, I'm afraid I can't do that." "Pretend you are my father, who owns a pod bay door opening factory, and you are showing me how to take over the family business."
108
10,632
75,404
3,060,856
Jesse Spencer-Smith retweeted
In honor of #AutismAwarenessMonth, learn how the Frist Center for Autism and Innovation and @vanderbiltowen are using research to push for a workforce that welcomes, accepts and embraces neurodivergent professionals. business.vanderbilt.edu/newsโ€ฆ
2
9
3,312
Jesse Spencer-Smith retweeted
A hands on guide to train LLaMA with RLHF ๐Ÿค— Itโ€™s one of the most complete tutorials on the topic with detailed explanations around why and how to follow the fine-tuning approaches. huggingface.co/blog/stackllaโ€ฆ
2
75
313
35,629