Tatsunori Hashimoto

Tatsunori Hashimoto

13 Photos and videos

Tweets

Yingtong Dou retweeted

Tatsunori Hashimoto @tatsu_hashimoto

May 21

Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit.

154

1,230

222,083

Drew Breunig

Yingtong Dou retweeted

Drew Breunig

@dbreunig

Feb 3

If you're doing applied AI research (especially system design, benchmarks, evals, efficiency, or ops) you should be submitting to the Conference on AI and Agentic Systems... caisconf.org/pages/cfp/

264

26,340

Yingtong Dou

Yingtong Dou @dozee_sim

14 Nov 2025

🧵(3/6) 3𝐃-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫 𝐀𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞. We design a new architecture, specifically to model and fuse the complex, multi-modal nature of transaction data. This approach hierarchically encodes transaction features, individual transactions, and their sequences.

Yingtong Dou

Yingtong Dou @dozee_sim

14 Nov 2025

🧵(4/6) 𝐑𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐈𝐦𝐩𝐚𝐜𝐭. TGPT achieves a significant improvement over a production model on transaction classification. TGPT excels at generating realistic future transaction trajectories, opening up new avenues for forecasting and personalization.

Yingtong Dou

Yingtong Dou @dozee_sim

14 Nov 2025

Yingtong Dou

Yingtong Dou @dozee_sim

14 Nov 2025

Guohao Li 🐫

Yingtong Dou retweeted

Guohao Li 🐫

@guohao_li

29 Jul 2025

Introducing Eigent — the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fullly customizable workers and MCPs. Public beta available to download for MacOS, Windows. 100% open-source on Github. Comment for 500 extra credits.

2:03

142

136

682

220,916

Michael Galkin

Yingtong Dou retweeted

Michael Galkin @michael_galkin

2 May 2025

📣 Our spicy ICML 2025 position paper: “Graph Learning Will Lose Relevance Due To Poor Benchmarks”. Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it! 🧵1/10

294

84,298

FORTUNE

Yingtong Dou retweeted

FORTUNE

@FortuneMagazine

6 Feb 2025

Visa President of Technology Rajat Taneja rebuilt the company’s data platform from scratch, helping position it for the generative AI boom. trib.al/6XClvqz

How Visa places big bets on AI and gen AI to secure 639 million daily transactions | Fortune

Visa President of Technology Rajat Taneja rebuilt the company’s data platform from scratch, helping position it for the generative AI boom.

fortune.com

6,161

Yangqing Jia

Yingtong Dou retweeted

Yangqing Jia

@jiayq

26 Dec 2024

There is a lot of unconscious emphasis of the DeepSeek model being “Chinese” and implicit connection with the Sino-US relationship or the GPU power. In my eyes, the success of DeepSeek has little to do with that. It is simple intelligence and pragmatism at work: given a limit of computation and manpower present, produce the best outcome with smart research. Same with the AlexNet model when Alex Krizhevsky needed to make magic with 2 GPUs, and not a supercluster. There are a lot of super smart AI people and companies in the world. In terms of the Chinese ethnic group, people I had the privilege to have worked with include (but are not limited to) - Kaiming He who is the OG of modern computer vision. - Song Han who founded DeePhi, OmniML and now professor at MIT. - the DMLC folks who created early frameworks like MxNet and TVM. - Bing Xu who did MxNet, was coauthor of GAN, founded HippoML and is now at NVidia. - Orbeus, a startup on early CV applications and now the foundation of AWS ReKognition. And many more. They ace in the frontier of AI, whether it’s research, product, small startups, or big companies. AI should bring us closer rather than more separate. I was saddened by the discriminative comments given by Professor Rosalind Picard at NeurIPS, but was too busy to put my thoughts together and say something. Looking back at 2024, I think what really stood out is the fundamental seek for AI breakthrough - collect what we have, use our brain, and achieve our best. It’s like the Olympics: faster, higher, stronger, together.

528

67,834

Zhuang Liu

Yingtong Dou retweeted

Zhuang Liu

@liuzhuang1234

19 Dec 2024

How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit visual generation from an LLM, when trained jointly with visual understanding.

133

718

253,438

Aurimas Griciūnas

Yingtong Dou retweeted

Aurimas Griciūnas

@Aurimas_Gr

22 Nov 2024

What is 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗔𝗚? In real world applications, simple naive RAG systems are rarely used nowadays. To provide correct answers to a user query, we are always adding some agency to the RAG system. However, it is important to 𝗻𝗼𝘁 𝗴𝗲𝘁 𝗹𝗼𝘀𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗯𝘂𝘇𝘇 𝗮𝗻𝗱 𝘁𝗲𝗿𝗺𝗶𝗻𝗼𝗹𝗼𝗴𝘆 and understand that there is 𝗻𝗼 𝘀𝗶𝗻𝗴𝗹𝗲 𝗯𝗹𝘂𝗲𝗽𝗿𝗶𝗻𝘁 to add the mentioned agency to your RAG system and you should adapt to your use case. My advice is to not get stuck on terminology and think about engineering flows. Let’s explore some of the moving pieces in Agentic RAG: 𝟭. Analysis of the user query: we pass the original user query to a LLM based Agent for analysis. This is where: ➡️ The original query can be rewritten, sometimes multiple times to create either a single or multiple queries to be passed down the pipeline. ➡️ The agent decides if additional data sources are required to answer the query. 𝟮. If additional data is required, the Retrieval step is triggered. In Agentic RAG case, we could have a single or multiple agents responsible for figuring out what data sources should be tapped into, few examples: ➡️ Real time user data. This is a pretty cool concept as we might have some real time information like current location available for the user. ➡️ Internal documents that a user might be interested in. ➡️ Data available on the web. ➡️ … 𝟯. If there is no need for additional data, we try to compose the answer (or multiple answers) straight via an LLM. 𝟰. The answer (or answers) get analyzed, summarized and evaluated for correctness and relevance: ➡️ If the Agent decides that the answer is good enough, it gets returned to the user. ➡️ If the Agent decides that the answer needs improvement, we try to rewrite the usr query and repeat the generation loop. The real power of Agentic RAG lies in its ability to perform additional routing pre and post generation, handle multiple distinct data sources for retrieval if it is needed and recover from failures in generating correct answers. What are your thoughts on Agentic RAG? Let me know in the comments! 👇 #RAG #LLM #AI

223

1,064

196,756

Jianfeng Chi

Yingtong Dou retweeted

Jianfeng Chi @jianfengchi

8 Oct 2024

I am hiring a research intern, working LLM (Llama 3 ) safety. The internship is expected to start in Summer/Spring 2025, based in New York City. Please drop me an email at jianfengchi@meta.com (Subject starts with "[2025 Intern]") Learn more here: metacareers.com/jobs/1719537…

262

32,181

William Wang

Yingtong Dou retweeted

William Wang

@WilliamWangNLP

14 Sep 2024

Respectfully disagree. It's the structure of language and words that make LLMs effective. Pure speech, time series, or video without linguistic co-supervision don't yield the same results. Language provides the minimal conceptual units that enable these models to work.

Andrej Karpathy

@karpathy

14 Sep 2024

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something. They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it". Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of "next token prediction" with an LLM, it's just the usage/meaning of the tokens that changes per domain. If that is the case, it's also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What's up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM? I don't think this is true but I think it's half true.

153

31,394

Simran Arora

Yingtong Dou retweeted

Simran Arora

@simran_s_arora

9 Jul 2024

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!! There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and asymptotically faster than attention 💨 But there’s no free lunch 🥪 these models can’t fit all the information from long contexts into the limited memory, degrading in-context learning quality. Is all lost?

299

93,074

Yingtong Dou

Yingtong Dou @dozee_sim

7 Jul 2024

The review form of #NeurIPS 2024 is cumbersome!

541

Valeriy M., PhD, MBA, CQF

Yingtong Dou retweeted

Valeriy M., PhD, MBA, CQF

@predict_addict

30 Jun 2024

The paper we have been waiting for essentially shows that #timeseries #llms do not work in forecasting. Back in 2022, paper “Are Transformers Effective for Time Series Forecasting?“ challenged the appearing narrative that transformers are useful for forecasting. By removing transformer elements the authors showed the performance went up ⬆️ And now people did the same with time series LLMs. The papers demonstrated: - removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results—in most cases the results even improved! - in fact removing even removing the language model entirely, yields comparable or better performance! - these simpler methods after removal of LLM component reduce training and inference time by up to three orders of magnitude while maintaining comparable performance! - the sequence modeling capabilities of LLMs do not transfer to time series. By shuffling input time series the authors find no appreciable change in performance. What this says is that LLMs can’t deal with critical features of time series, the time order is key and if LLMs performance doesn’t change when shuffling data it basically means it doesn’t model time series. These finding are as damming to time series LLMs as the “Are Transformers Effective for Time Series Forecasting?” was for transformers. #timeseries #forecasting

181

856

120,077

Joe Barrow

Yingtong Dou retweeted

Joe Barrow

@barrowjoseph

6 Jun 2024

This is such a good paper. I love NLP error analysis papers -- @chrmanning and co do them so well (another great example will always be "Part-of-Speech Tagging from 97% to 100%").

ALT The first page of a paper from Stanford titled "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools"

158

29,829