Research Scientist, MIT CSAIL, formerly @Michigan. Cofounder of Nutch, Hadoop, Lattice Data, but the bad ideas are mine alone.

Joined July 2009
Michael Cafarella retweeted
The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking something bigger… stay tuned or contact us.
A very cool, interesting new paper on how to orchestrate and optimize data processing tasks with LLMs
18 Jul 2024
Super excited to share our work on LOTUS, a query engine for reasoning over large corpuses of data with LLMs! Joint work w/ the amazing @sid_jha1, @matei_zaharia & @guestrin Read the paper: arxiv.org/abs/2407.11418 Try out the code: github.com/stanford-futureda… 🧵👇
This project has been a ton of fun to work on. If you think it can solve a problem that's important to you, please get in touch!
Replying to @RussoMatthew
Finally, this work is the product of a great research team: Chunwei Liu (@Tranway), Michael Cafarella (@MikeCafarella), Lei Cao, Peter Chen, Zui Chen (@ZuiChen), Michael Franklin (@franklinmj), Tim Kraska (@tim_kraska), Sam Madden (@samrmadden), Gerardo Vitagliano (@gerarvita)
Check out the paper, code on GitHub, and colab demo
Replying to @RussoMatthew
If this (very high-level) summary of our work has piqued your interest -- go read our full paper! 📄Paper: arxiv.org/pdf/2405.14696 💻Code: github.com/mitdbg/palimpzest… We would love to hear any feedback, ideas for more use cases, and/or opportunities for collaboration.
Michael Cafarella retweeted
We’re excited to announce a pre-print and prototype system for Palimpzest: A Declarative System for Optimizing AI Workloads. Check out our full paper, blog post, and demo: - 📄Paper: arxiv.org/pdf/2405.14696 - 📬Blog Post: dsg.csail.mit.edu/projects/p… - 💻Demo: bit.ly/pz-demo
Michael Cafarella retweeted
7 Feb 2021
I’ve been intrigued by the world of tools that automate what typically takes data practitioners days to weeks of error-prone/not-so-repeatable work. I call this space autodata, and wrote a blog post about it: blog.marcua.net/2021/02/07/a…

I love Election Day.
Michael Cafarella retweeted
A photo is circulating online of Trump holding up the New York Post with Hunter Biden on the cover. A White House spokesman told CNN the photo is real. The alleged emails underpinning the NY Post stories may be part of a Russian disinformation campaign. cnn.com/2020/10/16/politics/…
This is a shame
“Sadly, our company has not survived the coronavirus pandemic." Espresso Royale Coffee permanently closes its doors after 33 years. michigandaily.com/section/bu…
Michael Cafarella retweeted
Jerry Lawson (1940-2011), the developer of the first interchangeable cartridge video game console with a microprocessor, the Fairchild Channel F (1976). This console predated the Atari 2600 by one year!
Congrats! Very well deserved
Michael Cafarella retweeted
Very proud that my student @jmfaleiro won this year's ACM SIGMOD Jim Gray award for the top PhD dissertation in the area of data systems. sigmod.org/sigmod-awards/cit… Jose's thesis focused on scalable, multi-core transaction processing.

Some of the cruelest things I’ve ever seen in print were in a female colleague’s course evaluation.
Are you even a woman in academia if your course evaluations don’t give you feedback about your physical appearance?
Goodness. I ran the same trends query again. First image is from 11pm Eastern on Thursday night. Second image is from 2:40pm Eastern on Friday.
This seems concerning
Just saw I'm quoted in Wired article about misuse of open source: wired.com/story/open-source-… Article suggests I think US is somehow non-democratic. My view is extremely the opposite. Biggest problem for open source are non-democratic, non-rules-based govts.
That said, overall the article is very good.
Michael Cafarella retweeted
This season, Computer Science and Engineering at Michigan has multiple tenure-track positions in all areas, including theory/crypto/security. Please apply and spread the word! cse.umich.edu/cse/jobs/

Bravo
If you’re having database problems I feel bad for you son, they have 99 problems but (almost) everything in this blog post ain’t one. codeburst.io/what-im-telling…
Michael Cafarella retweeted
2 Aug 2018
Cisco to buy cyber-security company Duo for $2.35 billion reut.rs/2vbrwRH