UC Berkeley Scholarly Comm & Information Policy

Tweets

David Bamman retweeted

UC Berkeley Scholarly Comm & Information Policy @UCB_ScholComm

4 Nov 2024

From @dbamman (co-authored w/@rach_scholcomm) new paper today relying on #copyright exemption for decrypting DVDs to conduct #textdatamining. Using the exemption, authors built a collection of film to measure representation for gender and race/ethnicity pnas.org/doi/10.1073/pnas.24…

Measuring diversity in Hollywood through the large-scale computational analysis of film | PNAS

Movies are a massively popular and influential form of media, but their computational study at scale has largely been off-limits to researchers in ...

pnas.org

1,836

David Bamman

David Bamman @dbamman

15 Oct 2024

Lucy is a rock star and you should all hire her!

Lucy Li @lucy3_li

14 Oct 2024

Hi friends, colleagues, followers. I am on the faculty job market! I am a PhD student @BerkeleyISchool @berkeley_ai. I work on NLP, and I believe all language, whether AI- or human-generated, is ✨social and cultural data✨. My work includes: 🧵

4,135

Lucy Li

David Bamman retweeted

Lucy Li @lucy3_li

30 Sep 2024

How might one do classification in the era of LLMs for humanities research? 🤔 @dbamman, @KentKChang, @NaitianZhou & I apply LLMs on ten tasks from prior cultural analytics lit. Larger LMs are competitive w/ older methods on established tasks, but perform less well on new ones.

David Bamman @dbamman

30 Sep 2024

My group just finished up a new paper that I'm excited to get out into the world: "On Classification with Large Language Models in Cultural Analytics" (to be published at CHR): github.com/bamman-group/ca-c…. More info here! bsky.app/profile/dbamman.bsk…

1,873

Naitian Zhou

David Bamman retweeted

Naitian Zhou @NaitianZhou

1 Oct 2024

In cultural analytics, accuracy is often not the only (or even primary) objective. Here, we explore the myriad ways CA uses classification, how LLMs compare to other commonly used methods, and how they might enable new approaches to sensemaking from text data.

David Bamman @dbamman

30 Sep 2024

663

David Bamman

David Bamman @dbamman

30 Sep 2024

GitHub - bamman-group/ca-classification-data: Data and code to support "On Classification with...

Data and code to support "On Classification with Large Language Models in Cultural Analytics" - bamman-group/ca-classification-data

github.com

2,954

David Bamman

David Bamman @dbamman

27 Aug 2024

Big congrats to @KentKChang for passing his qualifying exam today! Lots of super exciting work on measuring social interactions in culture in the pipeline --

1,861

David Bamman

David Bamman @dbamman

4 Aug 2024

Well deserved, congrats Kent!

Kent K. Chang @KentKChang

3 Aug 2024

It’s an extraordinary pleasure and honor to teach alongside @dbamman and his wonderful students of NLP, now doubly so to have my small part recognized by @BerkeleyISchool & UC Berkeley.

1,060

Kent K. Chang

David Bamman retweeted

Kent K. Chang @KentKChang

3 Aug 2024

It’s an extraordinary pleasure and honor to teach alongside @dbamman and his wonderful students of NLP, now doubly so to have my small part recognized by @BerkeleyISchool & UC Berkeley.

4,378

David Bamman

David Bamman @dbamman

17 Jun 2024

See Naitian at poster #5!

Naitian Zhou @NaitianZhou

16 Jun 2024

Hey NLPals, I'll be at #NAACL2024 this upcoming week! Let's chat about sociocultural NLP, what it means to study culture, and finding variation in unusual places (like memes!) I'll be presenting this memes paper at the first poster session.

671

David Bamman

David Bamman @dbamman

17 Jun 2024

For anyone at #NAACL2024 considering lucha libre, I can attest it was spectacular (though that may be influenced by attending with my 9yo)

1,108

Lucy Li

David Bamman retweeted

Lucy Li @lucy3_li

15 Jun 2024

I’m headed to NAACL to present this paper! I’m around mostly Sunday evening thru Tuesday. This fall I’ll be doing some thinking about what to do after my PhD; if you have advice/thoughts about this definitely chat with me!

Lucy Li @lucy3_li

25 Oct 2023

New preprint! 🎉 We examine two contrasting yet common assumptions around what it means for an NLG model or system to be “fair” or “good”: 1⃣ treating all social groups the same, where “bias” = any diff in outputs (invariance), or 2⃣customizing outputs to them (adaptation).

ALT a screenshot of the upper part of the first page of the paper, including the title, author list, and abstract.

16,660

Naitian Zhou

David Bamman retweeted

Naitian Zhou @NaitianZhou

16 Jun 2024

Naitian Zhou @NaitianZhou

16 Nov 2023

Memes are pervasive in online speech. Do they have the socially meaningful variation we see in other aspects of language? YES! New preprint from me, @david__jurgens and @dbamman on the semantic structure and visual diversity of 3.8M Reddit memes. 🌐 naitian.org/social-memeing

5,034

David Bamman

David Bamman @dbamman

17 Jun 2024

Looking forward to seeing people at #NAACL2024 this week! Today, be sure to check out @NaitianZhou's poster on the sociolinguistics of memes (11am) and @lucy3_li's talk on concepts of fairness in NLG systems at 2:36pm (ethics/bias/fairness 1)

1,662

The Center for Digital Humanities at Princeton

David Bamman retweeted

The Center for Digital Humanities at Princeton @PrincetonCDH

24 Feb 2024

Join us on Monday, 2/26 at 4:30 pm for a lecture by @dbamman: The Promise and Peril of Large Language Models for Cultural Analytics. RSVP: forms.gle/by1m6xHzTLhjJQbd8 More info: cdh.princeton.edu/events/202… Co-sponsored by @PrincetonPLI.

1,679

Andrew Piper

David Bamman retweeted

Andrew Piper @_akpiper

23 Jan 2024

Very excited to announce the launch of our citizen science initiative "The Lives of Literary Characters" hosted @the_zooniverse. This is the first ever literary citizen science project that aims to promote story understanding. A Thread 🧵 zooniverse.org/projects/citi…

6,863

Lucy Li

David Bamman retweeted

Lucy Li @lucy3_li

16 Jan 2024

New preprint! 📜 We investigate how ten “quality” and English langID filters, drawn from prior lit on LLM pretraining data curation pipelines, affect webpages linked to self-descriptions of their creators. Paper: arxiv.org/abs/2401.06408 Data: huggingface.co/datasets/alle… 🧵(1/6)

ALT a screenshot of the first page of a research paper titled "AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters" with a diagram showing a paraphrased excerpt for a website's about page, with common stated social dimensions emphasized with highlighting. The dimensions are: individuals or organizations, social roles, geographic locations, and topical interests.

153

32,166

Katie Keith

David Bamman retweeted

Katie Keith @katakeith

22 Dec 2023

🚨NLP CSS workshop is back and will be at NAACL 2024! Paper submission deadline: March 24 sites.google.com/site/nlpand… Organizing team: @anjalie_f @dallascard @dirk_hovy and myself

NLP CSS Workshops

sites.google.com

11,799

David Bamman

David Bamman @dbamman

10 Dec 2023

Congrats to Masha et al on this best industry paper award! (Masha’s a Berkeley School of Information MIMS alum!)

EMNLP 2026 @emnlpmeeting

10 Dec 2023

EMNLP 2023 Best Industry Paper Personalized Dense Retrieval on Global Index for Voice-enabled Conversational Systems (Masha Belyi, Charlotte Dzialo, Chaitanya Dwivedi, Prajit Muppidi, Kanna Shimizu) aclanthology.org/2023.emnlp-… #EMNLP2023 #NLProc

1,015

David Bamman

David Bamman @dbamman

10 Dec 2023

Awesome work! Congrats @nikita_mehandru @swetaagrawal20 et al!!!!

Marine Carpuat @MarineCarpuat

10 Dec 2023

I’m thrilled that this Human-Centered MT paper was recognized with an outstanding paper award at #EMNLP2023. Congratulations to lead authors Nikita Mehandru (@ucberkeley iSchool) and @swetaagrawal20 (@umdclip @istecnico) for making this interdisciplinary collaboration a success!

1,295

Marine Carpuat

David Bamman retweeted

Marine Carpuat @MarineCarpuat

10 Dec 2023

Marine Carpuat @MarineCarpuat

7 Dec 2023

Replying to @MarineCarpuat

2/8 "Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors" with @nikita_mehandru @swetaagrawal20 @elainekhoong Niloufar Salehi among others arxiv.org/abs/2310.16924 virtual2023.emnlp.org/paper_…

113

19,228