Tim Woelfle

Tweets

Pinned Tweet

Tim Woelfle @timwoelfle

9 Jun 2024

#LocalCitationNetwork now allows you to retrieve All References and All Citations from a given set of Input Articles with both @OpenAlex_org and @SemanticScholar! For example, these 11 input articles (via @vamrhein) have 406 references & 7143 citations: localcitationnetwork.github.… 1/

1,539

RC2NB

Tim Woelfle retweeted

RC2NB @RC2NB

16 Jan 2025

🚀 Excited to share our latest Journal of Neurology publication on dreaMS app. Six gamified, adaptive cognitive tests (<10 min) improve sensitivity to change by addressing floor/ceiling & practice effects. Big thanks to our team & partners! Read more: link.springer.com/article/10…

CoGames: Development of an adaptive smartphone-based and gamified monitoring tool for cognitive...

Journal of Neurology - As part of the development of a smartphone-based app for monitoring MS disease activity and progression (dreaMS, NCT05009160), we developed six gamified tests with multiple...

link.springer.com

188

Lars G. Hemkens

Tim Woelfle retweeted

Lars G. Hemkens @LGHemkens

12 Sep 2024

Human-AI collaboration may save time for a second human rater for reporting and bias assessments. We tested Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B. Wonderful work led by @timwoelfle published in @JClinEpi jclinepi.com/article/S0895-4…

1,657

Tim Woelfle

Tim Woelfle @timwoelfle

12 Sep 2024

Our work "Benchmarking Human-AI Collaboration for Common Evidence Appraisal Tools" is published in @JClinEpi! doi.org/10.1016/j.jclinepi.2… Evidence appraisal tools are very resource intensive but LLMs may assist human raters. Wonder how @OpenAI's o1 & @Meta's Llama 3.1 will perform?

Tim Woelfle @timwoelfle

23 Apr 2024

Check out our work on LLMs for systematic reviews of medical literature: Benchmarking Human-AI Collaboration for Common Evidence Appraisal Tools. We used @AnthropicAI's Claude-3-Opus, @OpenAI's GPT-4, @MistralAI's open-source Mixtral-8x22B: medrxiv.org/content/10.1101/… @LGHemkens 1/6

879

Max Welling

Tim Woelfle retweeted

Max Welling @wellingmax

9 Jun 2024

Shall we please stop worrying about rogue AI and instead worry about the Atlantic Overturning Circulation crossing a tipping point. It seems close and would make Europe basically unlivable. (Thanks to @jonkhler for the link) youtu.be/ZHNNW8c_FaA?si=hzPW…

Tipping risk of the Atlantic Ocean's overturning circulation, AMOC....

One of the most ominous risks for Europe is that of a major change ...

youtube.com

Sabine Hossenfelder

@skdh

9 Jun 2024

Just finished reading Aschenbrenner's manifesto (165p) about the impending intelligence explosion. I'm now rethinking my life plans. (Summary to follow on YT) situational-awareness.ai/

216

43,455

Gordon H. Guyatt

Tim Woelfle retweeted

Gordon H. Guyatt

@GuyattGH

7 Jun 2024

Why were so few RCTs done to find out optimal COVID control strategies (masks, isolation)? Why so few RCTs of educational strategies? We conduct uncontrolled experiments over & over, remain in the dark. Cultural change to accept RCTs outside conventional medicine urgently needed.

10,643

Tim Woelfle

Tim Woelfle @timwoelfle

9 Jun 2024

#LocalCitationNetwork will always remain free & open source, meaning 100% transparency! There are many other great literature mapping tools like @Inciteful_xyz (also open source), but most are closed source: @RsrchRabbit, @LitmapsApp, @ConnectedPapers 6/ x.com/RaziaAliani/status/179…

Razia Aliani

@RaziaAliani

13 May 2024

You enter keywords on Google Scholar. Then bam! Thousands of hits. Instead, use AI literature mapping tools! SAVE this guide to choose the right one for you ⤵

Sifting through the Google Scholar/PubMed noise takes hours. DAYS even. And don't get me started on the compilations. Endless datasheets.

Do yourself a favor and use AI literature mapping tools. They analyze and visualize scientific literature for you. Just input your seed paper or collection. The AI recommends similar papers. Ones ACTUALLY relevant to your search. You can see it all on an interactive map or graph.

BUT... how to decide the right tool for your use case? ⤴ That's why I created this comparison table for you.

#aiinresearch #literaturereview #ai #literaturemapping @RsrchRabbit @LitmapsApp @Inciteful_xyz

1,150

Tim Woelfle

Tim Woelfle @timwoelfle

9 Jun 2024

Finally, check out this recent guidance on citation searching in the @bmj_latest: bmj.com/lookup/doi/10.1136/b… Direct citation searching is now fully implemented in #LocalCitationNetwork & we're working on indirect citation searching: doi.org/10.17605/OSF.IO/NPM2… Stay tuned! 7/7

The BMJ

Tim Woelfle retweeted

The BMJ

@bmj_latest

9 May 2024

Recommendations for researchers on when and how to conduct citation searching and how to report it bmj.com/content/385/bmj-2023…

Guidance on terminology, application, and reporting of citation searching: the TARCiS statement

Evidence syntheses adhering to systematic literature searching techniques are a cornerstone of evidence based healthcare. Beyond term based searching in electronic databases, citation searching is a...

bmj.com

9,638

Yann LeCun

Tim Woelfle retweeted

Yann LeCun

@ylecun

27 Apr 2024

As long as AI systems are trained to reproduce human-generated data (e.g. text) and have no search/planning/reasoning capability, performance will saturate below or around human level. Furthermore, the amount of trials needed to reach that level will be far larger than the amount of trials needed to train humans. LLMs are trained with 200,000 years worth of reading material and are still pretty dumb. Their usefulness resides in their vast accumulated knowledge and language fluency. But they are still pretty dumb.

Pedro Domingos

@pmddomingos

26 Apr 2024

Interesting how in all these domains AI is asymptoting at roughly human performance - where's the AI zooming past us to superintelligence that Kurzweil etc. predicted/feared?

234

732

4,160

824,633

Tim Woelfle

Tim Woelfle @timwoelfle

26 Apr 2024

Great study benchmarking LLMs on clinical oncology questions! They employ some techniques similar to ours, in particular the consistency approach on repeated prompts. The self-assessed confidence is a very interesting approach I'd like to see more of in the future.

NEJM AI @NEJM_AI

25 Apr 2024

Original Article: Comparative Evaluation of LLMs in Clinical Oncology nejm.ai/4aJWOAY

Figure 3 from the NEJM AI Original Article "Comparative Evaluation of LLMs in Clinical Oncology": Self-Assessed Confidence Has Discriminatory Power in High-Performing Models.
In each question prompt, the models were asked to evaluate their confidence (from 1 to 4) in the response, in which 1 represented minimal confidence (i.e., a random guess) and 4 represented maximal confidence. The self-assessed confidence score had discriminatory power for PaLM 2, generative pretrained transformer 3.5 (GPT-3.5), Claude-v1, and GPT-4. B denotes billion.

129

David Nunan

Tim Woelfle retweeted

David Nunan @dnunan79

23 Apr 2024

The study I was waiting for (and knew would be done). Echoes my (disappointing) experience of Cochrane RoB using GPT4. And don’t ask it to do anything around data integrity checks!

Lars G. Hemkens @LGHemkens

23 Apr 2024

We tested how we can best collaborate with AI to do systematic reviews, meta-research or assess study designs - fantastic team and teamwork, thank you @timwoelfle et al!!

1,637

Kari Tikkinen

Tim Woelfle retweeted

Kari Tikkinen @KariTikkinen

24 Apr 2024

“Current LLMs alone appraised evidence worse than humans. Human-AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.” #EBM #AI

Lars G. Hemkens @LGHemkens

23 Apr 2024

We tested how we can best collaborate with AI to do systematic reviews, meta-research or assess study designs - fantastic team and teamwork, thank you @timwoelfle et al!!

1,001

Lars G. Hemkens

Tim Woelfle retweeted

Lars G. Hemkens @LGHemkens

23 Apr 2024

We tested how we can best collaborate with AI to do systematic reviews, meta-research or assess study designs - fantastic team and teamwork, thank you @timwoelfle et al!!

Tim Woelfle @timwoelfle

23 Apr 2024

4,256

Adam Rodman

Tim Woelfle retweeted

Adam Rodman @AdamRodmanMD

23 Apr 2024

Fantastic study (and great research methodology) on the abilities of LLMs to perform evidence appraisal. Certainly something a lot of us have been hoping for. TL;DR: humans outperform LLMs alone, but human AI performs quite well in some settings.

Tim Woelfle @timwoelfle

23 Apr 2024

3,350

Tim Woelfle

Tim Woelfle @timwoelfle

23 Apr 2024

Our >2000 API calls made full use of context lengths >16k tokens. Unfortunately, the current 8k context length of @metaAI's promising Llama3 is too short. For @AnthropicAI's multimodal Claude-3-Opus, we converted PDFs to >1500 PNGs (one per page), uploading ~2 GB of images. 5/

160

Tim Woelfle

Tim Woelfle @timwoelfle

23 Apr 2024

Our code & data are fully open source and the framework is easily extendable! Check out our streamlined pipeline to add new LLMs and our interactive dashboards using @rmarkdown: github.com/timwoelfle/Eviden… 6/6

GitHub - timwoelfle/Evidence-Appraisal-AI: Interactive dashboards:

github.com

119