Tweets

Devendra Desale retweeted

Santiago

@svpino

11 Oct 2024

Large Language Models don't reason. Thank you, Apple.

284

1,159

8,746

955,212

Devendra Desale retweeted

Rohan Paul

@rohanpaul_ai

12 Oct 2024

Nice Paper for a long weekend read - "A Primer on the Inner Workings of Transformer-based Language Models"
📌 Provides a concise intro focusing on the generative decoder-only architecture.
📌 Introduces the Transformer layer components, including the attention block (QK and OV circuits) and feedforward network block, and explains the residual stream perspective. It then categorizes LM interpretability approaches into two dimensions: localizing inputs or model components responsible for a prediction (behavior localization) and decoding information stored in learned representations to understand its usage across network components (information decoding).
📌 For behavior localization, the paper covers input attribution methods (gradient-based, perturbation-based, context mixing) and model component importance techniques (logit attribution, causal interventions, circuits analysis). Causal interventions involve patching activations during the forward pass to estimate component influence, while circuits analysis aims to reverse-engineer neural networks into human-understandable algorithms by uncovering subsets of model components interacting together to solve a task.
📌 Information decoding methods aim to understand what features are represented in the network. Probing trains supervised models to predict input properties from representations, while the linear representation hypothesis states that features are encoded as linear subspaces. Sparse autoencoders (SAEs) can disentangle superimposed features by learning overcomplete feature bases. Decoding in vocabulary space involves projecting intermediate representations and model weights using the unembedding matrix.
📌 Then summarizes discovered inner behaviors in Transformers, including interpretable attention patterns (positional, subword joiner, syntactic heads) and circuits (copying, induction, copy suppression, successor heads), neuron input/output behaviors (concept-specific, language-specific neurons), and the high-level structure mirroring sensory/motor neurons. Emergent multi-component behaviors are exemplified by the IOI task circuit in GPT2-Small. Insights on factuality and hallucinations highlight the competition between grounded and memorized recall mechanisms.

171

862

69,414
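
The "decoding in vocabulary space" idea mentioned in the tweet above can be sketched in a few lines. This is only a toy illustration, not code from the paper: the vocabulary, the unembedding matrix `W_U`, the vector `hidden_mid`, and both function names are hypothetical, and the final layer normalization that real models usually apply before unembedding is left out.

```python
import math

# Hypothetical toy vocabulary and unembedding matrix (one row per token).
VOCAB = ["cat", "dog", "car"]
W_U = [
    [1.0, 0.0],  # "cat"
    [0.8, 0.2],  # "dog"
    [0.0, 1.0],  # "car"
]


def project_to_vocab(hidden, unembed):
    """Project one hidden vector through the unembedding matrix and softmax it."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_token(hidden, unembed, vocab):
    """Return the most likely token and its probability for a hidden vector."""
    probs = project_to_vocab(hidden, unembed)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Hypothetical intermediate-layer representation (assumed, not from a real model).
hidden_mid = [2.0, 0.1]
token, prob = top_token(hidden_mid, W_U, VOCAB)
print(token, round(prob, 3))
```

Applying the same projection to the hidden state after each layer would show how the predicted token changes with depth, which is the usual way this kind of decoding is read.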

Devendra Desale retweeted

Elon Musk

@elonmusk

25 May 2024

0:57

8,184

37,215

237,227

54,292,592

Devendra Desale retweeted

Greg Brockman

@gdb

13 May 2024

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time. It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction):

5:54

833

4,725

21,757

4,358,931

Devendra Desale retweeted

Abhishek

@HeyAbhishek

1 May 2024

ChatGPT can now create Mind Maps. No more wasting hundreds of hours making visuals for studying or simplifying complex ideas. Here’s how to do it for free in a few seconds:

291

1,774

422,907

Devendra Desale retweeted

Andrew Ng

@AndrewYNg

30 Apr 2024

Chatting with @GroqInc’s CEO @JonathanRoss321. Groq has super fast token generation capabilities now. And, I was excited also to hear about his plans to scale up capacity aggressively and also expand this to other models than just LLMs! This is a good time to be building AI applications.

102

1,184

148,096

Devendra Desale retweeted

Swapnil @swapnilkate04

16 Feb 2024

OpenAI's Sora is the best example of synthetic data. Hard to replicate such a moat in an enterprise, but if we can get the right distribution of the data and its attributes, I think we can see better models for the basic use cases of the enterprise. cc @DevendraDesale

Devendra Desale @DevendraDesale

11 Jan 2021

Finally free from the stranglehold of Facebook products. Moved over 40 friends to Signal over the last weekend and finally succeeded in #Deletewhatsapp.

Devendra Desale @DevendraDesale

11 Jan 2021

Excited to be networking from home on @lunchclubai! Use my invite link to skip the waitlist and meet interesting people over video: lunchclub.com/?invite_code=d…

Devendra Desale retweeted

Vala Afshar

@ValaAfshar

4 Oct 2020

The fastest route is not always a straight line. x.com/knowIedgehub/status/13…

518

12,442

40,435

Devendra Desale retweeted

Shreyas Doshi

@shreyas

28 Sep 2020

People can be: 1) How-first 2) What-first 3) Why-first How-first people execute well. What-first people create well. Why-first people lead well.

601

Devendra Desale retweeted

andrew chen

@andrewchen

6 Aug 2020

Guy on the right is a Growth PM

241

1,810

Devendra Desale retweeted

Richard Eudes, PhD @RichardEudes

19 Jan 2020

Drones, data analytics, smart seeds: How to reforest x1,000 faster after wildfires bit.ly/36aEiiJ #bigdata, #datascience #ds

Devendra Desale @DevendraDesale

20 Dec 2017

If you can treat genuine cases of defence officers this way, I don't know how you treat the rest of your customers. Neither airline helped him, and now he has to spend a hefty amount on the next ticket and spend the night at the airport.

Devendra Desale @DevendraDesale

20 Dec 2017

scheduled departure. He is on defence duty and requested multiple staff members to accommodate him on the IndiGo flight, but they flatly refused. SpiceJet too did not accommodate him on the next flight to Vizag and did not even reimburse the missed flight.

Devendra Desale @DevendraDesale

20 Dec 2017

Pathetic customer service by @flyspicejet and @IndiGo6E. First the Pune to Hyderabad #spicejet flight gets delayed by 3 hrs. Then, after informing #indigo customer service of the late arrival, they refuse to let my brother board the flight even though he reached an hour before

Devendra Desale retweeted

KDnuggets

@kdnuggets

2 Dec 2017

Survival Analysis for Business #Analytics buff.ly/2AAiBNn

Devendra Desale retweeted

Andrew Ng

@AndrewYNg

2 Dec 2017

The AI Index is out! Page after page of interesting charts showing AI trends. aiindex.org/ For example, since 2013, the share of US jobs requiring AI skills has grown 4.5x! @yshoham

881

1,401

Devendra Desale @DevendraDesale

27 Oct 2017

check.universa.io/-7a3qEs

Devendra Desale retweeted

KDnuggets

@kdnuggets

3 May 2017

Top 10 #MachineLearning Videos on #YouTube, updated buff.ly/2p6HjeL
