I am a Data Enthusiast & Big Data Developer. Currently a Data science grad student. @KDNuggets contributing editor. #BigData #DataScience #MachineLearning

Joined October 2011
Devendra Desale retweeted
11 Oct 2024
Large Language Models don't reason. Thank you, Apple.
284
1,159
8,746
955,212
Devendra Desale retweeted
Nice Paper for a long weekend read - "A Primer on the Inner Workings of Transformer-based Language Models"
📌 Provides a concise intro focusing on the generative decoder-only architecture.
📌 Introduces the Transformer layer components, including the attention block (QK and OV circuits) and feedforward network block, and explains the residual stream perspective. It then categorizes LM interpretability approaches into two dimensions: localizing inputs or model components responsible for a prediction (behavior localization) and decoding information stored in learned representations to understand its usage across network components (information decoding).
📌 For behavior localization, the paper covers input attribution methods (gradient-based, perturbation-based, context mixing) and model component importance techniques (logit attribution, causal interventions, circuits analysis). Causal interventions involve patching activations during the forward pass to estimate component influence, while circuits analysis aims to reverse-engineer neural networks into human-understandable algorithms by uncovering subsets of model components interacting together to solve a task.
📌 Information decoding methods aim to understand what features are represented in the network. Probing trains supervised models to predict input properties from representations, while the linear representation hypothesis states that features are encoded as linear subspaces. Sparse autoencoders (SAEs) can disentangle superimposed features by learning overcomplete feature bases. Decoding in vocabulary space involves projecting intermediate representations and model weights using the unembedding matrix.
📌 Then summarizes discovered inner behaviors in Transformers, including interpretable attention patterns (positional, subword joiner, syntactic heads) and circuits (copying, induction, copy suppression, successor heads), neuron input/output behaviors (concept-specific, language-specific neurons), and the high-level structure mirroring sensory/motor neurons. Emergent multi-component behaviors are exemplified by the IOI task circuit in GPT2-Small. Insights on factuality and hallucinations highlight the competition between grounded and memorized recall mechanisms.
10
171
862
69,414
Devendra Desale retweeted
25 May 2024

8,184
37,215
237,227
54,292,592
Devendra Desale retweeted
13 May 2024
Introducing GPT-4o, our new model which can reason across text, audio, and video in real time. It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction):
833
4,725
21,757
4,358,931
Devendra Desale retweeted
1 May 2024
ChatGPT can now create Mind Maps. No more wasting hundreds of hours making visuals for studying or simplifying complex ideas. Here's how to do it for free in a few seconds:
57
291
1,774
422,907
Devendra Desale retweeted
30 Apr 2024
Chatting with @GroqInc's CEO @JonathanRoss321. Groq has super fast token generation capabilities now. And, I was excited also to hear about his plans to scale up capacity aggressively and also expand this to other models than just LLMs! This is a good time to be building AI applications.
53
102
1,184
148,096
Devendra Desale retweeted
16 Feb 2024
OpenAI's Sora is the best example of synthetic data. Hard to replicate such a moat in an enterprise but if we can get the right distribution of the data and its attributes, I think we can see better models for the basic use cases of the enterprise. cc @DevendraDesale
1
59
Finally free from stranglehold of Facebook products. Moved over 40 friends to signal over last weekend and finally successful in #Deletewhatsapp.
2
Excited to be networking from home on @lunchclubai! Use my invite link to skip the waitlist and meet interesting people over video: lunchclub.com/?invite_code=d…

1
Devendra Desale retweeted
The fastest route is not always a straight line. x.com/knowIedgehub/status/13…

518
12,442
40,435
Devendra Desale retweeted
28 Sep 2020
People can be: 1) How-first 2) What-first 3) Why-first How-first people execute well. What-first people create well. Why-first people lead well.
14
78
601
Devendra Desale retweeted
Guy on the right is a Growth PM
18
241
1,810
Devendra Desale retweeted
Drones, data analytics, smart seeds: How to reforest x1,000 faster after wildfires bit.ly/36aEiiJ #bigdata, #datascience #ds

7
9
If you can treat genuine cases of defence officers this way I don't know how you treat the rest of your customers. Neither airline helped him and now he has to spend a hefty amount on the next ticket and spend the night at the airport.
scheduled departure. He is on defence duty and requested multiple staff members to accommodate him on the indigo flight but they flatly refused. Spice jet too did not accommodate him on the next flight to Vizag and did not even reimburse the missed flight.
3
1
Pathetic customer service by @flyspicejet and @IndiGo6E. First the Pune to Hyderabad #spicejet flight gets delayed by 3 hrs. Then after informing #indigo customer service of the late arrival, they refuse to let my brother board the flight even though he reached an hour before
2
Devendra Desale retweeted
2 Dec 2017
Survival Analysis for Business #Analytics buff.ly/2AAiBNn
9
16
Devendra Desale retweeted
2 Dec 2017
The AI Index is out! Page after page of interesting charts showing AI trends. aiindex.org/ For example, since 2013, the share of US jobs requiring AI skills has grown 4.5x! @yshoham
17
881
1,401
Devendra Desale retweeted
3 May 2017
Top 10 #MachineLearning Videos on #YouTube, updated buff.ly/2p6HjeL
31
34