architecting systems that think - responsibly • senior ml engineer • bit of stock market too.

Joined May 2020
284 Photos and videos
Pinned Tweet
🧵 How to Become a #MachineLearning and #AI Expert in 6 Months (Detailed, Free Resources) Ready to dive deep into ML? From my 5 years of experience in ML and AI, here's a month-by-month roadmap using comprehensive, high-quality, and free resources. Let's get started! 👇
The Information Bottleneck (IB) principle offers a profound lens for understanding why deep networks generalize despite overparameterization. Introduced by Tishby et al. (1999) and applied to deep learning by Tishby and Zaslavsky (2015), IB frames learning as an optimal compression problem: a network layer should retain only the information in input X that is predictive of label Y. Formally, IB minimizes:
L = I(X; T) - β·I(T; Y)
where I(·;·) denotes mutual information, T is the learned representation, and β controls the compression-prediction tradeoff. The optimal solution satisfies:
p(t|x) ∝ p(t)·exp(-β·D_KL[p(y|x) || p(y|t)])
During SGD training, neural networks can exhibit two distinct phases:
1. 𝐅𝐢𝐭𝐭𝐢𝐧𝐠 𝐩𝐡𝐚𝐬𝐞: I(X;T) and I(T;Y) both increase rapidly as the network memorizes patterns
2. 𝐂𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧 𝐩𝐡𝐚𝐬𝐞: I(X;T) decreases while I(T;Y) remains stable: the network forgets irrelevant input details while preserving task-relevant signals
The critical insight: 𝐠𝐨𝐨𝐝 𝐠𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐜𝐨𝐫𝐫𝐞𝐥𝐚𝐭𝐞𝐬 𝐰𝐢𝐭𝐡 𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧. Layers progressively discard nuisance information (noise, spurious correlations) irrelevant to Y. Empirically, the penultimate layer's I(X;T) correlates strongly with test accuracy. For a k-class problem, the representation T at the final hidden layer satisfies I(T;Y) ≤ H(Y) ≤ log₂ k bits, bounding retained label information. Stochastic gradient noise implicitly implements a form of variational optimization over this constrained objective.
Saxe et al. (2018) nuance this: compression is not universal and depends on the activation function. They observed it with double-saturating activations such as tanh but not with ReLU, so networks don't always compress.
On the IB view, this explains a paradox: massive networks (millions of parameters) generalize well because they learn 𝐦𝐢𝐧𝐢𝐦𝐚𝐥 𝐬𝐮𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐫𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬, encoding exactly what Y requires and nothing more, effectively solving an information-theoretic optimization that classical statistical learning theory struggles to characterize. #InformationBottleneck #DeepLearningTheory #Generalization
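The IB objective in the post above can be checked by hand on a tiny discrete problem. The sketch below is only an illustration under assumed inputs: the joint table `p_xy`, the two encoders, and the helper names are all made up for this example, and real networks need mutual-information estimators rather than exact tables.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    pb = joint.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def ib_objective(p_xy, p_t_given_x, beta):
    """Return (L, I(X;T), I(T;Y)) with L = I(X;T) - beta * I(T;Y).

    Assumes T depends on Y only through X (Markov chain Y - X - T).
    """
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x       # p(x, t) = p(x) p(t|x)
    p_ty = p_t_given_x.T @ p_xy             # p(t, y) = sum_x p(t|x) p(x, y)
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt - beta * i_ty, i_xt, i_ty

# Hypothetical toy data: 4 equally likely inputs, 2 classes (k = 2).
# Inputs 0 and 1 always have label 0; inputs 2 and 3 always have label 1.
p_xy = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
beta = 5.0

identity_encoder = np.eye(4)                                     # T keeps all of X
compressed_encoder = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

for name, enc in [("identity", identity_encoder), ("compressed", compressed_encoder)]:
    loss, i_xt, i_ty = ib_objective(p_xy, enc, beta)
    print(f"{name}: I(X;T)={i_xt:.2f}  I(T;Y)={i_ty:.2f}  L={loss:.2f}")

h_y = entropy(p_xy.sum(axis=0))
print(f"H(Y)={h_y:.2f} bits, log2(k)={np.log2(2):.2f} bits")
```

Both encoders keep the full 1 bit of label information, but the compressed one halves I(X;T), so it scores lower on L. That is the "minimal sufficient representation" idea in miniature, and both values of I(T;Y) respect the bound I(T;Y) ≤ H(Y) ≤ log₂ k.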
Mohit 😼 retweeted
1. Never tell anyone how much you make
2. Get proper sleep (or at least try to do so)
3. Live in a way no one else does (do the opposite)
4. Read every day
5. Learn to communicate
6. Go off the radar for 48 hours
7. Use AI the right way
8. You are one DM away from changing your life
9. Find a stress protocol that works for you
10. Learn how to communicate your emotions (#1 way not to get depressed)
11. Always minimize (less is more)
12. Phone calls > podcasts, audiobooks, and music
13. Change your workspace often (Creativity hack)
14. Spontaneity
Mohit 😼 retweeted
I love how my mom always clicks my candid photos ❤️
uh here we go again.
May 28
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
walmart joining posts disappeared like they never existed.
Mohit 😼 retweeted
> lifted 30 crore indians out of poverty
> free ration to 80cr poor people
how do both things work simultaneously?
What's the biggest scam in history?
Rajat Patidar for t20 captaincy ??

Is a 60-day notice period really that big of a deal for recruiters now? This has gone too far - the moment they hear “60 days” they don’t even listen to the next word. I'm fed up, bro.
open-source is a charity done by devs. change my mind
Mohit 😼 retweeted
May 25
Replying to @annhybri
All the three words you’ve uttered are incomplete without me.
Bkl GPT.
I never really understood why people call their newly bought gadgets or vehicles a “new toy.” For someone coming from a lower middle-class background, it’s never just a toy. It carries years of hard work, sacrifices, patience, and silent prayers behind it. Every small achievement feels deeply personal because you know what it took to reach there. Grateful. Blessed. Proud. ❤️ Definitely not a toy. 🧿
Reading the comments made me realize how many people started watching cricket after COVID and still don’t understand the game. Anyway - Arjun bowled really well. 🥎
Well done, Arjun. ❤️ Proud of the way you’ve carried yourself through this season, always believing in your ability, staying patient, working hard quietly, and remaining positive despite having to wait for your opportunity till the very last match. Cricket tests patience as much as skill, and you handled both beautifully today. Keep your feet on the ground, and continue being in love with the game like you always have. Love you always.
Didn’t know Google pays this little for an L4 role in India. Saved myself the effort of preparing and going through another switch cycle. 🥲
What tf
เคเค• เคฎเคพเค เคธเฅ‡ เคฌเคกเคผเคพ เคฏเฅ‹เคฆเฅเคงเคพ เค•เฅ‹เคˆ เคจเคนเฅ€เค‚ เคนเฅ‹เคคเคพ โ™ฅ๏ธ
She lost her husband in an accident. Now she’s working as a Zomato delivery worker to feed her family. She even takes her two kids with her during deliveries. Respect to such brave women. She is a million times better than feminist girls like Rebel Kid Apoorva.
Lagrangian formulations for implicit neural representations and solver theory
The Lagrangian formulation of implicit neural representations (INRs) reveals a deep connection between neural networks and classical mechanics, unifying network architecture with variational principles. An INR parameterized by weights θ represents a continuous signal as f(x; θ), where x ∈ ℝᵈ. The Lagrangian approach introduces a functional:
ℒ[θ] = ∫_Ω L(x, θ, ∇θ, ∇²θ...) dx
Minimizing this action S = ∫ ℒ dt yields Euler-Lagrange equations:
∂ℒ/∂θ - ∇ · (∂ℒ/∂∇θ) = 0
This connects to solver theory: training INRs is equivalent to solving a boundary value problem. The network dynamics follow:
M(θ)θ̈ + D(θ)θ̇ = -∇V(θ)
where M is the mass matrix (parameter metric), D is the damping matrix, and V captures reconstruction loss. Key insight: symplectic integrators preserve the symplectic structure of (θ, p = ∂ℒ/∂θ̇), ensuring long-term stability, which is critical for physics-informed INRs. Alternatively, treating θ as generalized coordinates gives the Hamiltonian form:
ṗ = -∇V(θ), θ̇ = M⁻¹p
enabling geometrically stable optimization via reversible dynamics solvers like Ritter et al.'s port-Hamiltonian approach. This framework connects directly to adaptive solvers, residual networks as discrete symplectic systems, and the geometry of loss landscapes. The action principle provides principled regularization beyond simple L² reconstruction, with natural energy conservation properties. #LagrangianMechanics #ImplicitNeural #SolverTheory #MachineLearning
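To make the dynamics in the post above concrete, here is a minimal sketch that integrates the second-order system in its Hamiltonian form with a symplectic (semi-implicit) Euler step. Everything specific is an assumption made for illustration: a constant mass matrix and a scalar damping coefficient stand in for M(θ) and D(θ), and a quadratic V(θ) = ½ θᵀAθ stands in for a reconstruction loss. It is not the method of any particular paper.

```python
import numpy as np

def grad_V(theta, A):
    """Gradient of the assumed quadratic potential V(theta) = 0.5 * theta^T A theta."""
    return A @ theta

def energy(theta, p, A, M_inv):
    """Total energy H = kinetic + potential for the assumed quadratic problem."""
    return 0.5 * p @ (M_inv @ p) + 0.5 * theta @ (A @ theta)

def symplectic_euler(theta, p, A, M_inv, damping, dt, steps):
    """Integrate p' = -grad V(theta) - damping * M^-1 p,  theta' = M^-1 p.

    The momentum is updated first and the new momentum is used for the
    position update. With damping = 0 this is the symplectic Euler scheme.
    """
    for _ in range(steps):
        p = p - dt * (grad_V(theta, A) + damping * (M_inv @ p))
        theta = theta + dt * (M_inv @ p)
    return theta, p

# Hypothetical 2-parameter problem with its minimum at theta = 0.
A = np.diag([1.0, 4.0])
M_inv = np.eye(2)
theta0 = np.array([1.0, 1.0])
p0 = np.zeros(2)

# Undamped: energy is not conserved exactly, but it stays close to its start value.
theta_u, p_u = symplectic_euler(theta0, p0, A, M_inv, damping=0.0, dt=0.01, steps=10_000)
print("initial energy :", energy(theta0, p0, A, M_inv))
print("undamped energy:", energy(theta_u, p_u, A, M_inv))

# Damped: energy is dissipated and theta settles at the minimizer of V.
theta_d, p_d = symplectic_euler(theta0, p0, A, M_inv, damping=0.5, dt=0.01, steps=10_000)
print("damped |theta| :", np.linalg.norm(theta_d))
```

The undamped run illustrates the stability claim: the energy oscillates within a small band rather than drifting. Once the damping term is switched on the flow is no longer symplectic, and the same update behaves like a momentum-style optimizer that converges to the minimum of V.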
This image is generated by ChatGPT - crazyy 🤯
If he plays all 20 overs - almost every bowler will get hit for four sixes in their spell by him. Crazy kid 🤯
Vaibhav Suryavanshi has smashed 61 sixes in his 290-ball IPL career so far. 🥶 - A six every 4.75 balls. 🤯