暗夜之殇

暗夜之殇

@AZ_fanta

Jun 12

ReLU & GELU: How a Neuron Decides

A neuron has to make one decision, billions of times a second: fire, or don't fire.

For most of AI's history, that decision was made by a smooth curve: sigmoid, or its cousin tanh. Sigmoid takes any number, squeezes it into a value between 0 and 1, and outputs that. Beautiful math. Catastrophic in deep networks.

The problem: at the extremes, the curve flattens. The gradient, the signal that tells the neuron "you should adjust," shrinks to almost nothing. Even at its steepest point, sigmoid's slope is only 0.25. Stack 50 of these layers, and the learning signal at the bottom is essentially zero. Networks refused to train. The field hit a wall around 2006 and stayed there for years.

In 2010, two researchers named Vinod Nair and Geoffrey Hinton proposed something brutally simple. Replace the curve with a hinge. If the input is positive, pass it through unchanged. If the input is negative, output zero. That's it. They called it ReLU: Rectified Linear Unit.

The math is one line. The effect was enormous. The gradient on the positive side stays at exactly 1. No vanishing. Signals flow through 50 layers, 100 layers, 1,000 layers. The deeper you stack, the more ReLU outperforms the smooth curves that came before. By 2015 it was the default. By 2020 it was everywhere.

But ReLU had a quiet flaw. A neuron that only ever sees negative inputs will output zero forever. Its gradient is also zero. It never recovers. Engineers called this the "dying ReLU" problem. A bad initialization, a bad batch, and 30% of your network could just go silent and never come back.

In 2016, Dan Hendrycks and Kevin Gimpel asked: what if the decision to fire is probabilistic, not binary? They took the standard normal curve, the bell curve, and asked, for each input, "what's the probability that a standard normal draw lands below this value?" Multiply the input by that probability: GELU(x) = x · Φ(x). If the input is large and positive, fire fully. If it's slightly negative, let a little signal through. If it's deeply negative, don't fire.

Smooth where ReLU is sharp. Gentle where ReLU kills. They called it GELU: Gaussian Error Linear Unit. BERT and the early GPT models use GELU, and many later transformers use it or a close relative.

The hinge was good. The probabilistic hinge was better.

The lesson: a neuron doesn't need a smooth decision. It needs the right decision. For a decade, the right answer was "yes or no." Now it's "yes, probably, by this much."

算子次元 · One-minute AI · #13 ReLU & GELU
#AI #MachineLearning #DeepLearning #NeuralNetworks #ActivationFunction
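
The three activations described in this post can be sketched in a few lines of plain Python. This is a minimal illustration, not anyone's reference implementation: the function names are chosen here for readability, and the 50-layer product ignores weights entirely, so it only shows the best case for sigmoid (every layer sitting at its steepest point).

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing on large |x|.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid; it peaks at 0.25 when x == 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    """The hinge: pass positive inputs through, output zero otherwise."""
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    """Slope is exactly 1 for positive inputs and 0 for negative ones."""
    return 1.0 if x > 0.0 else 0.0


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


if __name__ == "__main__":
    # Gradient factor accumulated across 50 stacked activations (weights ignored).
    print("sigmoid, best case:", sigmoid_grad(0.0) ** 50)
    print("relu, positive side:", relu_grad(1.0) ** 50)
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Running it shows the sigmoid product collapsing toward zero while the ReLU product stays at 1, and GELU returning small negative values for slightly negative inputs where ReLU returns exactly zero.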

Shine Gupta @shine_gupta17

17 Oct 2025

Early on, this was the result. Now it keeps improving (soon beating the baseline model) 🚀 @SakanaAILabs papers are helping a lot 💡 #NEAT #activationfunction

Shine Gupta @shine_gupta17

14 Oct 2025

Not so good: it was a 14% win over the baseline model. I am happy, but I want to reach 25%. Better than earlier, but I am devastatingly tired, as it took 161172.8 s (44.77 hrs). I will try one more time after changes to the activation functions and the rest; let's see after 50 hrs.

Aman Pandey🇮🇳 @lak__ky

17 Jun 2024

🚀 Day 8 of #100DaysOfDeepLearning: 🛡️ Learned Regularization: Keeping those models in check and preventing overfitting! ⚡ Mastered Activation Functions: Adding non-linearity and depth to neural networks! #DeepLearning #Regularization #ActivationFunction #100DaysOfCode

Imad Ali Shah @imadalishah

14 May 2024

and the #activationfunction (#ReLU, #Softmax, etc.) is the traffic warden, allowing, restricting, or converting the flow direction. Now let's make it complex! Deep learning architectures come in various flavors, so:

Stephen Blum

@stephenlb

9 Apr 2024

Tanh as an activation function in machine learning can give high accuracy, but it is more expensive to compute than ReLU because it evaluates exponentials. #tanh #activationfunction #machinelearning

Sachin Kumar @Sachintukumar

9 Mar 2024

📝 Day 20 of #Deeplearning ▫️ Topic: Activation Function 🔰 In Artificial Neural Networks (ANNs), each neuron forms a weighted sum of its inputs and passes the resulting scalar value through a function referred to as an #Activationfunction or transfer function. A Complete 🧵
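
The weighted-sum-then-activation step described in this tweet might look like the sketch below. The name `neuron_output`, the bias term, and the example numbers are all assumptions added for illustration; the tweet itself only mentions the weighted sum and the function applied to it.

```python
import math
from typing import Callable, Sequence


def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: float,
    activation: Callable[[float], float],
) -> float:
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical example values, chosen so the arithmetic is easy to follow.
inputs = [1.0, 2.0, -1.0]
weights = [0.5, -0.25, 1.0]
bias = 0.5

# Pre-activation: 0.5*1.0 + (-0.25)*2.0 + 1.0*(-1.0) + 0.5 = -0.5
print(neuron_output(inputs, weights, bias, math.tanh))
print(neuron_output(inputs, weights, bias, lambda z: max(0.0, z)))
```

Passing the activation in as a callable keeps the neuron itself unchanged when swapping tanh for ReLU or any other function.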

Jonas Lara @jonas1ara

15 Jan 2024

In this case the activationFunction implements the step function. The function returns 1.0 if the input is greater than 0 and 0.0 otherwise.
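
The behaviour described here is small enough to sketch directly. The snippet below is a Python rendering under the assumption that the function takes and returns a float; the original `activationFunction` is not shown in the tweet, so the snake_case name is a stand-in.

```python
def activation_function(x: float) -> float:
    """Step function: 1.0 if the input is greater than 0, and 0.0 otherwise."""
    return 1.0 if x > 0.0 else 0.0


# An input of exactly 0 is not "greater than 0", so it maps to 0.0.
for value in (-2.0, 0.0, 0.3):
    print(value, activation_function(value))
```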

Maqbool Khan @dr_maqbool_khan

2 Jul 2023

If you are looking for the "best" ACTIVATION function, be ready to spend some time looking for it because there are hundreds of them! #ArtificialIntelligence #AI #Robotics #DataScience #MachineLearning #DeepLearning #DataScientists #algorithms #activationfunction

siliconlikes by uniqtech @siliconlikes

6 Jun 2023

Uniqtech Guide to Activation Functions medium.com/data-science-boot… #DataScience #ActivationFunction #MachineLearning #AI #100DaysOfCode #100DaysOfPython

🇾🇪جوانمرد دهه شصت🇵🇸🇮🇷@JavanmardDahe60

1 Jun 2023

The neural network thread (#شبکه_عصبی) has reached the point where its material is getting harder. To explain the new material, I thought it better to post separate threads and link to them from there. The first of these threads is about the activation function (#تابع_فعال_سازی, #ActivationFunction), which I am going to cover here. /1

DeepAI

@DeepAI

20 Mar 2023

🤯 Lowkey Goated When Deep Reinforcement Learning Continues To Amaze! 🤯 Check out the groundbreaking paper by @zaheer_abbas14 et al. on Loss of Plasticity in Continual #ReinforcementLearning #ActivationFunction 🤩 Link: deepai.org/publication/loss-…

DeepAI

@DeepAI

8 Mar 2023

🔥Lowkey Goated When ReLU Networks Are The Vibe🔥 Check out this paper on two-layer ReLU Networks by Yiwen Kou et al. #DeepLearning #ActivationFunction deepai.org/publication/benig…

DeepAI

@DeepAI

12 Feb 2023

🤯 Check out this groundbreaking paper by @juhan_bae, Nikita Dhawan et al. on Activation Functions and ReLUs! 🔗 deepai.org/publication/effic… #NeuralNetworks #ActivationFunction #ReLu

Efficient Parametric Approximations of Neural Network Function Space Distance

02/07/23 - It is often useful to compactly summarize important properties of model parameters and training data so that they can be used late...

deepai.org

DeepAI

@DeepAI

17 Jan 2023

Level up your data science vocabulary: Constant Error Carousel deepai.org/machine-learning-… #ActivationFunction #ConstantErrorCarousel

DeepAI

@DeepAI

12 Dec 2022

Level up your data science vocabulary: Activation Function deepai.org/machine-learning-… #VanishingGradientProblem #ActivationFunction

Activation Function

An activation function sets the output behavior of each node, or “neuron” in an artificial neural network.

deepai.org

Save to Notion 

@SaveToNotion

25 Nov 2022

Replying to @tulkas72

Saved this Tweet to your Notion database. Tags: [Activationfunction, Neuralnetwork, Tutorial, Datascience]

DeepAI

@DeepAI

21 Nov 2022

MixBin: Towards Budgeted Binarization deepai.org/publication/mixbi… by @UdbhavBamba et al. #ActivationFunction #Binarization

محمد 🤖🧠

@jo_moe_90_AI

18 Nov 2022

In deep learning (#DeepLearning #CNN), the roles of #ActivationFunction and #Pooling are often misunderstood. The AF adds a non-linear function to the model so it can represent complex functions, while P reduces the dimensions, the features, and the amount of data to learn from, also in a non-linear way, and it should be applied after the AF in the model. #Python #AI
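
The ordering described in this tweet, activation first and pooling second, can be sketched on a one-dimensional feature map. Everything here is an illustrative assumption: ReLU stands in for the activation, non-overlapping max pooling stands in for the pooling step, and the input values are made up.

```python
from typing import List, Sequence


def relu(values: Sequence[float]) -> List[float]:
    """Apply ReLU element-wise: negatives become 0.0."""
    return [v if v > 0.0 else 0.0 for v in values]


def max_pool_1d(values: Sequence[float], window: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


feature_map = [-1.0, 2.0, 0.5, -3.0, 4.0, -0.5]  # hypothetical values
activated = relu(feature_map)               # non-linearity, same length
pooled = max_pool_1d(activated, window=2)   # half as many values
print(activated)
print(pooled)
```

With max pooling specifically, swapping the two steps gives the same numbers, because ReLU never reorders values. The order does change the result for other choices, such as average pooling.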

DeepAI

@DeepAI

8 Nov 2022

From the Machine Learning & Data Science glossary: Multilayer Perceptron deepai.org/machine-learning-… #ActivationFunction #MultilayerPerceptron

Multilayer Perceptron

What is a multilayer perceptron?

deepai.org

DeepAI

@DeepAI

27 Oct 2022

DeepAI Term of the Day: Vanishing Gradient Problem deepai.org/machine-learning-… #ActivationFunction #VanishingGradientProblem

Vanishing Gradient Problem

What is the vanishing gradient problem?

deepai.org