Gradient descent is like teaching a child how to bake cookies with just the right sweetness.
If she mixes sugar into the dough, takes a bite, and finds it's too sweet, she uses less sugar next time.
If she takes another bite, and it's still sweet but closer, she'll take note of that too.
She'll keep adjusting bit by bit until the taste is just right.
That’s essentially what gradient descent does with numbers.
Here's an example that stays close to how the maths works, so you can get the idea:
Let's say we want an AI to predict house prices. We already know the actual prices of houses in the area, so we can use them to test how accurate the AI is.
Suppose:
House size = 100
True price = 200
Our weight = 1
The formula that would be used is:
The AI prediction = weight × size
Therefore, Prediction = 1 × 100 = 100
But the true price from the example is 200, which means we’re off by 100. That’s our error.
To fix it, gradient descent says:
New weight = Old weight - LearningRate × ErrorDirection
Now, the error direction is simply the slope, which just tells us which way to move:
- If we predict too low, the slope is negative, so we go up.
- If we predict too high, the slope is positive, so we go down.
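To see the sign of the slope in action, here is a small sketch. It assumes the error is measured as a squared error, which is a common choice but not something this post specifies, and the function name `slope` is just for illustration. The size of the number depends on that choice; what matters here is the sign.

```python
# Assumption: error = (prediction - true_price)^2.
# The post does not name a cost function; squared error is one common choice.

def slope(weight, size, true_price):
    """Derivative of (weight * size - true_price)^2 with respect to weight."""
    prediction = weight * size
    return 2 * (prediction - true_price) * size

size, true_price = 100, 200

too_low = slope(1, size, true_price)   # prediction is 100, below 200
too_high = slope(3, size, true_price)  # prediction is 300, above 200

print(too_low < 0)   # negative slope, so the weight should go up
print(too_high > 0)  # positive slope, so the weight should go down
```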
Now, back to our numbers:
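New weight = Old weight - LearningRate × ErrorDirection
(The learning rate is a small number you choose, often something like 0.1 or 0.01. Here we use 0.1, and we take the error direction to be -2: a simplified, illustrative slope that is negative because we predicted too low.)
Therefore, New weight = 1 - (0.1 × -2)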
= 1 - (-0.2) = 1.2
So the weight goes from 1 to 1.2, but if we multiply 1.2 by the house size (100), we get 120, which is still short of the true price (200).
So we'll keep increasing our weight little by little till we get to 2.0, which when multiplied by 100 gives us the true price.
Now, Prediction = 2 × 100 = 200. Perfect!
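If you'd like to watch those little steps happen, here is a rough sketch of the same toy example in Python. It keeps the simplified slope of -2 (and assumes +2 when the prediction is too high) instead of a real derivative, so treat it as an illustration of the idea rather than a proper implementation. The helper name `error_direction` and the stopping tolerance are assumptions made for this sketch.

```python
size, true_price = 100, 200
weight, learning_rate = 1.0, 0.1

def error_direction(prediction, target):
    # Simplified slope from the example: -2 when too low, +2 when too high.
    # A real gradient would also shrink as the prediction gets closer.
    if prediction < target:
        return -2
    if prediction > target:
        return 2
    return 0

steps = 0
# Stop once the prediction is within a tiny tolerance of the true price
# (the tolerance absorbs floating-point rounding); cap the loop for safety.
while abs(weight * size - true_price) > 1e-6 and steps < 100:
    weight = weight - learning_rate * error_direction(weight * size, true_price)
    steps += 1
    print(f"step {steps}: weight = {weight:.1f}, prediction = {weight * size:.0f}")
```

The weight moves through 1.2, 1.4, 1.6 and 1.8 before landing on 2.0, five updates in total.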
That’s just how gradient descent works: it makes a guess, checks, adjusts, and repeats till it's right.
It’s how AI learns almost everything.
Your AI Princess,
~Kaeyra 🩶~
AI doesn’t wake up one day knowing how to write, draw, or recognize faces.
It learns the hard way, by making mistakes, correcting them, and slowly getting better.
And this is achieved through what we call gradient descent. It’s the quiet process that turns clumsy guesses into smart predictions.
What Does Gradient Descent Mean?
Gradient descent is an iterative optimization algorithm that minimizes a differentiable function by repeatedly taking steps in the opposite direction of the function's gradient.
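Written as a formula, that definition is usually expressed as the update rule below. The symbols are assumptions chosen for this sketch: θ stands for the model's parameters, η for the learning rate, and J for the function being minimized.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla J(\theta_t)
```

The minus sign is what makes each step go against the gradient, that is, downhill.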
When an AI model starts learning, it doesn’t know the “best answers.”
It just makes random guesses. Then it checks how wrong it was (this is the error).
Gradient descent is the strategy it uses to reduce that error step by step.
Over time, with enough steps and a sensible learning rate, it settles on a good answer (though not always the best possible one).
Key Components Of Gradient Descent
→ Cost Function: A function that measures the error between the model's predictions and the actual results.
→ Gradient: A vector of partial derivatives that points in the direction of the function's steepest ascent.
→ Learning Rate: A hyperparameter that controls the size of each step taken during the descent. A high learning rate can make the algorithm overshoot the minimum, while a low learning rate can lead to slow convergence.
→ Iterations: The number of times the algorithm repeats the process of calculating the gradient and updating the parameters.
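To make the learning rate point concrete, here is a minimal sketch on a made-up cost, w², whose minimum sits at w = 0 and whose gradient is 2 × w. The function name `run` and the three learning rates are assumptions picked for illustration.

```python
def run(learning_rate, steps=20, start=1.0):
    """Gradient descent on the toy cost w^2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

good = run(0.1)    # shrinks steadily toward the minimum at w = 0
slow = run(0.001)  # moves in the right direction, but barely
wild = run(1.1)    # each step overshoots 0 and lands farther away

print(abs(good), abs(slow), abs(wild))
```

With the same 20 iterations, the moderate rate gets close to 0, the tiny rate has hardly moved from 1, and the large rate ends up farther from the minimum than where it started.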
Why Is It Important?
→ Model Training: It's the core algorithm for training many machine learning models, allowing them to learn from data by adjusting their internal parameters.
→ Finding Optimal Solutions: By minimizing the loss function, gradient descent helps find the set of model parameters that best fit the training data, leading to more accurate predictions.
→ Scalability: Variants like Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent (MBGD) are used to handle large datasets more efficiently by processing data in smaller chunks, improving computational performance.
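As a rough illustration of the "smaller chunks" idea, here is a mini-batch sketch in plain Python. The toy data (where the true relationship is y = 2 × x), the batch size of 4, and the squared-error cost are all assumptions made for this example, not details from any particular library.

```python
import random

random.seed(0)  # fixed seed so the shuffling is repeatable

# Hypothetical toy data where the true relationship is y = 2 * x.
xs = [i / 10 for i in range(1, 21)]
data = [(x, 2 * x) for x in xs]

weight, learning_rate, batch_size = 0.0, 0.1, 4

for epoch in range(50):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Average squared-error gradient over this small chunk only.
        grad = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
        weight -= learning_rate * grad

print(round(weight, 4))
```

Each update looks at only 4 of the 20 examples, yet the weight still ends up at about 2.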
How The Maths Works
→ Initialization: The algorithm starts with initial, often random, values for the model's parameters (e.g., weights and biases).
→ Calculate Gradient: The gradient of the cost function is calculated with respect to the current parameters. This gradient indicates the direction of the steepest increase of the cost function.
→ Update Parameters: The parameters are updated by taking a small step in the opposite direction of the gradient, multiplied by a learning rate. This is the "downhill" direction, aimed at reaching a lower cost.
→ Repeat: The gradient calculation and the parameter update are repeated for a set number of iterations or until the change in parameters becomes negligible, signaling that a minimum (local or global) has likely been reached.
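The four steps above can be sketched in a few lines of Python. This version reuses the house numbers from the earlier example and assumes a squared-error cost. The learning rate of 0.00001 and the stopping threshold are assumptions too, chosen because a house size of 100 makes the gradient large.

```python
import random

random.seed(42)  # fixed seed so the random start is repeatable

size, true_price = 100, 200

def cost_gradient(weight):
    # Gradient of the squared error (weight * size - true_price)^2.
    return 2 * (weight * size - true_price) * size

# 1. Initialization: start from a random weight.
weight = random.uniform(0, 1)
learning_rate = 0.00001  # small because size = 100 makes the gradient large
max_iterations = 1000

for iteration in range(max_iterations):
    # 2. Calculate gradient at the current weight.
    grad = cost_gradient(weight)
    # 3. Update parameters: step against the gradient.
    new_weight = weight - learning_rate * grad
    # 4. Repeat until the change becomes negligible.
    if abs(new_weight - weight) < 1e-9:
        weight = new_weight
        break
    weight = new_weight

print(round(weight, 3))
```

Unlike a fixed step, the real gradient shrinks as the prediction gets closer, so the updates get smaller and smaller as the weight approaches 2.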
To put this in plain terms:
- the AI first makes a prediction, then compares it with the real answer to measure the error (the loss).
- gradient descent then asks: “which direction reduces this error the most?”, and the model shifts its internal settings slightly.
- the whole process repeats again, and again, until the model nails it.
Gradient descent is basically trial-and-error with a sense of direction.
Instead of stumbling aimlessly, AI uses it to walk the error downhill step by step, until it’s good enough to do the things we rely on today.
I hope today's episode was insightful!