Lecture 4 on Calculus of Variations
You might wonder...If I’m optimizing a shape...a curve, a surface, a whole path, what does "take the derivative and set it to zero" even mean? Do I take the damn derivative with respect to a curve/surface? 🤔
In normal calculus the variable is a number x, so the reflex is clean...f′(x)=0. In calculus of variations the variable is a whole function...the geometry itself, like a curve y(x) (or a surface z(x,y)). So the derivative can’t be a single slope. It has to be a pointwise sensitivity, i.e. how the objective reacts to tiny local deformations.
You’re holding a whole shape, like a curve y(x). Your objective isn’t f(x) anymore, it’s a functional J[y]. As we've seen with our first three examples, it is usually an integral that depends on the entire curve (often through y and y’).
To talk about a “derivative”, you do the only thing that makes sense: you nudge the entire curve by a tiny amount and see how J changes. Pick a wiggle shape η(x). It’s not random...it’s any admissible deformation direction.
Admissible just means it obeys the constraints. If the endpoints are fixed, you force η(0)=η(1)=0 so the wiggle doesn’t move the endpoints. Then scale that wiggle by a small number ε and define the perturbed curve y_ε(x) = y(x) + εη(x).
Now treat ε like the usual scalar in a Taylor expansion. As ε→0, J[y + εη] expands as
J[y + εη] = J[y] + ε · (first-order term depending linearly on η) + o(ε).
So the difference is
J[y + εη] − J[y] = ε · (linear functional of η) + o(ε).
For the standard problems where J is the integral of a Lagrangian, that linear functional can be written as an inner product with some function of x:
J[y + εη] − J[y] = ε ∫ (δJ/δy)(x) η(x) dx + o(ε).
That’s the definition-level meaning of δJ/δy: it’s the unique pointwise sensitivity function that makes this identity true for every admissible η. If δJ/δy is positive at some x, then choosing η negative there decreases J; if δJ/δy is negative there, pushing y upward locally decreases J. It’s literally a map along the curve saying push this way to go downhill.
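A quick numerical sanity check of that identity, as a minimal sketch rather than anything definitive. The specific choices here are assumptions made for illustration: the functional J[y] = ∫ ½ y’² dx on [0, 1], the curve y = x², the wiggle η = sin(πx), and the grid size. For this functional δJ/δy = −y’’, which is −2 for y = x², so both sides should come out close to −4/π.

```python
import math

def energy(y, h):
    """Discrete J[y] = integral of 0.5 * y'(x)^2 dx using forward differences."""
    return sum(0.5 * ((y[i + 1] - y[i]) / h) ** 2 * h for i in range(len(y) - 1))

# Uniform grid on [0, 1] (grid size is an illustrative assumption).
n = 1001
h = 1.0 / (n - 1)
x = [i * h for i in range(n)]

y = [xi ** 2 for xi in x]                    # base curve, y(0)=0, y(1)=1
eta = [math.sin(math.pi * xi) for xi in x]   # admissible wiggle: zero at both ends

# Left side: (J[y + eps*eta] - J[y]) / eps for a small eps.
eps = 1e-6
y_eps = [yi + eps * ei for yi, ei in zip(y, eta)]
difference_quotient = (energy(y_eps, h) - energy(y, h)) / eps

# Right side: integral of (dJ/dy)(x) * eta(x), with dJ/dy = -y'' = -2 here.
sensitivity = [-2.0 for _ in x]
integrand = [s * e for s, e in zip(sensitivity, eta)]
inner_product = h * (0.5 * integrand[0] + sum(integrand[1:-1]) + 0.5 * integrand[-1])

print(difference_quotient, inner_product)
```

The two printed numbers agree up to discretization and finite-ε error, which is exactly what "δJ/δy is the pointwise sensitivity" means in practice.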
Now translate “set the derivative to zero.”
At a minimizer y*, the first-order change must vanish for every admissible wiggle:
J[y* + εη] − J[y*] = o(ε) for all η.
Plug in the expansion and the ε-term must be zero:
∫ (δJ/δy)(x) η(x) dx = 0 for all admissible η.
Here’s the crucial logic step: the only way the integral of δJ/δy against every admissible test function η can be zero is if δJ/δy itself is zero (almost everywhere, which is the usual sense in analysis; this is the fundamental lemma of the calculus of variations). So you get
δJ/δy = 0.
For the common case J[y]=∫ L(x, y, y’) dx, you can compute δJ/δy explicitly and it becomes the Euler–Lagrange expression
δJ/δy = ∂L/∂y − d/dx(∂L/∂y’).
So if you give the left-hand side of the Euler–Lagrange equation a name, the residual,
R(x) = ∂L/∂y − d/dx(∂L/∂y’),
then “set the derivative to zero” is exactly R(x)=0.
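For reference, here is a sketch of where the Euler–Lagrange expression comes from. It assumes the interval is [0, 1] (matching η(0)=η(1)=0 above) and that L and y are smooth enough to differentiate under the integral and integrate by parts; the boundary term drops out because η vanishes at the endpoints.

```latex
\begin{aligned}
\frac{d}{d\varepsilon} J[y+\varepsilon\eta]\Big|_{\varepsilon=0}
&= \int_0^1 \left( \frac{\partial L}{\partial y}\,\eta
   + \frac{\partial L}{\partial y'}\,\eta' \right) dx \\
&= \int_0^1 \left( \frac{\partial L}{\partial y}
   - \frac{d}{dx}\frac{\partial L}{\partial y'} \right) \eta \, dx
   + \left[ \frac{\partial L}{\partial y'}\,\eta \right]_0^1 \\
&= \int_0^1 \left( \frac{\partial L}{\partial y}
   - \frac{d}{dx}\frac{\partial L}{\partial y'} \right) \eta \, dx .
\end{aligned}
```

Comparing the last line with ∫ (δJ/δy)(x) η(x) dx identifies the bracket as δJ/δy.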
That’s why animation works so well. You don’t have to solve R=0 in one shot. You can evolve the curve in an artificial time τ by pushing it in the downhill direction:
∂y/∂τ = −R(y).
Where the residual is large, the curve moves a lot; as the residual drains toward zero, the motion dies out and the curve settles into an extremal.
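To make the flow concrete, here is a minimal sketch under stated assumptions; it is not the code behind the animations. It assumes L = ½ y’², so that R = −y’’ and the flow ∂y/∂τ = −R becomes the heat equation, and it uses explicit Euler steps on a uniform grid with fixed endpoints y(0)=0, y(1)=1. The starting curve, the grid size, and the step size dtau = 0.4·h² (chosen below the h²/2 stability limit of this scheme) are all illustrative choices.

```python
import math

def dirichlet_energy(y, h):
    """Discrete J[y] = integral of 0.5 * y'(x)^2 dx using forward differences."""
    return sum(0.5 * ((y[i + 1] - y[i]) / h) ** 2 * h for i in range(len(y) - 1))

def residual(y, h):
    """Euler-Lagrange residual R = -y'' at interior nodes (central differences)."""
    return [-(y[i + 1] - 2.0 * y[i] + y[i - 1]) / h ** 2 for i in range(1, len(y) - 1)]

def relax(y_start, h, dtau, steps):
    """Explicit Euler steps of dy/dtau = -R(y); endpoints are never touched."""
    y = list(y_start)
    energies = [dirichlet_energy(y, h)]
    for _ in range(steps):
        r = residual(y, h)
        for i in range(1, len(y) - 1):
            y[i] -= dtau * r[i - 1]   # move downhill, proportional to the residual
        energies.append(dirichlet_energy(y, h))
    return y, energies

n = 21
h = 1.0 / (n - 1)
x = [i * h for i in range(n)]

# Intentionally ugly start: the straight line plus a large wiggle.
y0 = [xi + 0.5 * math.sin(3.0 * math.pi * xi) for xi in x]

y_final, energies = relax(y0, h, dtau=0.4 * h * h, steps=2000)
max_residual = max(abs(r) for r in residual(y_final, h))
max_deviation = max(abs(yi - xi) for yi, xi in zip(y_final, x))

print(energies[0], energies[-1], max_residual, max_deviation)
```

For this L the extremal with these endpoints is the straight line y = x, so the energy should fall toward ½ while the residual and the deviation from the line shrink toward zero.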
In our animations, we start from an intentionally ugly curve/surface. Frame by frame the functional drops, the residual drains away, and the geometry relaxes into an extremal.
#CalculusOfVariations #EulerLagrange #FunctionalDerivative #GradientFlow #Optimization #MathAnimation