A passionate Software Developer and an open-source enthusiast.

Joined May 2022
You train a model on MNIST. 99.2% accuracy. You fine-tune it on Fashion MNIST. 91.1%. You go back and test on MNIST. 33.9%. This is catastrophic forgetting, and here’s what’s actually happening under the hood 🧵
Think of it as two parabolic bowls in weight space: one minimum for Task A, another for Task B. When you optimize for B, the weights physically move away from A’s minimum. In this picture, no single θ sits at the bottom of both bowls.
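The bowl picture can be sketched in a few lines. This is a toy illustration, not the MNIST experiment: the two minima (`A_MIN`, `B_MIN`), the starting point, the learning rate, and the step count are all made-up values chosen only to make the geometry visible.

```python
# Two hypothetical quadratic "bowls" in a 2-D weight space.
A_MIN = (0.0, 0.0)  # assumed minimum for Task A
B_MIN = (3.0, 4.0)  # assumed minimum for Task B


def loss(theta, minimum):
    """Squared distance from theta to the bottom of a bowl."""
    return sum((t - m) ** 2 for t, m in zip(theta, minimum))


def grad(theta, minimum):
    """Gradient of the squared-distance loss."""
    return [2.0 * (t - m) for t, m in zip(theta, minimum)]


def descend(theta, minimum, lr=0.1, steps=200):
    """Plain gradient descent on one bowl only."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta, minimum)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


theta0 = [5.0, -2.0]
theta_a = descend(theta0, A_MIN)   # "train on Task A"
theta_b = descend(theta_a, B_MIN)  # "fine-tune on Task B"

loss_a_before = loss(theta_a, A_MIN)  # essentially zero
loss_a_after = loss(theta_b, A_MIN)   # squared distance between the two minima

print(f"Task A loss after training on A: {loss_a_before:.6f}")
print(f"Task A loss after training on B: {loss_a_after:.6f}")
```

After the second phase the weights sit at B's minimum, so Task A's loss equals the squared distance between the two bowls (25 with these made-up coordinates). Real loss surfaces are not clean bowls, so treat this as intuition only.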
“Why not just retrain on all the data every time?” Real blockers:
• GDPR/HIPAA: old data often must be deleted
• Storage: modern datasets are huge
• Streaming: some data only exists once
• Cost: retraining GPT-scale weekly is not a plan
The root cause: every parameter is shared between tasks. Gradient descent is stateless — it only follows the current loss signal. It has zero awareness that Task A ever existed.
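A minimal sketch of that statelessness, assuming a tiny logistic-regression model with one shared weight and bias. The data, the two deliberately conflicting tasks, and every name here (`TASK_A`, `TASK_B`, `gd_step`) are hypothetical. The point to notice is the signature of `gd_step`: it receives only the current parameters and the current batch, so nothing about Task A can influence an update computed on Task B.

```python
import math

# Hypothetical 1-D inputs, symmetric around zero.
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
TASK_A = [(x, 1.0 if x > 0 else 0.0) for x in XS]  # positive inputs are class 1
TASK_B = [(x, 1.0 if x < 0 else 0.0) for x in XS]  # negative inputs are class 1


def predict(params, x):
    """Sigmoid of a linear score; both tasks share the same (w, b)."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def gd_step(params, batch, lr=0.5):
    """One full-batch gradient step on the logistic loss.

    The only inputs are the current weights and the current batch.
    """
    w, b = params
    grad_w = sum((predict(params, x) - y) * x for x, y in batch) / len(batch)
    grad_b = sum(predict(params, x) - y for x, y in batch) / len(batch)
    return (w - lr * grad_w, b - lr * grad_b)


def train(params, batch, steps=200):
    for _ in range(steps):
        params = gd_step(params, batch)
    return params


def accuracy(params, batch):
    hits = sum((predict(params, x) > 0.5) == (y == 1.0) for x, y in batch)
    return hits / len(batch)


params_a = train((0.0, 0.0), TASK_A)   # learn Task A
acc_a_before = accuracy(params_a, TASK_A)

params_b = train(params_a, TASK_B)     # then learn Task B with the same weights
acc_a_after = accuracy(params_b, TASK_A)
acc_b_after = accuracy(params_b, TASK_B)

print(f"Task A accuracy after training on A: {acc_a_before:.2f}")
print(f"Task A accuracy after training on B: {acc_a_after:.2f}")
print(f"Task B accuracy after training on B: {acc_b_after:.2f}")
```

Because the two toy tasks directly contradict each other, Task A accuracy drops from 1.0 to 0.0 here, which is more extreme than the MNIST numbers in the thread. Optimizers with momentum do carry some state, but that state tracks recent gradients, not earlier tasks.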