Saptarshi Sarkar

Saptarshi Sarkar @SSarkar2007

Jun 10

You train a model on MNIST. 99.2% accuracy. You fine-tune it on Fashion MNIST. 91.1%. You go back and test on MNIST. 33.9%. This is catastrophic forgetting and here’s what’s actually happening under the hood 🧵

ALT Drop in accuracy on the MNIST digits dataset after retraining the convolutional neural network model on the Fashion MNIST dataset


Saptarshi Sarkar @SSarkar2007

Jun 10

The Colab notebook is open — you can watch the forgetting happen in real time: colab.research.google.com/gi… Full article: saptarshisarkar.hashnode.dev… #MachineLearning #ContinualLearning #AI


Saptarshi Sarkar @SSarkar2007

Jun 11

Also on @ThePracticalDev (dev.to): dev.to/saptarshisarkar/the-a… @GoogleColab @hashnode

The Anatomy of Catastrophic Forgetting

We train a model on handwritten digit classification. 99% accuracy. Then we train the same model on a...

dev.to

Saptarshi Sarkar @SSarkar2007

Jun 11

Think of it as two parabolic bowls in weight space — one minimum for Task A, another for Task B. When you optimize for B, the weights physically move away from A’s minimum. There’s no single θ that satisfies both.

ALT Diagram showing weight space for minimising loss on MNIST digits (task A) and Fashion MNIST dataset (task B) and the usual training trajectory on task B with no awareness of task A
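The two-bowl picture can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the notebook's code: the minima `THETA_A` and `THETA_B`, the starting point, the learning rate, and the helper names are all made up for illustration, and each task's loss is just the squared distance to its own minimum.

```python
# Toy illustration: two quadratic "bowls" in a 2-D weight space.
# THETA_A and THETA_B are hypothetical values, not taken from the notebook.
THETA_A = (0.0, 0.0)   # assumed minimum for Task A
THETA_B = (3.0, 2.0)   # assumed minimum for Task B


def loss(theta, center):
    """Squared distance from the bowl's minimum."""
    return sum((t - c) ** 2 for t, c in zip(theta, center))


def grad(theta, center):
    """Gradient of the quadratic bowl with respect to theta."""
    return tuple(2.0 * (t - c) for t, c in zip(theta, center))


def train_on(theta, center, lr=0.1, steps=100):
    """Plain gradient descent on one bowl; it never sees the other one."""
    for _ in range(steps):
        g = grad(theta, center)
        theta = tuple(t - lr * gi for t, gi in zip(theta, g))
    return theta


theta_after_a = train_on((1.5, -1.0), THETA_A)     # learn Task A first
theta_after_b = train_on(theta_after_a, THETA_B)   # then optimize only for Task B

loss_a_before = loss(theta_after_a, THETA_A)
loss_a_after = loss(theta_after_b, THETA_A)
print(f"Task A loss after training on A: {loss_a_before:.4f}")
print(f"Task A loss after training on B: {loss_a_after:.4f}")
```

In this toy the Task A loss climbs from essentially zero to about 13, which is just the squared distance between the two assumed minima. Real loss surfaces are not clean bowls, so treat it as intuition only.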


Saptarshi Sarkar @SSarkar2007

Jun 11

“Why not just retrain on all the data every time?” Real blockers:
• GDPR/HIPAA: old data often must be deleted
• Storage: modern datasets are huge
• Streaming: some data only exists once
• Cost: retraining a GPT-scale model every week is not a plan



Saptarshi Sarkar @SSarkar2007

Jun 11

The root cause: every parameter is shared between tasks. Gradient descent is stateless — it only follows the current loss signal. It has zero awareness that Task A ever existed.
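Here is a minimal sketch of what "stateless" means, assuming a one-weight logistic model and two tiny made-up datasets. `TASK_A`, `TASK_B`, `gd_step`, and the hyperparameters are hypothetical and not from the notebook. Task B deliberately mirrors Task A's labels, which is far more extreme than MNIST vs. Fashion MNIST, so the effect is visible with a single shared parameter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gd_step(w, batch, lr=0.5):
    """One full-batch gradient step on the logistic loss.

    The only inputs are the current weight and the current batch:
    nothing here remembers which tasks were trained before.
    """
    g = sum((sigmoid(w * x) - y) * x for x, y in batch) / len(batch)
    return w - lr * g


def accuracy(w, batch):
    """Fraction of points where the sign of w * x matches the label."""
    return sum((w * x > 0) == (y == 1) for x, y in batch) / len(batch)


# Hypothetical toy tasks sharing the single weight w (labels are mirrored).
TASK_A = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
TASK_B = [(-2.0, 1), (-1.0, 1), (1.0, 0), (2.0, 0)]

w = 0.0
for _ in range(200):          # train on Task A
    w = gd_step(w, TASK_A)
acc_a_before = accuracy(w, TASK_A)

for _ in range(200):          # fine-tune on Task B only
    w = gd_step(w, TASK_B)
acc_a_after = accuracy(w, TASK_A)
acc_b = accuracy(w, TASK_B)

print(f"Task A accuracy: {acc_a_before:.2f} -> {acc_a_after:.2f}, Task B accuracy: {acc_b:.2f}")
```

Because the same `w` serves both tasks and `gd_step` only sees the Task B batch, the second loop drives `w` to the opposite sign and Task A accuracy collapses. That collapse is by construction in this toy, but the update rule has the same shape in a real network.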