Imagine "untraining" a Large Language Model on specific content.
The New York Times doesn't want its content in ChatGPT. To remove it, OpenAI would have to retrain its models from scratch. Removing any content that way would cost them millions of dollars.
I just read a new paper from Microsoft Research that tries to fix this. This is the only study I've found so far that's looking into making models forget.
The paper proposes a process that makes a model forget about Harry Potter without retraining it from scratch.
It's a proof of concept. The researchers aren't sure whether their solution generalizes to other topics. It's a first step, but it's cool nonetheless.
What they did is kind of a hack, but I like it:
They fine-tuned the model using a dataset containing the original Harry Potter text as the input tokens and some generic labels as targets. They pre-generate these generic labels. For example, instead of using "Harry," they use "Jack," and instead of using "Hermione," they use "her."
In other words, they don't actually delete knowledge from the model. They overwrite it.
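Here's a minimal sketch of what that overwrite looks like as training data. This is my own toy illustration, not the paper's code: the `GENERIC` mapping, the `build_example` helper, and the whitespace tokenizer are all assumptions I made up for the example. The inputs stay as the original text, and only the next-token targets get swapped for generic stand-ins.

```python
# Hypothetical mapping from book-specific tokens to generic stand-ins.
# The two entries mirror the examples above; the real labels are pre-generated.
GENERIC = {"Harry": "Jack", "Hermione": "her"}


def build_example(text, mapping=GENERIC):
    """Return (inputs, targets) for next-token fine-tuning.

    inputs:  the original tokens, unchanged
    targets: the next token at each position, replaced by a generic
             stand-in whenever it appears in the mapping
    """
    tokens = text.split()  # toy whitespace tokenizer (an assumption)
    inputs = tokens[:-1]
    targets = [mapping.get(tok, tok) for tok in tokens[1:]]
    return inputs, targets


inputs, targets = build_example("Ron looked at Harry and then at Hermione")
for inp, tgt in zip(inputs, targets):
    print(f"{inp!r} -> {tgt!r}")
```

Fine-tuning on pairs like these nudges the model toward predicting "Jack" where it used to predict "Harry," which is why I call it overwriting rather than deleting.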
As we start using these models everywhere, the ability to forget information will become critical.
Let's see how much progress we make on this front this year.
Here's the link to the paper: https://arxiv.org/pdf/2310.02238.pdf