🎯 Tired of one-size-fits-all AI chatter? ChatGPT tends to generate verbose & overly informative responses. This is because the current RLHF pipeline only aligns LLMs to the general preferences of the population. However, in the real world, people have multiple, often conflicting preferences (e.g., friendly vs. formal, informative vs. concise).
✨ To model each personalized preference without conflict, we convert the current alignment problem into a Multi-Objective RL problem and introduce Personalized Soups🍜.
With Personalized Soups🍜, you can align your LLM to your own preferences on the fly, simply by merging the parameters of models that were each optimized on a single objective (preference)!
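To make "merging the parameters" concrete, here is a minimal sketch, assuming the merge is a weighted average of parameter sets that share the same names and shapes. The helper `merge_parameters`, the toy policies, and the weights are all hypothetical and use plain Python lists in place of real model tensors; see the paper for the actual procedure.

```python
from typing import Dict, List, Sequence

# Toy stand-in for model tensors: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_parameters(policies: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of parameter sets with identical names and shapes."""
    if not policies or len(policies) != len(weights):
        raise ValueError("need at least one policy and one weight per policy")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: Params = {}
    for name, first in policies[0].items():
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(norm, policies))
            for i in range(len(first))
        ]
    return merged


# Hypothetical policies, each assumed to be optimized for a single preference.
concise = {"layer.weight": [1.0, 0.0], "layer.bias": [0.5]}
friendly = {"layer.weight": [0.0, 1.0], "layer.bias": [-0.5]}

# A user who cares about conciseness three times as much as friendliness.
merged = merge_parameters([concise, friendly], weights=[3, 1])
print(merged)
```

Because the merge is just arithmetic on already-trained parameters, changing the weights gives a different preference mix without any further training, which is what "on the fly" suggests here.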
paper:
arxiv.org/abs/2310.11564