🚨 New Paper Alert! 🚨
How can we align language models without drowning in prompt engineering or falling into reward hacking traps?
We introduce Meta Policy Optimization (MPO), a new reinforcement learning framework that evolves its own reward model rubrics through meta-level reflection. Inspired by metacognition and evaluative thinking, MPO trains models to think about how they evaluate, not just what they generate.
🔥 Why it matters:
✔️ Boosts stability and robustness in RLAIF
✔️ Reduces human labor in prompt crafting
✔️ Generalizes across tasks: essays, summarization, ethical and mathematical reasoning
Check it out: huggingface.co/papers/2504.2…
Big thanks to co-authors @chanwoopark20 (MIT), @_vipulraheja (Grammarly), and @dongyeopkang (UMN)!
#AI #LLMs #ReinforcementLearning #MetaLearning #NLP #Alignment #RLHF #RLAIF #EvaluativeThinking #PromptEngineering
Join us for MEA's Learning Session #2023/13, titled 'Evaluative Thinking', by Dr. Tom Grayson, former President of the American Evaluation Association, moderated by Suzanna.
⏰ 21:00-22:30, Thursday, Dec 21, 2023 (ULAT)
Register here: zoom.us/meeting/register/tJA… #evaluativethinking @aeaweb
1/5 Hot off the presses! An exciting arrival on my doorstep today: Practical Wisdom for an Ethical Evaluation Practice. With a chapter from @tgarchibald and me on the relationship between #reflectivepractice, #evaluativethinking, and #practicalwisdom. Super proud of this work.
Graphic summary of the #AsianEvaluationWeek22 session on #evaluation influence and utilization. Publishing a report is not the end but the beginning of a journey toward changing behavior with evaluative thinking.
So many different approaches; the choice depends on #context
Bang on about #EvaluativeThinking; don't forget about designing interventions and integrating these spaces!
Barrier to entry is language and terminology - work on definitions. Text processing and adding to #evaluations!
#aes22ADL
This is an interesting analogy for the use of results chains (or impact ladders). If you make the levels too far apart, you lose people. #evaluativethinking