Personalized Soups: LLM Alignment Via Parameter Merging - Personalized Human Feedback
Too Long; Didn't Read
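This paper introduces Reinforcement Learning from Personalized Human Feedback (RLPHF), which aligns large language models (LLMs) with personalized human preferences through multi-objective reinforcement learning (RL) and post-hoc parameter merging.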
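The parameter-merging idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that each preference (for example, a hypothetical "concise" or "friendly" preference) has its own separately trained set of parameters, stored here as plain name-to-list dictionaries, and that merging is a weighted element-wise average of those parameter sets. The function name `merge_parameters` and the example weights are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation (an assumption, not the paper's format):
# each preference-specific model maps parameter names to flat lists of floats.
Params = Dict[str, List[float]]


def merge_parameters(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the weighted element-wise average of several parameter sets."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize the weights so that they sum to 1.
    norm = [w / total for w in weights]

    merged: Params = {}
    for name in models[0]:
        size = len(models[0][name])
        # Every model must provide this parameter with the same length.
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Average each position across models, weighted by the normalized weights.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models)) for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    concise = {"w": [1.0, 0.0]}   # hypothetical "concise" preference parameters
    friendly = {"w": [0.0, 1.0]}  # hypothetical "friendly" preference parameters
    print(merge_parameters([concise, friendly], [1.0, 1.0]))  # {'w': [0.5, 0.5]}
```

In this sketch, merging happens after training, so combining a different subset of preferences, or re-weighting them, only requires recomputing the average rather than training a new model.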