ICPL Baseline Methods: Disagreement Sampling and PrefPPO for Reward Learning

December 3rd, 2024

About Author

Language Models (dot tech)

Large Language Models (LLMs) ushered in a technological revolution. We break down how the most important models work.
