Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF), also known as reinforcement learning from human preferences, is a technique that trains a "reward model" directly from human feedback and then uses that model as the reward function for optimizing an agent's policy with reinforcement learning (RL), typically via a policy-optimization algorithm such as Proximal Policy Optimization (PPO).
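To make that two-stage pipeline concrete, the code below is a minimal, self-contained sketch, not a production implementation. It assumes a toy setup: candidate actions are fixed feature vectors, human feedback is simulated as pairwise preferences, the reward model is a linear Bradley-Terry scorer, and the policy is a softmax over actions updated with a simple REINFORCE step rather than PPO. All names (features, human_prefers, reward, theta) are illustrative.

```python
# Minimal RLHF sketch (illustrative only): fit a reward model to pairwise
# human preferences, then optimize a policy against the learned reward.
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Collect pairwise human preference data (simulated here) ------------
# Each candidate action is a feature vector; the simulated "annotator"
# secretly prefers actions whose features align with true_pref.
n_features, n_actions = 4, 6
features = rng.normal(size=(n_actions, n_features))
true_pref = np.array([1.0, -0.5, 0.25, 2.0])

def human_prefers(a, b):
    """Simulated annotator: True if action a is preferred over action b."""
    return features[a] @ true_pref > features[b] @ true_pref

pairs = [(a, b) for a in range(n_actions) for b in range(n_actions) if a != b]
labels = np.array([1.0 if human_prefers(a, b) else 0.0 for a, b in pairs])

# --- 2. Train the reward model on preferences (Bradley-Terry / logistic) ---
w = np.zeros(n_features)          # reward model parameters
lr_rm = 0.1
for _ in range(500):
    grad = np.zeros_like(w)
    for (a, b), y in zip(pairs, labels):
        diff = features[a] - features[b]
        p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(a preferred over b)
        grad += (p - y) * diff                  # logistic-loss gradient
    w -= lr_rm * grad / len(pairs)

def reward(action):
    """Learned reward: the reward model's score for an action."""
    return features[action] @ w

# --- 3. Optimize the policy against the learned reward (REINFORCE) ---------
theta = np.zeros(n_actions)        # softmax policy logits
lr_pi = 0.5
for _ in range(300):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)
    r = reward(a)
    # Policy-gradient step: raise log-probability of highly rewarded actions.
    grad_logp = -probs
    grad_logp[a] += 1.0
    theta += lr_pi * r * grad_logp

best = int(np.argmax(theta))
print("Policy's favorite action:", best,
      "learned reward:", round(float(reward(best)), 3))
```

In practice, both the reward model and the policy are large neural networks, and the policy update typically uses PPO with a penalty that keeps the policy close to its starting point, but the structure is the same: fit a reward model to human preferences, then optimize the policy against it.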

