AI Research Wiki
Tag: direct-preference-optimization
1 item with this tag.
Apr 11, 2026
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
direct-preference-optimization
alignment
rlhf
preference-learning