AI Research Wiki

Tag: alignment

9 items with this tag.

  • Constitutional AI (Apr 11, 2026)
    Tags: alignment, safety, rlaif
  • Direct Preference Optimization (Apr 11, 2026)
    Tags: alignment, preference-learning, optimization
  • Reinforcement Learning from Human Feedback (Apr 11, 2026)
    Tags: rlhf, alignment, reinforcement-learning
  • Anthropic (Apr 11, 2026)
    Tags: organization, ai-lab, alignment, ai-safety
  • OpenAI (Apr 11, 2026)
    Tags: organization, ai-lab, alignment
  • Constitutional AI: Harmlessness from AI Feedback (Apr 11, 2026)
    Tags: constitutional-ai, alignment, rlhf, ai-feedback
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Apr 11, 2026)
    Tags: direct-preference-optimization, alignment, rlhf, preference-learning
  • GPT-4 Technical Report (Apr 11, 2026)
    Tags: multimodal, frontier-model, scaling, alignment
  • InstructGPT: Training Language Models to Follow Instructions with Human Feedback (Apr 11, 2026)
    Tags: rlhf, alignment, instruction-following, language-model
