AI Research Wiki
Folder: papers
21 items under this folder.
Last updated: Apr 11, 2026

- Attention Is All You Need (tags: transformer, self-attention, machine-translation, architecture)
- Neural Machine Translation by Jointly Learning to Align and Translate (tags: attention, machine-translation, encoder-decoder)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (tags: pretraining, bidirectional, masked-language-model, fine-tuning, transformer)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (tags: chain-of-thought, prompting, reasoning, in-context-learning)
- Chinchilla: Training Compute-Optimal Large Language Models (tags: scaling-laws, compute-optimal, language-model)
- Constitutional AI: Harmlessness from AI Feedback (tags: constitutional-ai, alignment, rlhf, ai-feedback)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (tags: direct-preference-optimization, alignment, rlhf, preference-learning)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (tags: attention, efficiency, gpu-optimization, systems)
- Improving Language Understanding by Generative Pre-Training (tags: pretraining, fine-tuning, transfer-learning, language-model, transformer)
- Language Models are Unsupervised Multitask Learners (tags: language-model, zero-shot, transfer-learning, scaling, transformer)
- Language Models are Few-Shot Learners (tags: few-shot-learning, in-context-learning, language-model, scaling)
- GPT-4 Technical Report (tags: multimodal, frontier-model, scaling, alignment)
- InstructGPT: Training Language Models to Follow Instructions with Human Feedback (tags: rlhf, alignment, instruction-following, language-model)
- LLaMA: Open and Efficient Foundation Language Models (tags: open-source, efficient-training, language-model, scaling-laws)
- LoRA: Low-Rank Adaptation of Large Language Models (tags: fine-tuning, parameter-efficiency, low-rank-adaptation)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (tags: positional-encoding, self-attention, transformer)
- Scaling Laws for Neural Language Models (tags: scaling-laws, language-modeling, compute-efficiency)
- Sequence to Sequence Learning with Neural Networks (tags: sequence-to-sequence, machine-translation, encoder-decoder, lstm)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (tags: mixture-of-experts, sparsity, scaling, efficiency)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (tags: transfer-learning, encoder-decoder, pretraining, nlp-benchmark)
- Efficient Estimation of Word Representations in Vector Space (tags: word-embeddings, representation-learning, nlp)