Reinforcement Learning

Definition

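Reinforcement learning: a machine learning approach in which an agent learns a policy by interacting with an environment and receiving reward feedback, with the goal of maximizing long-term cumulative return. Common applications include game playing, robot control, recommendation, and resource scheduling. The term can also refer more broadly to any mechanism of learning through rewards and punishments.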
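A minimal sketch of this interaction loop, assuming a hypothetical five-state corridor environment (the `step` function, `N_STATES`, `ACTIONS`, and all hyperparameter values are illustrative choices, not taken from any library or standard). It uses tabular Q-learning, one of many reinforcement learning algorithms: the agent repeatedly chooses an action, observes the next state and reward, and moves its estimate Q(s, a) toward the target r + γ·max_a' Q(s', a'). Q thereby approximates the discounted return G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ⋯, which is the long-term cumulative return the agent tries to maximize.

```python
import random

# Hypothetical toy environment (illustrative assumption): states 0..N_STATES-1 on a line.
# The agent starts at state 0; reaching the last state yields reward 1.0 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    # q[s][i] estimates the discounted return of taking ACTIONS[i] in state s
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                best = max(q[state])  # exploit: greedy action, ties broken at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Target r + gamma * max_a' Q(s', a'); a terminal state has no future return
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned Q-table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))  # expected: [1, 1, 1, 1]
```

Because the only reward sits at the right end, the learned greedy policy should move right (+1) in every non-terminal state. Ties between equal Q-values are broken at random so that the untrained agent performs an unbiased random walk instead of always repeating the first action in the list.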

Pronunciation (IPA)

/ˌriːɪnˈfɔːrsmənt ˈlɝːnɪŋ/

Examples

Reinforcement learning helps a robot learn to walk by trial and error.

In many real-world tasks, reinforcement learning must balance exploration and exploitation to achieve stable long-term performance.
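A minimal sketch of the exploration-exploitation trade-off, assuming a hypothetical three-armed bandit with Gaussian rewards (the arm means, unit noise, `epsilon=0.1`, and the function name `run_bandit` are illustrative choices, not from the source). The ε-greedy rule explores a random arm with probability ε and otherwise exploits the arm with the highest estimated mean reward; it is one simple strategy among several.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit (hypothetical setup)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # how often each arm was pulled
    estimates = [0.0] * k     # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try a random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit: best estimate so far
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward, unit standard deviation
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental sample mean
        total += reward
    return estimates, counts, total / steps


estimates, counts, avg_reward = run_bandit([0.2, 0.5, 1.0])
print(counts)  # most pulls should go to the arm with mean 1.0
```

With ε = 0 the agent can lock onto whichever arm happened to look best first; with ε = 1 it never uses what it has learned. A small ε keeps sampling every arm while spending most steps on the current best estimate.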

Etymology

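reinforcement comes from reinforce ("to strengthen"); its core sense is strengthening a behavior so that it becomes more likely to occur again. In psychology, reinforcement is the process by which a consequence such as a reward makes a behavior more likely to recur (punishment, by contrast, makes it less likely). learning means acquiring knowledge or skill. Together, "reinforcement learning" emphasizes that learning is guided by a reward signal that reinforces behavior, rather than by being given the correct answers directly as in supervised learning.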

Literary Works

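Reinforcement Learning: An Introduction (Sutton & Barto): one of the most widely used textbooks in the field; it systematically covers value functions, policy gradient methods, temporal-difference learning, and related concepts.
“Mastering the game of Go with deep neural networks and tree search” (Silver et al., Nature, 2016): the AlphaGo paper, a landmark application of reinforcement learning to a complex board game.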
“Learning from delayed rewards” (Watkins, 1989): Watkins's PhD thesis, the classic work that introduced Q-learning.
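Artificial Intelligence: A Modern Approach (Russell & Norvig): an authoritative AI textbook with substantial coverage of reinforcement learning and its relationship to planning and decision-making.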