Recursive Least-Squares Actor-Critic Algorithm for Continuous Spaces

2 The RLSAC Algorithm

Policy Gradient Methods for Reinforcement Learning with Function SMSM-NIPS99.pdf

This paper is the foundation for the several papers I read earlier.
**2 Policy Gradient with Approximation**

Theorem 2 (Policy Gradient with Function Approximation).
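For reference, a sketch of what Theorem 2 states, written in the paper's notation as I understand it: $\rho$ is the policy's performance, $d^{\pi}$ the state distribution under $\pi$, $\theta$ the policy parameters, and $f_w$ the approximation of $Q^{\pi}$ with parameters $w$. The exact wording should be checked against the paper.

```latex
% Condition 1: f_w is a local optimum of the approximation error
\sum_{s} d^{\pi}(s) \sum_{a} \pi(s,a)\,
  \bigl[ Q^{\pi}(s,a) - f_w(s,a) \bigr]\,
  \frac{\partial f_w(s,a)}{\partial w} = 0

% Condition 2: f_w is compatible with the policy parameterization
\frac{\partial f_w(s,a)}{\partial w}
  = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}

% Conclusion: the policy gradient is exact when Q^{\pi} is replaced by f_w
\frac{\partial \rho}{\partial \theta}
  = \sum_{s} d^{\pi}(s) \sum_{a}
    \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a)
```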

3 Application to Deriving Algorithms and Advantages
7p
the advantage function
The survey describes this unclearly; the explanation here reads much more smoothly. "The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators." This is the baseline question.
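To make the baseline point concrete, here is a minimal sketch under my own assumptions, not taken from the paper: a single-state problem with three actions, a softmax policy, and made-up action values `q`. It computes the exact mean and total variance of the one-sample gradient estimator, once with no baseline and once with the state value subtracted (equivalent to choosing v as the negative state value). The function names `softmax`, `grad_log_pi`, and `gradient_stats` are hypothetical helpers for illustration.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # For a softmax policy: d/dtheta log pi(a) = one_hot(a) - pi
    g = -softmax(theta)
    g[a] += 1.0
    return g

def gradient_stats(theta, q, baseline):
    """Exact mean and total variance of grad_log_pi(a) * (q[a] - baseline), a ~ pi."""
    pi = softmax(theta)
    # One row per action: the estimator's value if that action is sampled
    samples = np.array([grad_log_pi(theta, a) * (q[a] - baseline)
                        for a in range(len(q))])
    mean = pi @ samples                # expectation over actions
    second_moment = pi @ (samples ** 2)
    total_var = (second_moment - mean ** 2).sum()
    return mean, total_var

theta = np.array([0.2, -0.1, 0.5])    # assumed policy parameters
q = np.array([10.0, 11.0, 12.0])      # assumed action values (illustrative only)
state_value = softmax(theta) @ q      # expected action value under the policy

mean_plain, var_plain = gradient_stats(theta, q, 0.0)
mean_base, var_base = gradient_stats(theta, q, state_value)

print("same expected gradient:", np.allclose(mean_plain, mean_base))
print("variance without baseline:", var_plain)
print("variance with baseline:   ", var_base)
```

The expected gradient is unchanged because the policy's gradients sum to zero over actions, so any state-only term cancels in expectation; only the spread of the individual samples changes.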

4 Convergence of Policy Iteration with Function Approximation

Last edited: 2017.12.04 01:07:44
