
Why doesn't the new version recompute the advantages during the k epochs of updates? #68

@31CFDC30

Description

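As I recall, in the previous version advantages = td_target - state_values, where td_target was computed from the rewards and state_values was estimated with the policy as updated in each iteration.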
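To make the difference concrete, here is a minimal sketch in plain Python. It assumes td_target is the discounted return of the rewards, and it uses a hypothetical tabular critic (a dict from state to value) whose simple update rule stands in for the value-loss step. With recompute=True the advantages are recomputed from the current critic in every epoch, as described above for the previous version. With recompute=False they are computed once, before the k epochs, from the critic as it was when the batch was collected. The names discounted_returns, compute_advantages and run_k_epochs are illustrative and not taken from the repository.

```python
from typing import Dict, List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1}, used here as td_target."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def compute_advantages(td_target: List[float], state_values: List[float]) -> List[float]:
    """advantages = td_target - state_values, element-wise."""
    return [t - v for t, v in zip(td_target, state_values)]


def run_k_epochs(
    rewards: List[float],
    states: List[int],
    critic: Dict[int, float],  # hypothetical tabular critic, mutated in place
    k_epochs: int,
    gamma: float = 0.99,
    lr: float = 0.5,
    recompute: bool = False,
) -> List[List[float]]:
    """Return the advantages used in each of the k epochs."""
    td_target = discounted_returns(rewards, gamma)
    # Advantages from the critic as it was before any of the k updates.
    fixed_adv = compute_advantages(td_target, [critic[s] for s in states])
    history: List[List[float]] = []
    for _ in range(k_epochs):
        if recompute:
            # Previous scheme: re-evaluate V(s) with the critic updated so far.
            adv = compute_advantages(td_target, [critic[s] for s in states])
        else:
            # Compute-once scheme: reuse the advantages from before the loop.
            adv = list(fixed_adv)
        history.append(adv)
        # Toy critic step: move V(s) toward its target (stand-in for the MSE value loss).
        for s, t in zip(states, td_target):
            critic[s] += lr * (t - critic[s])
    return history


if __name__ == "__main__":
    for flag in (False, True):
        hist = run_k_epochs([1.0, 1.0], [0, 1], {0: 0.0, 1: 0.0}, 2, gamma=0.5, recompute=flag)
        print("recompute" if flag else "compute once", hist)
```

With the compute-once scheme the advantages stay identical across all k epochs, while recomputing them makes the policy loss chase a target that shrinks as the critic is fitted. Computing them once per batch is also how Algorithm 1 of the PPO paper is laid out: advantage estimates are computed first, then the surrogate objective is optimized for K epochs.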
