Actor critic example not using discount rate properly #744

@rodrigodesalvobraz

The Actor Critic example (which, as pointed out in #573, is actually an implementation of REINFORCE-with-baseline) does not use the discount rate properly.

The policy loss term for time step t should include a factor of \gamma^t, as shown in the box on page 330 of Sutton & Barto:

[image: algorithm box from page 330 of Sutton & Barto]
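
To make the point concrete, here is a minimal sketch of what the \gamma^t factor would look like in a REINFORCE-with-baseline policy loss. It uses plain Python floats rather than tensors, and the function names (`discounted_returns`, `policy_loss_terms`) are hypothetical, chosen for illustration; this is not the example's actual code.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma^(k - t) * r_k, for every step t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def policy_loss_terms(log_probs, values, rewards, gamma):
    """Per-step policy loss: -gamma^t * (G_t - v_t) * log pi(a_t | s_t).

    The gamma ** t weight is the factor this issue says is missing;
    dropping it leaves only the discounting inside G_t.
    """
    returns = discounted_returns(rewards, gamma)
    terms = []
    for t, (log_prob, value, g) in enumerate(zip(log_probs, values, returns)):
        advantage = g - value  # baseline-corrected return
        terms.append(-(gamma ** t) * advantage * log_prob)
    return terms


if __name__ == "__main__":
    # Toy three-step episode, assumed values for illustration only.
    print(policy_loss_terms([-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.5))
```

With tensors, the same per-step weight would presumably be multiplied into each policy loss term before the terms are summed.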
