
Adding time dependence to reward function #14

Closed
AriMirsky opened this issue Jan 30, 2023 · 0 comments · Fixed by #29
Comments

@AriMirsky
Collaborator

When we use the Q-learning algorithm on the physical bike, the environment will be changing over time. There are two ways to deal with this:

  1. Over the course of updating the values in the Q-matrix, use the most recent reward function available. This makes the most intuitive sense, but it is not guaranteed to converge to a physically feasible path.
  2. Add another dimension to the state, namely time. Explicitly estimate the reward function at future times and use those estimates to fill in future values before we have real observations of them. The problem is that part of the benefit of Q-learning comes from cycles in which certain states lead optimally to other states and eventually back to themselves; adding a time dimension removes all such loops, because the agent can never travel backwards (or sideways) in time.

Because it is unclear (at least to me) which solution is more promising, it would be nice to be able to easily toggle between the two methods.
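
A minimal sketch of what such a toggle could look like (not this repo's actual code; `reward_fn`, `predict_reward`, the flag name, and the array shapes are hypothetical placeholders):

```python
import numpy as np

# Option 1: re-evaluate the most recently observed reward function on every update.
def q_update_latest_reward(Q, state, action, next_state, reward_fn,
                           alpha=0.1, gamma=0.9):
    # Q has shape (n_states, n_actions)
    r = reward_fn(state, action)  # latest available reward estimate
    Q[state, action] += alpha * (r + gamma * np.max(Q[next_state]) - Q[state, action])
    return Q

# Option 2: make time part of the state and bootstrap from the next time step.
def q_update_time_augmented(Q, state, t, action, next_state, predict_reward,
                            alpha=0.1, gamma=0.9):
    # Q has shape (n_states, n_timesteps, n_actions); t + 1 must stay within the horizon
    r = predict_reward(state, action, t)  # estimated reward at time t
    Q[state, t, action] += alpha * (
        r + gamma * np.max(Q[next_state, t + 1]) - Q[state, t, action]
    )
    return Q

USE_TIME_DIMENSION = False  # flip to switch between the two update rules
```

Wherever the Q-matrix is updated, the flag would select which of the two update rules is applied.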

AriMirsky linked a pull request on Oct 5, 2023 that will close this issue
cm823 closed this as completed in #29 on Nov 16, 2023