Make feedback more flexible and light #337
Labels
area/routing
External API
lifecycle/stale
Reinforcement Learning
Currently the send-feedback API takes a batch of samples (request + response) but only a single float for the reward. This forces a design choice about how to distribute that one reward across all samples in the batch (e.g. #336). One potential fix would be to change the type of reward to DefaultData so that each sample in a batch gets its own reward.
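To make the per-sample idea concrete, here is a minimal Python sketch. It is not the actual API: the `Feedback` dataclass and the `per_sample_rewards` helper are hypothetical stand-ins, and a plain list of floats is used in place of DefaultData. Copying a scalar reward to every sample is only one possible fallback, assumed here for illustration; it is exactly the kind of choice discussed in #336.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Feedback:
    """Hypothetical stand-in for the feedback message (not the real API type)."""
    requests: List[dict]    # one entry per sample in the batch
    responses: List[dict]   # one entry per sample in the batch
    reward: Union[float, Sequence[float]]  # scalar (today) or one value per sample (proposed)


def per_sample_rewards(feedback: Feedback) -> List[float]:
    """Return one reward per sample in the batch.

    A scalar reward is copied to every sample (an assumed fallback policy);
    a sequence must contain exactly one reward per sample.
    """
    n = len(feedback.requests)
    if len(feedback.responses) != n:
        raise ValueError("requests and responses must have the same batch size")

    # Backwards-compatible path: a single float applies to the whole batch.
    if isinstance(feedback.reward, (int, float)):
        return [float(feedback.reward)] * n

    # Proposed path: each sample carries its own reward.
    rewards = [float(r) for r in feedback.reward]
    if len(rewards) != n:
        raise ValueError(f"expected {n} rewards, got {len(rewards)}")
    return rewards
```

With a per-sample reward the router no longer has to guess how a batch-level value maps onto individual samples; the scalar path is kept only so existing callers would not break.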
Additionally, we need to re-evaluate whether the feedback should include both the request and the response, as sending both could be wasteful with large payloads.