This PR makes two large changes to our midpoint calculation of cuts. When using `Hmisc::cut2()`, a single cut is sometimes defined by the same lower and upper bound, i.e., the bucket is represented by one value. To work around this, we previously found all buckets with one value and merged each into another bucket. This PR reverts that behavior and treats each bucket as its own, no longer combining buckets.

We also simplify the calculation of the cut midpoints. Since `Hmisc::cut2()` returns each cut as a factor, we use string processing to extract the lower and upper bounds and then compute the midpoint of the cut. In some cases the cuts are not formatted uniformly, so we coerce them to the same structure before determining the midpoints. The midpoint for a cut representing a single value is the value itself.

Closes #18.
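As a rough illustration of the string-processing approach described above, here is a minimal sketch in base R. It is not the code in this PR: the helper name `cut_midpoints()` is hypothetical, and it assumes the factor labels look like `"[ 1, 3)"` for intervals and like `" 5"` for single-value buckets, with plain numeric bounds.

```r
# Hypothetical helper (not the PR's actual implementation): derive a
# midpoint for each cut from its label.
# Assumes labels such as "[ 1, 3)" for intervals and " 5" for buckets
# that hold a single value.
cut_midpoints <- function(cuts) {
  labels <- as.character(cuts)

  # Coerce every label to the same structure: keep only characters that
  # can appear in a number, plus the comma separating the two bounds.
  cleaned <- gsub("[^0-9eE+.,-]", "", labels)

  # Split into bounds; a single-value label yields one element.
  bounds <- strsplit(cleaned, ",", fixed = TRUE)

  # The mean of two bounds is the midpoint; the mean of a single value
  # is the value itself.
  vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
}

example_cuts <- factor(c("[ 1, 3)", "[ 3, 5)", " 5", "[ 6,10]"))
cut_midpoints(example_cuts)
```

Taking the mean of whatever bounds are present handles intervals and single-value buckets with one code path, so no bucket needs to be merged into a neighbor.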