Tweak historical scoring model PDF and default penalties #3495
Conversation
TheBlueMatt commented Dec 19, 2024
Force-pushed from 8479eb7 to f852dfa
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #3495      +/-   ##
==========================================
- Coverage   88.33%   88.29%   -0.04%
==========================================
  Files         149      149
  Lines      112839   113000     +161
  Branches   112839   113000     +161
==========================================
+ Hits        99672    99771      +99
- Misses      10688    10737      +49
- Partials     2479     2492      +13
```

☔ View full report in Codecov by Sentry.
```rust
}

#[inline(always)]
/// Identical to [`success_probability_float`] but returns integer numerator and denominators.
```
Mhh, may make sense to use a macro to DRY this?
Let me just clean it up by splitting out the calculations into their own functions we can call.
Done.
```diff
@@ -1766,7 +1805,7 @@ mod bucketed_history {
 	// Because the first thing we do is check if `total_valid_points` is sufficient to consider
 	// the data here at all, and can return early if it is not, we want this to go first to
 	// avoid hitting a second cache line load entirely in that case.
-	total_valid_points_tracked: u64,
+	total_valid_points_tracked: f64,
```
Huh, isn't this just a count, why do we need an `f64` here? If it's not a count anymore, should we rename the variable?
It is a count, but we ultimately use it as an f64 (dividing our probability-weights by it) so we do the int -> float conversion in the caching step instead of in the score calculation step.
I'll add a comment in any case.
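A minimal sketch of the idea, with hypothetical names rather than the crate's actual internals: the count is converted to `f64` once when the cached totals are rebuilt, so the hot scoring path divides floats directly.

```rust
// Hypothetical illustration of doing the int -> float conversion in the caching step.
struct BucketCache {
	// Semantically still a count, but stored as f64 because every consumer
	// divides float probability-weights by it.
	total_valid_points_tracked: f64,
}

impl BucketCache {
	fn recalculate(&mut self, buckets: &[u16; 32]) {
		let count: u64 = buckets.iter().map(|b| *b as u64).sum();
		// Convert here, when the cache is rebuilt...
		self.total_valid_points_tracked = count as f64;
	}

	fn normalize(&self, weight: f64) -> f64 {
		// ...so the per-score calculation avoids a conversion on every call.
		weight / self.total_valid_points_tracked
	}
}
```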
Force-pushed from c9ae3e8 to 7c8fa41
Feel free to squash the fixups. I of course didn't run any simulations so can't validate the new magic numbers here.
But, LGTM, I think.
lightning/src/routing/scoring.rs (Outdated)
```rust
for (max_idx, max_bucket) in max_liquidity_offset_history_buckets.iter().enumerate() {
	if *max_bucket >= BUCKET_FIXED_POINT_ONE {
		highest_max_bucket_with_full_points = Some(cmp::max(highest_max_bucket_with_full_points.unwrap_or(0), max_idx));
	}
	if *max_bucket != 0 {
		highest_max_bucket_with_points = cmp::max(highest_max_bucket_with_points, max_idx);
	}
	total_max_points += *max_bucket as u64;
	total_weight += (*max_bucket as u64) * (*max_bucket as u64)
		* (min_liquidity_offset_history_buckets[0] as u64) * (min_liquidity_offset_history_buckets[0] as u64);
```
Why are we multiplying the `0` min-bucket here, and aren't just quadrupling the `max_bucket` as elsewhere?
Hmm? Where do we do `max_bucket^4`? We should always be multiplying a min bucket by a max bucket and then squaring (though I'll update this to avoid the 3rd multiply). Here we're always using the 0th min bucket because this loop is only doing the 0th min bucket and iterating the max-buckets.
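A small illustrative helper (not LDK's actual code) for the weighting described above: each (min bucket, max bucket) pair contributes `(min_points * max_points)^2`, and squaring the product needs one fewer multiply than squaring each factor separately. In the quoted loop only the 0th min-bucket is being walked, which is why `min_liquidity_offset_history_buckets[0]` appears as the min factor for every max bucket.

```rust
// Weight contributed by one (min bucket, max bucket) pair: (min * max)^2,
// computed with three multiplies instead of four.
fn pair_weight(min_points: u64, max_points: u64) -> u64 {
	let product = min_points * max_points;
	product * product
}
```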
Force-pushed from 7c8fa41 to 52e2b2e
Squashed with one less multiply.
lightning/src/routing/scoring.rs (Outdated)
```diff
@@ -1180,13 +1185,13 @@ fn success_probability(
 	};
 
 	if min_zero_implies_no_successes && min_liquidity_msat == 0 &&
-		denominator < u64::max_value() / 21
+		denominator < u64::max_value() / 78
```
it's been a while, can you add context around how you got to this threshold value?
experimentally determined :)
Utilizing the results of probes sent once a minute to a random node in the network for a random amount (within a reasonable range), we were able to analyze the accuracy of our resulting success probability estimation with various PDFs. For each candidate PDF (as well as other parameters, to be tuned in the coming commits), we used the `min_zero_implies_no_successes` fudge factor in `success_probability` as well as a total probability multiple fudge factor to get both the historical success model and the a priori model to be neither too optimistic nor too pessimistic (as measured by the relative log-loss between succeeding and failing hops in our sample data). We then compared the resulting log-loss for the historical success model and selected the candidate PDF with the lowest log-loss, skipping a few candidates with similar resulting log-loss but with more extreme constants (such as a power of 11 with a higher `min_zero_implies_no_successes` penalty). This resulted in a PDF of `128 * (1/256 + 9*(x - 0.5)^8)` with a `min_zero_implies_no_successes` probability multiplier of 64/78. Thanks to @twood22 for being a sounding board and helping analyze the resulting PDF.
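As a hedged illustration (the function name below is made up; this is not the crate's implementation), the selected PDF `128 * (1/256 + 9*(x - 0.5)^8)` can be evaluated directly: it integrates to 1 over [0, 1], is nearly flat around 0.5, and weights the extremes heavily.

```rust
/// Sketch of the selected PDF: 128 * (1/256 + 9 * (x - 0.5)^8).
fn selected_pdf(x: f64) -> f64 {
	let d = x - 0.5;
	128.0 * (1.0 / 256.0 + 9.0 * d.powi(8))
}

fn main() {
	for x in [0.0f64, 0.25, 0.5, 0.75, 1.0] {
		// pdf(0.5) = 0.5 at the center, pdf(0) = pdf(1) = 5.0 at the edges.
		println!("pdf({x:.2}) = {:.4}", selected_pdf(x));
	}
}
```

The 64/78 probability multiplier mentioned above is presumably also where the `u64::max_value() / 78` guard in the diff comes from: scaling the integer denominator by 78 must not overflow a `u64`.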
Utilizing the results of probes sent once a minute to a random node in the network for a random amount (within a reasonable range), we were able to analyze the accuracy of our resulting success probability estimation with various PDFs across the historical and live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in the coming commits), we used the `min_zero_implies_no_successes` fudge factor in `success_probability` as well as a total probability multiple fudge factor to get both the historical success model and the a priori model to be neither too optimistic nor too pessimistic (as measured by the relative log-loss between succeeding and failing hops in our sample data). We then compared the resulting log-loss for the historical success model and selected the candidate PDF with the lowest log-loss, skipping a few candidates with similar resulting log-loss but with more extreme constants (such as a power of 11 with a higher `min_zero_implies_no_successes` penalty).

In every case, the historical model performed substantially better than the live-bounds model, so here we simply disable the live-bounds model by default and use only the historical model. Further, we use the calculated total probability multiple fudge factor (0.7886892844179266) to choose the ratio between the historical model and the per-hop penalty (as multiplying each hop's probability by 78% is equivalent to adding a per-hop penalty of log10(0.78) of our probabilistic penalty).

We take this opportunity to bump the penalties up a bit as well, as anecdotally LDK users are willing to pay more than they do today to get more successful paths.

Fixes lightningdevkit#3040
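The parenthetical equivalence above can be checked in a few lines (a standalone sketch, not LDK code): penalties are proportional to -log10(success probability), so scaling every hop's probability by 0.78 adds a constant -log10(0.78) ≈ 0.108 penalty units per hop, regardless of the hop's own probability.

```rust
fn main() {
	let multiplier: f64 = 0.78;
	for p in [0.5f64, 0.8, 0.95] {
		// Probabilistic penalty before and after scaling the hop probability.
		let before = -p.log10();
		let after = -(p * multiplier).log10();
		// The difference is always -log10(0.78), independent of p.
		println!("p = {p:.2}: added penalty = {:.4}", after - before);
	}
	println!("-log10(0.78) = {:.4}", -multiplier.log10());
}
```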
If the liquidity penalty multipliers in the scoring config are both 0 (as is now the default), the corresponding liquidity penalties will be 0. Thus, we should avoid doing the work to calculate them if we're ultimately just going to get a value of zero anyway, which we do here.
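A minimal sketch of that short-circuit, with hypothetical struct, field, and function names loosely modeled on the scoring config described above, and a simplified penalty formula:

```rust
// Hypothetical config fields modeled on the description above.
struct FeeParams {
	liquidity_penalty_multiplier_msat: u64,
	liquidity_penalty_amount_multiplier_msat: u64,
}

fn liquidity_penalty_msat(
	params: &FeeParams,
	amount_msat: u64,
	// The expensive part we want to be able to skip entirely.
	success_probability: impl Fn() -> f64,
) -> u64 {
	// With both multipliers at their (new) default of 0, the penalty can only
	// ever be 0, so return early without touching the probability math.
	if params.liquidity_penalty_multiplier_msat == 0
		&& params.liquidity_penalty_amount_multiplier_msat == 0
	{
		return 0;
	}
	// Simplified, illustrative combination of the two multipliers.
	let neg_log10_prob = -success_probability().log10();
	let base = neg_log10_prob * params.liquidity_penalty_multiplier_msat as f64;
	let amt = neg_log10_prob * params.liquidity_penalty_amount_multiplier_msat as f64
		* (amount_msat as f64 / 1_048_576.0);
	(base + amt) as u64
}
```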
Utilizing the results of probes sent once a minute to a random node in the network for a random amount (within a reasonable range), we were able to analyze the accuracy of our resulting success probability estimation with various PDFs across the historical and live-bounds models. For each candidate PDF (as well as other parameters, to be tuned in the coming commits), we used the `min_zero_implies_no_successes` fudge factor in `success_probability` as well as a total probability multiple fudge factor to get both the historical success model and the a priori model to be neither too optimistic nor too pessimistic (as measured by the relative log-loss between succeeding and failing hops in our sample data). Across the simulation runs, for a given PDF and other parameters, we nearly always did better with a shorter half-life (even as short as 1ms, i.e. only learning per-probe rather than across probes). While this likely makes sense for nodes which do live probing, not all nodes do, and thus we should avoid over-biasing on the dataset we have. While it may make sense to only learn per-payment and not across payments, I can't fully rationalize this result and thus want to avoid over-tuning, so here we reduce the half-life from 6 hours to 30 minutes.
Force-pushed from 52e2b2e to 0aae3e4
In the next commit we'll want to return floats or ints from `success_probability` depending on the callsite, so instead of duplicating the calculation logic, here we split the linear (which always uses int math) and nonlinear (which always uses float math) into separate methods, allowing us to write trivial `success_probability` wrappers that return the desired type.
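A rough structural sketch of that split (the names and formulas here are placeholders, not the crate's actual helpers): the linear piece stays in integer math, the nonlinear piece stays in float math, and the wrappers only choose which result type to hand back.

```rust
// Linear piece: always integer math (toy placeholder formula).
fn linear_success_probability(amount_msat: u64, min_msat: u64, max_msat: u64) -> (u64, u64) {
	// (numerator, denominator)
	(max_msat - amount_msat, max_msat - min_msat)
}

// Nonlinear piece: always float math (toy placeholder for the real PDF).
fn nonlinear_success_probability(amount_msat: u64, min_msat: u64, max_msat: u64) -> (f64, f64) {
	let (num, den) = linear_success_probability(amount_msat, min_msat, max_msat);
	((num as f64).powi(3), (den as f64).powi(3))
}

// Trivial wrappers return the type a given callsite wants without duplicating
// any of the calculation logic above.
fn success_probability(amount_msat: u64, min_msat: u64, max_msat: u64) -> (u64, u64) {
	linear_success_probability(amount_msat, min_msat, max_msat)
}

fn success_probability_float(amount_msat: u64, min_msat: u64, max_msat: u64) -> (f64, f64) {
	nonlinear_success_probability(amount_msat, min_msat, max_msat)
}
```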
Utilizing the results of probes sent once a minute to a random node in the network for a random amount (within a reasonable range), we were able to analyze the accuracy of our resulting success probability estimation with various PDFs across the historical and live-bounds models.

For each candidate PDF (as well as other parameters, including the histogram bucket weight), we used the `min_zero_implies_no_successes` fudge factor in `success_probability` as well as a total probability multiple fudge factor to get both the historical success model and the a priori model to be neither too optimistic nor too pessimistic (as measured by the relative log-loss between succeeding and failing hops in our sample data). We then compared the resulting log-loss for the historical success model and selected the candidate PDF with the lowest log-loss, skipping a few candidates with similar resulting log-loss but with more extreme constants (such as a power of 11 with a higher `min_zero_implies_no_successes` penalty).

Somewhat surprisingly (to me at least), the (fairly strongly) preferred model was one where the bucket weights in the historical histograms are exponentiated. In the current design, the weights are effectively squared as we multiply the minimum and maximum histogram buckets together before adding the weight*probabilities together. Here we multiply the weights yet again before addition. While the simulation runs seemed to prefer a slightly stronger weight than the 4th power we do here, the difference wasn't substantial (log-loss 0.5058 to 0.4941), so we do the simpler single extra multiply here.

Note that if we did this naively we'd run out of bits in our arithmetic operations - we have 16-bit buckets, which when raised to the 4th power can fully fill a 64-bit int. Additionally, when looking at the 0th min-bucket we occasionally add up to 32 weights together before multiplying by the probability, requiring an additional five bits. Instead, we move to using floats during our histogram walks, which further avoids some float -> int conversions because it allows for retaining the floats we're already using to calculate probability.

Across the last handful of commits, the increased pessimism more than makes up for the increased runtime complexity, leading to a 40-45% pathfinding speedup on a Xeon Silver 4116 and a 25-45% speedup on a Xeon E5-2687W v3.

Thanks to @twood22 for being a sounding board and helping analyze the resulting PDF.
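The bit-width concern above checks out with a quick standalone computation: a 16-bit bucket raised to the 4th power nearly fills a `u64`, and summing up to 32 such terms needs about five more bits, so a `u64` accumulator could overflow.

```rust
fn main() {
	// Largest 16-bit bucket value, raised to the 4th power (done in u128 so we
	// can compare against u64::MAX without overflowing).
	let fourth_power = (u16::MAX as u128).pow(4);
	// Up to 32 such terms may be summed when walking the 0th min-bucket.
	let worst_case_sum = fourth_power * 32;
	assert!(fourth_power <= u64::MAX as u128); // barely fits
	assert!(worst_case_sum > u64::MAX as u128); // overflows
	println!("4th power: {fourth_power}\nworst-case sum: {worst_case_sum}");
}
```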
Force-pushed from 0aae3e4 to adb0afc
Squash-pushed a cleanup of the multiply constant, DRYing it up and adding docs.