Tweak historical scoring model PDF and default penalties #3495

Merged

Conversation

TheBlueMatt
Collaborator

Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, including the
histogram bucket weight), we used the
`min_zero_implies_no_successes` fudge factor in
`success_probability` as well as a total probability multiple fudge
factor to get both the historical success model and the a priori
model to be neither too optimistic nor too pessimistic (as measured
by the relative log-loss between succeeding and failing hops in our
sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).
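
For reference, "log-loss" here is the usual binary cross-entropy over the probed hops. A minimal sketch (not the analysis tooling actually used for this PR) of how such a comparison works:

```rust
/// Sketch only: lower average -ln(probability assigned to the observed outcome)
/// means the model's success-probability estimates better match probe results.
fn log_loss(samples: &[(f64, bool)]) -> f64 {
	let total: f64 = samples.iter().map(|&(est_success_prob, succeeded)| {
		let p = if succeeded { est_success_prob } else { 1.0 - est_success_prob };
		// Clamp to avoid ln(0) when a model is confidently wrong.
		-p.max(1e-12).ln()
	}).sum();
	total / samples.len() as f64
}

fn main() {
	// (estimated success probability, did the probed hop succeed?)
	let samples = [(0.9, true), (0.6, true), (0.3, false), (0.7, false)];
	println!("log-loss: {:.4}", log_loss(&samples));
}
```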

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone Dec 19, 2024
@TheBlueMatt TheBlueMatt linked an issue Dec 19, 2024 that may be closed by this pull request
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch 2 times, most recently from 8479eb7 to f852dfa on December 19, 2024 at 03:55

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 97.72727% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.29%. Comparing base (42cc4e7) to head (7c8fa41).
Report is 5 commits behind head on main.

Files with missing lines          Patch %   Lines
lightning/src/routing/scoring.rs  97.72%    1 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3495      +/-   ##
==========================================
- Coverage   88.33%   88.29%   -0.04%     
==========================================
  Files         149      149              
  Lines      112839   113000     +161     
  Branches   112839   113000     +161     
==========================================
+ Hits        99672    99771      +99     
- Misses      10688    10737      +49     
- Partials     2479     2492      +13     


}

#[inline(always)]
/// Identical to [`success_probability_float`] but returns integer numerator and denominators.
Contributor

Mhh, may make sense to use a macro to DRY this?

Collaborator Author

Let me just clean it up by splitting out the calculations into their own functions we can call.

Collaborator Author

Done.

@@ -1766,7 +1805,7 @@ mod bucketed_history {
// Because the first thing we do is check if `total_valid_points` is sufficient to consider
// the data here at all, and can return early if it is not, we want this to go first to
// avoid hitting a second cache line load entirely in that case.
-	total_valid_points_tracked: u64,
+	total_valid_points_tracked: f64,
Contributor

Huh, isn't this just a count, why do we need an f64 here? If it's not a count anymore, should we rename the variable?

Collaborator Author

It is a count, but we ultimately use it as an f64 (dividing our probability-weights by it) so we do the int -> float conversion in the caching step instead of in the score calculation step.
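
In other words (an illustrative sketch with simplified field and method names, not the actual `bucketed_history` layout), the conversion cost is paid once when the cache is rebuilt rather than on every score lookup:

```rust
// Sketch only: the tracked total is still conceptually a count, but it is
// stored pre-converted to f64 so scoring can divide by it without an
// int -> float conversion on every call.
struct HistoricalBuckets {
	buckets: [u16; 32],
	// Cached as f64 because every use divides a float weight by it.
	total_valid_points_tracked: f64,
}

impl HistoricalBuckets {
	fn recalculate_valid_point_count(&mut self) {
		let count: u64 = self.buckets.iter().map(|b| *b as u64).sum();
		self.total_valid_points_tracked = count as f64;
	}
	fn normalized_weight(&self, weight: f64) -> f64 {
		weight / self.total_valid_points_tracked
	}
}
```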

Collaborator Author

I'll add a comment in any case.

@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch 2 times, most recently from c9ae3e8 to 7c8fa41 on December 19, 2024 at 18:53
Contributor

@tnull tnull left a comment

Feel free to squash the fixups. I of course didn't run any simulations so can't validate the new magic numbers here.

But, LGTM, I think.

for (max_idx, max_bucket) in max_liquidity_offset_history_buckets.iter().enumerate() {
if *max_bucket >= BUCKET_FIXED_POINT_ONE {
highest_max_bucket_with_full_points = Some(cmp::max(highest_max_bucket_with_full_points.unwrap_or(0), max_idx));
}
if *max_bucket != 0 {
highest_max_bucket_with_points = cmp::max(highest_max_bucket_with_points, max_idx);
}
total_max_points += *max_bucket as u64;
total_weight += (*max_bucket as u64) * (*max_bucket as u64)
* (min_liquidity_offset_history_buckets[0] as u64) * (min_liquidity_offset_history_buckets[0] as u64);
Contributor

Why are we multiplying the 0 min-bucket here, and aren't just quadrupling the max_bucket as elsewhere?

Collaborator Author

Hmm? Where do we do max_bucket^4? We should always be multiplying a min bucket by a max bucket and then squaring (though I'll update this to avoid the 3rd multiply). Here we're always using the 0th min bucket because this loop is only doing the 0th min bucket and iterating the max-buckets.

@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 7c8fa41 to 52e2b2e on December 19, 2024 at 19:46
@TheBlueMatt
Collaborator Author

Squashed with one less multiply

 git diff-tree -U1 7c8fa41e9 52e2b2ea9
diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index deff7feb3..94e16b96b 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -2006,4 +2006,4 @@ mod bucketed_history {
 					// `recalculate_valid_point_count`.
-					total_weight += (*max_bucket as u64) * (*max_bucket as u64)
-						* (min_liquidity_offset_history_buckets[0] as u64) * (min_liquidity_offset_history_buckets[0] as u64);
+					let bucket_weight = (*max_bucket as u64) * (min_liquidity_offset_history_buckets[0] as u64);
+					total_weight += bucket_weight * bucket_weight;
 				}

@@ -1180,13 +1185,13 @@ fn success_probability(
};

if min_zero_implies_no_successes && min_liquidity_msat == 0 &&
-	denominator < u64::max_value() / 21
+	denominator < u64::max_value() / 78
Contributor

it's been a while, can you add context around how you got to this threshold value?

Collaborator Author

experimentally determined :)
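
(For completeness: the surrounding `denominator < u64::max_value() / 78` bound is just the usual guard so the multiply below it cannot overflow; the experimentally determined part is the 78/64 ≈ 1.22 scale factor itself. A minimal sketch of the pattern as shown in the diff above:)

```rust
// Sketch of the overflow-guard pattern: only apply the `* 78 / 64` pessimism
// fudge when the multiply cannot overflow a u64.
fn apply_no_successes_fudge(mut denominator: u64) -> u64 {
	if denominator < u64::MAX / 78 {
		// Scaling the denominator up by 78/64 scales the success probability
		// (numerator / denominator) down to ~82% of its value.
		denominator = denominator * 78 / 64;
	}
	denominator
}

fn main() {
	assert_eq!(apply_no_successes_fudge(64), 78);
	// Near the top of the u64 range the fudge is skipped rather than overflowing.
	assert_eq!(apply_no_successes_fudge(u64::MAX - 1), u64::MAX - 1);
}
```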

Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

This resulted in a PDF of `128 * (1/256 + 9*(x - 0.5)^8)` with a
`min_zero_implies_no_successes` probability multiplier of 64/78.

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
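
As a sanity check on that formula (a standalone sketch, not LDK's `nonlinear_success_probability`; the 64/78 multiplier discussed above is applied separately when the known minimum liquidity is zero): the PDF integrates to 1 over [0, 1], and the success probability falls out of its antiderivative `x/2 + 128*(x - 0.5)^9`:

```rust
// Hedged sketch: evaluate the selected PDF's CDF and use it to compute
// P(liquidity >= amount) given that the available liquidity is believed to
// lie within [min, max] of the channel capacity.
fn cdf(x: f64) -> f64 {
	// Antiderivative of 128 * (1/256 + 9*(x - 0.5)^8)
	x / 2.0 + 128.0 * (x - 0.5).powi(9)
}

fn success_probability(amount: f64, min: f64, max: f64, capacity: f64) -> f64 {
	let (a, lo, hi) = (amount / capacity, min / capacity, max / capacity);
	(cdf(hi) - cdf(a)) / (cdf(hi) - cdf(lo))
}

fn main() {
	// The PDF integrates to 1 over [0, 1]...
	assert!((cdf(1.0) - cdf(0.0) - 1.0).abs() < 1e-12);
	// ...and with no knowledge (bounds 0..capacity), sending half the capacity
	// succeeds with probability 1/2 by symmetry.
	let p = success_probability(500_000.0, 0.0, 1_000_000.0, 1_000_000.0);
	assert!((p - 0.5).abs() < 1e-12);
}
```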
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

In every case, the historical model performed substantially better
than the live-bounds model, so here we simply disable the
live-bounds model by default and use only the historical model.
Further, we use the calculated total probability multiple fudge
factor (0.7886892844179266) to choose the ratio between the
historical model and the per-hop penalty (as multiplying each hop's
probability by 78% is equivalent to adding a per-hop penalty of
log10(0.78) of our probabilistic penalty).

We take this opportunity to bump the penalties up a bit as well, as
anecdotally LDK users are willing to pay more than they do today to
get more successful paths.

Fixes lightningdevkit#3040
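
A quick arithmetic check of that equivalence (a sketch with illustrative numbers, not the actual penalty plumbing in `scoring.rs`): scaling every hop's success probability by a constant c shifts each hop's -log10(p) penalty term by -log10(c), which behaves exactly like a fixed per-hop penalty.

```rust
fn main() {
	// Multiplying a hop's success probability p by c adds -log10(c) to the
	// -log10(p) term, i.e. a constant per-hop penalty of -log10(c) "units"
	// of the probabilistic penalty multiplier.
	let c: f64 = 0.7886892844179266;
	let per_hop_shift = -c.log10(); // ~0.103
	for &p in &[0.9f64, 0.5, 0.123] {
		let shifted = -(p * c).log10() - (-p.log10());
		assert!((shifted - per_hop_shift).abs() < 1e-12);
	}
	println!("per-hop penalty equivalent: {:.4} * multiplier", per_hop_shift);
}
```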
If the liquidity penalty multipliers in the scoring config are both
0 (as is now the default), the corresponding liquidity penalties
will be 0. Thus, we should avoid doing the work to calculate them
if we're ultimately just gonna get a value of zero anyway, which we
do here.
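
A minimal sketch of that short-circuit (illustrative names and a simplified penalty formula, not the actual `ProbabilisticScorer` internals):

```rust
/// Sketch only: when both liquidity penalty multipliers are zero the result is
/// necessarily zero, so skip the log10 work entirely.
fn liquidity_penalty_msat(
	penalty_mult_msat: u64, amount_mult_msat: u64, success_probability: f64,
) -> u64 {
	if penalty_mult_msat == 0 && amount_mult_msat == 0 {
		return 0;
	}
	let negative_log10 = -success_probability.log10();
	((penalty_mult_msat + amount_mult_msat) as f64 * negative_log10) as u64
}

fn main() {
	// With the new zero defaults this path now costs almost nothing.
	assert_eq!(liquidity_penalty_msat(0, 0, 0.25), 0);
	assert!(liquidity_penalty_msat(30_000, 0, 0.25) > 0);
}
```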
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

Across the simulation runs, for a given PDF and other parameters,
we nearly always did better with a shorter half-life (even as short
as 1ms, i.e. only learning per-probe rather than across probes).
While this likely makes sense for nodes which do live probing, not
all nodes do, and thus we should avoid over-biasing on the dataset
we have.

While it may make sense to only learn per-payment and not across
payments, I can't fully rationalize this result and thus want to
avoid over-tuning, so here we reduce the half-life from 6 hours to
30 minutes.
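
Node operators who probe aggressively and prefer the previous behavior can still override the new default; a sketch assuming the `liquidity_offset_half_life` field of `ProbabilisticScoringDecayParameters` is the knob in question:

```rust
use std::time::Duration;

use lightning::routing::scoring::ProbabilisticScoringDecayParameters;

fn main() {
	// Assumption: `liquidity_offset_half_life` governs the live-bounds decay
	// discussed above; restore the previous 6-hour half-life.
	let decay_params = ProbabilisticScoringDecayParameters {
		liquidity_offset_half_life: Duration::from_secs(6 * 60 * 60),
		..ProbabilisticScoringDecayParameters::default()
	};
	let _ = decay_params;
}
```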
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 52e2b2e to 0aae3e4 on December 19, 2024 at 20:05
In the next commit we'll want to return floats or ints from
`success_probability` depending on the callsite, so instead of
duplicating the calculation logic, here we split the linear (which
always uses int math) and nonlinear (which always uses float math)
into separate methods, allowing us to write trivial
`success_probability` wrappers that return the desired type.
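
A skeleton of that shape (signatures heavily simplified for illustration; the real functions also take liquidity bounds and the scoring parameters):

```rust
// Skeleton only: the linear math stays in integers, the nonlinear math stays
// in floats, and thin wrappers pick whichever return type the callsite wants.
fn linear_success_probability(amount: u64, capacity: u64) -> (u64, u64) {
	// ... integer-only linear math ...
	(capacity.saturating_sub(amount), capacity)
}

fn nonlinear_success_probability(amount: f64, capacity: f64) -> (f64, f64) {
	// ... float-only PDF/CDF math ...
	(capacity - amount, capacity)
}

fn success_probability_float(amount: u64, capacity: u64, linear: bool) -> (f64, f64) {
	if linear {
		let (n, d) = linear_success_probability(amount, capacity);
		(n as f64, d as f64)
	} else {
		nonlinear_success_probability(amount as f64, capacity as f64)
	}
}
```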
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, including the
histogram bucket weight), we used the
`min_zero_implies_no_successes` fudge factor in
`success_probability` as well as a total probability multiple fudge
factor to get both the historical success model and the a priori
model to be neither too optimistic nor too pessimistic (as measured
by the relative log-loss between succeeding and failing hops in our
sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

Somewhat surprisingly (to me at least), the (fairly strongly)
preferred model was one where the bucket weights in the historical
histograms are exponentiated. In the current design, the weights
are effectively squared as we multiply the minimum- and maximum-
histogram buckets together before adding the weight*probabilities
together.

Here we multiply the weights yet again before addition. While the
simulation runs seemed to prefer a slightly stronger weight than
the 4th power we do here, the difference wasn't substantial
(log-loss 0.5058 to 0.4941), so we do the simpler single extra
multiply here.

Note that if we did this naively we'd run out of bits in our
arithmetic operations - we have 16-bit buckets, which when raised
to the 4th can fully fill a 64-bit int. Additionally, when looking
at the 0th min-bucket we occasionally add up to 32 weights together
before multiplying by the probability, requiring an additional five
bits.
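
A quick check of that bit budget (standalone arithmetic, not the in-tree code):

```rust
fn main() {
	// A full 16-bit bucket raised to the 4th power nearly saturates a u64...
	let max_weight = (u16::MAX as u64).pow(4);
	assert!(max_weight > u64::MAX / 32);
	// ...so summing up to 32 such weights (5 more bits) would overflow, hence
	// the switch to f64 for the histogram walk.
	assert!(max_weight.checked_mul(32).is_none());
}
```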

Instead, we move to using floats during our histogram walks, which
further avoids some float -> int conversions because it allows for
retaining the floats we're already using to calculate probability.

Across the last handful of commits, the increased pessimism more
than makes up for the increased runtime complexity, leading to a
40-45% pathfinding speedup on a Xeon Silver 4116 and a 25-45%
speedup on a Xeon E5-2687W v3.

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 0aae3e4 to adb0afc on December 19, 2024 at 20:15
@TheBlueMatt
Collaborator Author

Squash-pushed a cleanup of the multiply constant, DRYing it up and adding docs.

$ git diff -U2 0aae3e498 adb0afc52
diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index 94e16b96b..cec241642 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -1155,4 +1155,16 @@ fn three_f64_pow_9(a: f64, b: f64, c: f64) -> (f64, f64, f64) {
 }

+/// If we have no knowledge of the channel, we scale probability down by a multiple of ~82% for the
+/// historical model by multiplying the denominator of a success probability by this before
+/// dividing by 64.
+///
+/// This number (as well as the PDF) was picked experimentally on probing results to maximize the
+/// log-loss of succeeding and failing hops.
+///
+/// Note that we prefer to increase the denominator rather than decrease the numerator as the
+/// denominator is more likely to be larger and thus provide greater precision. This is mostly an
+/// overoptimization but makes a large difference in tests.
+const MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64: u64 = 78;
+
 #[inline(always)]
 fn linear_success_probability(
@@ -1165,11 +1177,7 @@ fn linear_success_probability(

        if min_zero_implies_no_successes && min_liquidity_msat == 0 &&
-               denominator < u64::max_value() / 78
+               denominator < u64::max_value() / MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64
        {
-               // If we have no knowledge of the channel, scale probability down by a multiple of ~82%.
-               // Note that we prefer to increase the denominator rather than decrease the numerator as
-               // the denominator is more likely to be larger and thus provide greater precision. This is
-               // mostly an overoptimization but makes a large difference in tests.
-               denominator = denominator * 78 / 64
+               denominator = denominator * MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64 / 64
        }

@@ -1206,9 +1214,5 @@ fn nonlinear_success_probability(

        if min_zero_implies_no_successes && min_liquidity_msat == 0 {
-               // If we have no knowledge of the channel, scale probability down by a multiple of ~82%.
-               // Note that we prefer to increase the denominator rather than decrease the numerator as
-               // the denominator is more likely to be larger and thus provide greater precision. This is
-               // mostly an overoptimization but makes a large difference in tests.
-               denominator = denominator * 78.0 / 64.0;
+               denominator = denominator * (MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64 as f64) / 64.0;
        }

@TheBlueMatt TheBlueMatt merged commit 463e432 into lightningdevkit:main Dec 20, 2024
16 of 19 checks passed

Successfully merging this pull request may close these issues.

Rework/increase scoring params
4 participants