Tweak historical scoring model PDF and default penalties #3495

Merged

Conversation

TheBlueMatt
Collaborator

Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, including the
histogram bucket weight), we used the
`min_zero_implies_no_successes` fudge factor in
`success_probability` as well as a total probability multiple fudge
factor to get both the historical success model and the a priori
model to be neither too optimistic nor too pessimistic (as measured
by the relative log-loss between succeeding and failing hops in our
sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).
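
For reference, "log-loss" here is the usual binary cross-entropy over the probed hops. A minimal sketch (not the analysis tooling actually used for this PR) of how such a comparison works:

```rust
/// Sketch only: lower average -ln(probability assigned to the observed outcome)
/// means the model's success-probability estimates better match probe results.
fn log_loss(samples: &[(f64, bool)]) -> f64 {
	let total: f64 = samples.iter().map(|&(est_success_prob, succeeded)| {
		let p = if succeeded { est_success_prob } else { 1.0 - est_success_prob };
		// Clamp to avoid ln(0) when a model is confidently wrong.
		-p.max(1e-12).ln()
	}).sum();
	total / samples.len() as f64
}

fn main() {
	// (estimated success probability, did the probed hop succeed?)
	let samples = [(0.9, true), (0.6, true), (0.3, false), (0.7, false)];
	println!("log-loss: {:.4}", log_loss(&samples));
}
```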

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone Dec 19, 2024
@TheBlueMatt TheBlueMatt linked an issue Dec 19, 2024 that may be closed by this pull request
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch 2 times, most recently from 8479eb7 to f852dfa on December 19, 2024 at 03:55

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 97.72727% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.29%. Comparing base (42cc4e7) to head (7c8fa41).
Report is 5 commits behind head on main.

Files with missing lines          Patch %   Lines
lightning/src/routing/scoring.rs  97.72%    1 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3495      +/-   ##
==========================================
- Coverage   88.33%   88.29%   -0.04%     
==========================================
  Files         149      149              
  Lines      112839   113000     +161     
  Branches   112839   113000     +161     
==========================================
+ Hits        99672    99771      +99     
- Misses      10688    10737      +49     
- Partials     2479     2492      +13     


}

#[inline(always)]
/// Identical to [`success_probability_float`] but returns integer numerator and denominators.
Contributor

Mhh, may make sense to use a macro to DRY this?

Collaborator Author

Let me just clean it up by splitting out the calculations into their own functions we can call.

Collaborator Author

Done.

@@ -1766,7 +1805,7 @@ mod bucketed_history {
// Because the first thing we do is check if `total_valid_points` is sufficient to consider
// the data here at all, and can return early if it is not, we want this to go first to
// avoid hitting a second cache line load entirely in that case.
-	total_valid_points_tracked: u64,
+	total_valid_points_tracked: f64,
Contributor

Huh, isn't this just a count, why do we need an f64 here? If it's not a count anymore, should we rename the variable?

Collaborator Author

It is a count, but we ultimately use it as an f64 (dividing our probability-weights by it) so we do the int -> float conversion in the caching step instead of in the score calculation step.
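
In other words (an illustrative sketch with simplified field and method names, not the actual `bucketed_history` layout), the conversion cost is paid once when the cache is rebuilt rather than on every score lookup:

```rust
// Sketch only: the tracked total is still conceptually a count, but it is
// stored pre-converted to f64 so scoring can divide by it without an
// int -> float conversion on every call.
struct HistoricalBuckets {
	buckets: [u16; 32],
	// Cached as f64 because every use divides a float weight by it.
	total_valid_points_tracked: f64,
}

impl HistoricalBuckets {
	fn recalculate_valid_point_count(&mut self) {
		let count: u64 = self.buckets.iter().map(|b| *b as u64).sum();
		self.total_valid_points_tracked = count as f64;
	}
	fn normalized_weight(&self, weight: f64) -> f64 {
		weight / self.total_valid_points_tracked
	}
}
```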

Collaborator Author

I'll add a comment in any case.

@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch 2 times, most recently from c9ae3e8 to 7c8fa41 on December 19, 2024 at 18:53
Contributor

@tnull tnull left a comment

Feel free to squash the fixups. I of course didn't run any simulations so can't validate the new magic numbers here.

But, LGTM, I think.

for (max_idx, max_bucket) in max_liquidity_offset_history_buckets.iter().enumerate() {
if *max_bucket >= BUCKET_FIXED_POINT_ONE {
highest_max_bucket_with_full_points = Some(cmp::max(highest_max_bucket_with_full_points.unwrap_or(0), max_idx));
}
if *max_bucket != 0 {
highest_max_bucket_with_points = cmp::max(highest_max_bucket_with_points, max_idx);
}
total_max_points += *max_bucket as u64;
total_weight += (*max_bucket as u64) * (*max_bucket as u64)
* (min_liquidity_offset_history_buckets[0] as u64) * (min_liquidity_offset_history_buckets[0] as u64);
Contributor

Why are we multiplying the 0 min-bucket here, and aren't just quadrupling the max_bucket as elsewhere?

Collaborator Author

Hmm? Where do we do max_bucket^4? We should always be multiplying a min bucket by a max bucket and then squaring (though I'll update this to avoid the 3rd multiply). Here we're always using the 0th min bucket because this loop is only doing the 0th min bucket and iterating the max-buckets.

@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 7c8fa41 to 52e2b2e on December 19, 2024 at 19:46
@TheBlueMatt
Collaborator Author

Squashed with one less multiply

 git diff-tree -U1 7c8fa41e9 52e2b2ea9
diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index deff7feb3..94e16b96b 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -2006,4 +2006,4 @@ mod bucketed_history {
 					// `recalculate_valid_point_count`.
-					total_weight += (*max_bucket as u64) * (*max_bucket as u64)
-						* (min_liquidity_offset_history_buckets[0] as u64) * (min_liquidity_offset_history_buckets[0] as u64);
+					let bucket_weight = (*max_bucket as u64) * (min_liquidity_offset_history_buckets[0] as u64);
+					total_weight += bucket_weight * bucket_weight;
 				}

@@ -1180,13 +1185,13 @@ fn success_probability(
};

if min_zero_implies_no_successes && min_liquidity_msat == 0 &&
-	denominator < u64::max_value() / 21
+	denominator < u64::max_value() / 78
Contributor

it's been a while, can you add context around how you got to this threshold value?

Collaborator Author

experimentally determined :)
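
(For completeness: the surrounding `denominator < u64::max_value() / 78` bound is just the usual guard so the multiply below it cannot overflow; the experimentally determined part is the 78/64 ≈ 1.22 scale factor itself. A minimal sketch of the pattern as shown in the diff above:)

```rust
// Sketch of the overflow-guard pattern: only apply the `* 78 / 64` pessimism
// fudge when the multiply cannot overflow a u64.
fn apply_no_successes_fudge(mut denominator: u64) -> u64 {
	if denominator < u64::MAX / 78 {
		// Scaling the denominator up by 78/64 scales the success probability
		// (numerator / denominator) down to ~82% of its value.
		denominator = denominator * 78 / 64;
	}
	denominator
}

fn main() {
	assert_eq!(apply_no_successes_fudge(64), 78);
	// Near the top of the u64 range the fudge is skipped rather than overflowing.
	assert_eq!(apply_no_successes_fudge(u64::MAX - 1), u64::MAX - 1);
}
```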

Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

This resulted in a PDF of `128 * (1/256 + 9*(x - 0.5)^8)` with a
`min_zero_implies_no_successes` probability multiplier of 64/78.

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
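
As a sanity check on that formula (a standalone sketch, not LDK's `nonlinear_success_probability`; the 64/78 multiplier discussed above is applied separately when the known minimum liquidity is zero): the PDF integrates to 1 over [0, 1], and the success probability falls out of its antiderivative `x/2 + 128*(x - 0.5)^9`:

```rust
// Hedged sketch: evaluate the selected PDF's CDF and use it to compute
// P(liquidity >= amount) given that the available liquidity is believed to
// lie within [min, max] of the channel capacity.
fn cdf(x: f64) -> f64 {
	// Antiderivative of 128 * (1/256 + 9*(x - 0.5)^8)
	x / 2.0 + 128.0 * (x - 0.5).powi(9)
}

fn success_probability(amount: f64, min: f64, max: f64, capacity: f64) -> f64 {
	let (a, lo, hi) = (amount / capacity, min / capacity, max / capacity);
	(cdf(hi) - cdf(a)) / (cdf(hi) - cdf(lo))
}

fn main() {
	// The PDF integrates to 1 over [0, 1]...
	assert!((cdf(1.0) - cdf(0.0) - 1.0).abs() < 1e-12);
	// ...and with no knowledge (bounds 0..capacity), sending half the capacity
	// succeeds with probability 1/2 by symmetry.
	let p = success_probability(500_000.0, 0.0, 1_000_000.0, 1_000_000.0);
	assert!((p - 0.5).abs() < 1e-12);
}
```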
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

In every case, the historical model performed substantially better
than the live-bounds model, so here we simply disable the
live-bounds model by default and use only the historical model.
Further, we use the calculated total probability multiple fudge
factor (0.7886892844179266) to choose the ratio between the
historical model and the per-hop penalty (as multiplying each hop's
probability by 78% is equivalent to adding a per-hop penalty of
log10(0.78) of our probabilistic penalty).

We take this opportunity to bump the penalties up a bit as well, as
anecdotally LDK users are willing to pay more than they do today to
get more successful paths.

Fixes lightningdevkit#3040
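
A quick arithmetic check of that equivalence (a sketch with illustrative numbers, not the actual penalty plumbing in `scoring.rs`): scaling every hop's success probability by a constant c shifts each hop's -log10(p) penalty term by -log10(c), which behaves exactly like a fixed per-hop penalty.

```rust
fn main() {
	// Multiplying a hop's success probability p by c adds -log10(c) to the
	// -log10(p) term, i.e. a constant per-hop penalty of -log10(c) "units"
	// of the probabilistic penalty multiplier.
	let c: f64 = 0.7886892844179266;
	let per_hop_shift = -c.log10(); // ~0.103
	for &p in &[0.9f64, 0.5, 0.123] {
		let shifted = -(p * c).log10() - (-p.log10());
		assert!((shifted - per_hop_shift).abs() < 1e-12);
	}
	println!("per-hop penalty equivalent: {:.4} * multiplier", per_hop_shift);
}
```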
If the liquidity penalty multipliers in the scoring config are both
0 (as is now the default), the corresponding liquidity penalties
will be 0. Thus, we should avoid doing the work to calculate them
if we're ultimately just gonna get a value of zero anyway, which we
do here.
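
A minimal sketch of that short-circuit (illustrative names and a simplified penalty formula, not the actual `ProbabilisticScorer` internals):

```rust
/// Sketch only: when both liquidity penalty multipliers are zero the result is
/// necessarily zero, so skip the log10 work entirely.
fn liquidity_penalty_msat(
	penalty_mult_msat: u64, amount_mult_msat: u64, success_probability: f64,
) -> u64 {
	if penalty_mult_msat == 0 && amount_mult_msat == 0 {
		return 0;
	}
	let negative_log10 = -success_probability.log10();
	((penalty_mult_msat + amount_mult_msat) as f64 * negative_log10) as u64
}

fn main() {
	// With the new zero defaults this path now costs almost nothing.
	assert_eq!(liquidity_penalty_msat(0, 0, 0.25), 0);
	assert!(liquidity_penalty_msat(30_000, 0, 0.25) > 0);
}
```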
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

Across the simulation runs, for a given PDF and other parameters,
we nearly always did better with a shorter half-life (even as short
as 1ms, i.e. only learning per-probe rather than across probes).
While this likely makes sense for nodes which do live probing, not
all nodes do, and thus we should avoid over-biasing on the dataset
we have.

While it may make sense to only learn per-payment and not across
payments, I can't fully rationalize this result and thus want to
avoid over-tuning, so here we reduce the half-life from 6 hours to
30 minutes.
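
Node operators who probe aggressively and prefer the previous behavior can still override the new default; a sketch assuming the `liquidity_offset_half_life` field of `ProbabilisticScoringDecayParameters` is the knob in question:

```rust
use std::time::Duration;

use lightning::routing::scoring::ProbabilisticScoringDecayParameters;

fn main() {
	// Assumption: `liquidity_offset_half_life` governs the live-bounds decay
	// discussed above; restore the previous 6-hour half-life.
	let decay_params = ProbabilisticScoringDecayParameters {
		liquidity_offset_half_life: Duration::from_secs(6 * 60 * 60),
		..ProbabilisticScoringDecayParameters::default()
	};
	let _ = decay_params;
}
```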
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 52e2b2e to 0aae3e4 on December 19, 2024 at 20:05
In the next commit we'll want to return floats or ints from
`success_probability` depending on the callsite, so instead of
duplicating the calculation logic, here we split the linear (which
always uses int math) and nonlinear (which always uses float math)
into separate methods, allowing us to write trivial
`success_probability` wrappers that return the desired type.
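
A skeleton of that shape (signatures heavily simplified for illustration; the real functions also take liquidity bounds and the scoring parameters):

```rust
// Skeleton only: the linear math stays in integers, the nonlinear math stays
// in floats, and thin wrappers pick whichever return type the callsite wants.
fn linear_success_probability(amount: u64, capacity: u64) -> (u64, u64) {
	// ... integer-only linear math ...
	(capacity.saturating_sub(amount), capacity)
}

fn nonlinear_success_probability(amount: f64, capacity: f64) -> (f64, f64) {
	// ... float-only PDF/CDF math ...
	(capacity - amount, capacity)
}

fn success_probability_float(amount: u64, capacity: u64, linear: bool) -> (f64, f64) {
	if linear {
		let (n, d) = linear_success_probability(amount, capacity);
		(n as f64, d as f64)
	} else {
		nonlinear_success_probability(amount as f64, capacity as f64)
	}
}
```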
Utilizing the results of probes sent once a minute to a random node
in the network for a random amount (within a reasonable range), we
were able to analyze the accuracy of our resulting success
probability estimation with various PDFs across the historical and
live-bounds models.

For each candidate PDF (as well as other parameters, including the
histogram bucket weight), we used the
`min_zero_implies_no_successes` fudge factor in
`success_probability` as well as a total probability multiple fudge
factor to get both the historical success model and the a priori
model to be neither too optimistic nor too pessimistic (as measured
by the relative log-loss between succeeding and failing hops in our
sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

Somewhat surprisingly (to me at least), the (fairly strongly)
preferred model was one where the bucket weights in the historical
histograms are exponentiated. In the current design, the weights
are effectively squared as we multiply the minimum- and maximum-
histogram buckets together before adding the weight*probabilities
together.

Here we multiply the weights yet again before addition. While the
simulation runs seemed to prefer a slightly stronger weight than
the 4th power we do here, the difference wasn't substantial
(log-loss 0.5058 to 0.4941), so we do the simpler single extra
multiply here.

Note that if we did this naively we'd run out of bits in our
arithmetic operations - we have 16-bit buckets, which when raised
to the 4th can fully fill a 64-bit int. Additionally, when looking
at the 0th min-bucket we occasionally add up to 32 weights together
before multiplying by the probability, requiring an additional five
bits.
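
A quick check of that bit budget (standalone arithmetic, not the in-tree code):

```rust
fn main() {
	// A full 16-bit bucket raised to the 4th power nearly saturates a u64...
	let max_weight = (u16::MAX as u64).pow(4);
	assert!(max_weight > u64::MAX / 32);
	// ...so summing up to 32 such weights (5 more bits) would overflow, hence
	// the switch to f64 for the histogram walk.
	assert!(max_weight.checked_mul(32).is_none());
}
```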

Instead, we move to using floats during our histogram walks, which
further avoids some float -> int conversions because it allows for
retaining the floats we're already using to calculate probability.

Across the last handful of commits, the increased pessimism more
than makes up for the increased runtime complexity, leading to a
40-45% pathfinding speedup on a Xeon Silver 4116 and a 25-45%
speedup on a Xeon E5-2687W v3.

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-optimal-score-params branch from 0aae3e4 to adb0afc on December 19, 2024 at 20:15
@TheBlueMatt
Collaborator Author

Squash-pushed a cleanup of the multiply constant, DRYing it up and adding docs.

$ git diff -U2 0aae3e498 adb0afc52
diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index 94e16b96b..cec241642 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -1155,4 +1155,16 @@ fn three_f64_pow_9(a: f64, b: f64, c: f64) -> (f64, f64, f64) {
 }

+/// If we have no knowledge of the channel, we scale probability down by a multiple of ~82% for the
+/// historical model by multiplying the denominator of a success probability by this before
+/// dividing by 64.
+///
+/// This number (as well as the PDF) was picked experimentally on probing results to maximize the
+/// log-loss of succeeding and failing hops.
+///
+/// Note that we prefer to increase the denominator rather than decrease the numerator as the
+/// denominator is more likely to be larger and thus provide greater precision. This is mostly an
+/// overoptimization but makes a large difference in tests.
+const MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64: u64 = 78;
+
 #[inline(always)]
 fn linear_success_probability(
@@ -1165,11 +1177,7 @@ fn linear_success_probability(

        if min_zero_implies_no_successes && min_liquidity_msat == 0 &&
-               denominator < u64::max_value() / 78
+               denominator < u64::max_value() / MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64
        {
-               // If we have no knowledge of the channel, scale probability down by a multiple of ~82%.
-               // Note that we prefer to increase the denominator rather than decrease the numerator as
-               // the denominator is more likely to be larger and thus provide greater precision. This is
-               // mostly an overoptimization but makes a large difference in tests.
-               denominator = denominator * 78 / 64
+               denominator = denominator * MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64 / 64
        }

@@ -1206,9 +1214,5 @@ fn nonlinear_success_probability(

        if min_zero_implies_no_successes && min_liquidity_msat == 0 {
-               // If we have no knowledge of the channel, scale probability down by a multiple of ~82%.
-               // Note that we prefer to increase the denominator rather than decrease the numerator as
-               // the denominator is more likely to be larger and thus provide greater precision. This is
-               // mostly an overoptimization but makes a large difference in tests.
-               denominator = denominator * 78.0 / 64.0;
+               denominator = denominator * (MIN_ZERO_IMPLIES_NO_SUCCESSES_PENALTY_ON_64 as f64) / 64.0;
        }

@TheBlueMatt TheBlueMatt merged commit 463e432 into lightningdevkit:main Dec 20, 2024
16 of 19 checks passed

Successfully merging this pull request may close these issues.

Rework/increase scoring params
4 participants