Initial passive testing is happy but not next ones started with 10m succession #765

masih · 2024-11-29T12:30:07Z

Critical question: why does the first test in the morning always seem to work well, while successive tests do not run as well?

Looking at the pubsub settings we forked over from Lotus, there are... a lot of questionable decisions that seem to be rooted in pre-F3 filecoin network behaviour (e.g. this).

I wonder if the change in passive testing network causes some loss of mesh or unfair peer scoring, such that the gossipsub mesh becomes ineffective to the point where messages simply do not propagate fast enough. Take invalid message scoring, for example: when networks change, it is inevitable that some messages arrive from the previous network and would be considered invalid. We also observe a spike in invalid message errors in the validation flow, documented here, at the initial instance.

So...

Could it be that the ineffective gossipsub is, at least to some extent, the result of the change in network during passive testing?
Are there parameters set in pubsub that unfairly reduce ranking or negatively impact the mesh by penalising what passive testing does (change in topic, resubscription, dropping messages between networks)?
Could it be that the current pubsub settings, even within a single passive testing network, impact peer ranking as instances progress, e.g. due to a high rate of validation ignores?

masih · 2024-11-29T12:43:53Z

And it looks like Lotus (and by extension Observer, F3, etc.) retains negative scoring for 6 hours. This is set in the top-level pubsub configuration. I assume it affects the whole pubsub instance, i.e. all topics for its lifetime.

rjan90 · 2024-11-29T13:07:32Z

Anecdotally I see a lot of PeerIDs with the exact same really high negative score:

lotus net scores
12D3KooWBPyrDyrTRchikR56W21cW3dQ5YRDeAgCZvPjw7jopfuU, -1795600.000000
12D3KooWBNh4V7JeEvYLKvSbGeMMMFJyB3vavEyEipqNYaZh9cNS, -1795600.000000
12D3KooWBNMVxsBq4T5T8qX8E1FWhfyVULDJ56a3mE1m6r3bEJ8f, -1795600.000000
12D3KooWAy4R5DgHcAuP7Z6CJyesQXkNPfoBFShMtdMtg1z3dhWS, -1795600.000000
12D3KooWAmPdJJcrNQ9qL4Dtj229kJ2VngPtrEmz6fd7duc6N8Q4, -1795600.000000
12D3KooWAewsJcXcVoEhCwfvD7zWwCPae8WtVvcL8nvy84HdNivL, -1795600.000000
12D3KooWAY9Vq9wzqRjzaoKheXPDVf9YCf1GpQ32V4mtjtxAaHPW, -1795600.000000
12D3KooWAPsAXsxBpuRJbjiX7cFzNsA8A1UZe8ikWsbgxZ7DDu5Y, -1795600.000000
12D3KooWAEZaEAwxco3Coho2c4KESS5Q868NYhXzSHAXdvwomYAt, -1795600.000000

A total of 139 on my node with the exact same negative score, out of a total of:

lotus net scores | wc -l
2045

Total number of PeerIDs that have negative scores is:
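
A count like this could be obtained by post-processing the plain `lotus net scores` output. The sketch below assumes each line has the `<PeerID>, <score>` shape shown above; the function name `summarize_scores` is hypothetical and not part of Lotus, and the sample only reuses two lines from this thread.

```python
from collections import Counter


def summarize_scores(text):
    """Summarize `<PeerID>, <score>` lines (format assumed from the output above)."""
    scores = []
    for line in text.splitlines():
        _peer_id, sep, raw = line.rpartition(",")
        if not sep:
            continue  # skip blank lines and anything that is not `id, score`
        try:
            scores.append(float(raw))
        except ValueError:
            continue  # skip lines whose score field is not numeric
    negative = [s for s in scores if s < 0]
    # Most frequent negative score and how many peers share it.
    most_common = Counter(negative).most_common(1)
    return {
        "total": len(scores),
        "negative": len(negative),
        "most_common_negative": most_common[0] if most_common else None,
    }


sample = """\
12D3KooWBPyrDyrTRchikR56W21cW3dQ5YRDeAgCZvPjw7jopfuU, -1795600.000000
12D3KooWBNh4V7JeEvYLKvSbGeMMMFJyB3vavEyEipqNYaZh9cNS, -1795600.000000
"""
print(summarize_scores(sample))
```

Run against the full output saved to a file, the `negative` field would give the total number of negatively scored PeerIDs, and `most_common_negative` would show how many peers share the same score.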

rjan90 · 2024-11-29T20:29:51Z

For clarity I also grepped for the ones that subscribe to F3, and most have 0 scores - with some occasional negative ones, but not the high negative score as ^^

{"ID":"12D3KooW9sCwBYPVGr9T7A5DMzk8qF4wdGtTGSREK7kMLdJDBLR6","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeliveries":0}},"AppSpecificScore":0,"IPColocationFactor":0,"BehaviourPenalty":0}}
{"ID":"12D3KooW9rUCW2eEmbZsGarEBzdh7RwqZXzVhm5yW4GHpM4PxGLV","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeliveries":0}},"AppSpecificScore":0,"IPColocationFactor":0,"BehaviourPenalty":0}}
{"ID":"12D3KooW9qsRsJmXkgYuyJnZNDwpB75Lhs1dw6myiNFDTLgwgbQA","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeli

And the ones with extremely high negative scores are caused by IPColocationFactor:

{"ID":"12D3KooWBNMVxsBq4T5T8qX8E1FWhfyVULDJ56a3mE1m6r3bEJ8f","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAy4R5DgHcAuP7Z6CJyesQXkNPfoBFShMtdMtg1z3dhWS","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAmPdJJcrNQ9qL4Dtj229kJ2VngPtrEmz6fd7duc6N8Q4","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAewsJcXcVoEhCwfvD7zWwCPae8WtVvcL8nvy84HdNivL","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAY9Vq9wzqRjzaoKheXPDVf9YCf1GpQ32V4mtjtxAaHPW","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAPsAXsxBpuRJbjiX7cFzNsA8A1UZe8ikWsbgxZ7DDu5Y","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAEZaEAwxco3Coho2c4KESS5Q868NYhXzSHAXdvwomYAt","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWA4kybwSTq57KfMxJ4unVPbFTXTxRpe1S5HcKALRvu2FY","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}

Another test after a prolonged pause should be run to rule out peer scores as the cause, but it does not seem that the PeerIDs subscribed to F3 get negatively scored.
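
The identical scores are consistent with how gossipsub computes its IP colocation penalty: the number of peers sharing an IP above a threshold is squared, and the result is multiplied by a negative weight. The sketch below reproduces the figures in this thread, assuming a threshold of 5, a weight of -100 and a single shared IP per peer. These values are assumptions chosen because they match the numbers above and should be checked against the actual pubsub configuration; the function name is hypothetical.

```python
def ip_colocation_score(peers_on_ip, threshold=5, weight=-100.0):
    """Return (factor, score) for one IP: factor = surplus^2, score = weight * factor.

    threshold and weight are assumed values, not read from any configuration.
    """
    surplus = max(0, peers_on_ip - threshold)
    factor = surplus ** 2
    return factor, weight * factor


# 139 peers shared the same score in the plain output above.
print(ip_colocation_score(139))  # (17956, -1795600.0)

# An IPColocationFactor of 18769 is 137 squared, i.e. 142 peers under these assumptions.
print(ip_colocation_score(142))  # (18769, -1876900.0)
```

If these assumptions hold, the large negative scores reflect many peers sitting behind one IP address rather than message-level misbehaviour, which would fit the `"Topics":null` entries and zero `BehaviourPenalty` in the JSON above.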

masih · 2024-12-03T11:45:49Z

I have not found sufficient evidence to believe this is a genuine issue:

Increased time between successive passive testing to 30m.
Retried the day after with no over-night tests.
Evidence is inconclusive: I cannot reproducibly get the network to misbehave in connection with time between successive passive testing.

Closing.

github-project-automation bot added this to F3 Nov 29, 2024

github-project-automation bot moved this to Todo in F3 Nov 29, 2024

masih self-assigned this Dec 3, 2024

masih added this to the Milestone 2.5: Mainnet Deployment Readiness milestone Dec 3, 2024

masih closed this as completed Dec 3, 2024

github-project-automation bot moved this from Todo to Done in F3 Dec 3, 2024
