Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem. #278

TheYellowArchitect · 2024-09-14T13:53:14Z

All states are sent UNRELIABLE_ORDERED regardless if they are a full state (contains all properties) or a diff state (contains only the properties with new values)
Once the client receives any state, he sends a RELIABLE rpc to the server that he has indeed received a full state, and so it can start receiving diff states. Once this RELIABLE rpc is received by the server, all future states sent to that specific peer are diff states, instead of full states. Diff states are states whose property/values included are only those which have changed from the previous tick.

This implementation also solves the diff states problem mentioned in #264 (this is the 2nd way, and I think its the cleanest and most performant)

I have tested it and found no issues.

Solves #157
As for the bandwidth it saves, see forest-brawl example: "Displaceable:mass", "Displaceable:impulse", ":speed" which change only once via powerup, so 3/5 states are skipped with diff states. Assuming the variables were of same size, with diff states, there is 60% less bandwidth.

~~Requires #277 which is just static typing additions (so this PR is easy to read, instead of having double the size)~~

…piler hits error if function doesnt exist, so typos are caught with static RPC functions)

addons/netfox/properties/property-entry.gd

TheYellowArchitect · 2024-09-14T14:01:26Z

addons/netfox/rollback/network-rollback.gd

+ get:
+ return ProjectSettings.get_setting("netfox/rollback/enable_state_diffs", true)
+ set(v):
+ enable_state_diffs = v


Is the setter needless? I am not familiar with Godot's setters/getters.
Also are project settings' values cached on runtime, or does it search project settings' values each time I get()?

Is there any reason for this flag to be writable during runtime? I'd just go with a push_error message.

ProjectSettings.get_setting boils down to a Node.get call under the hood, so I'd say it's fine as it is. But if you want to be proactive about perf, feel free to introduce a variable with an underscore prefix that's initialized in _ready and use that.

I made it push_error as you suggested. As for the second part of this, I opened #304

addons/netfox/rollback/rollback-synchronizer.gd

…RED) regardless of full or diff. The server always starts by sending full states. Once the client receives any (full) state, he sends an ack to the server to start sending diffs.

olivatooo · 2024-10-02T20:21:46Z

I really hope the author reads this

elementbound · 2024-10-09T09:58:47Z

@olivatooo getting there 🙂

elementbound · 2024-10-10T20:26:36Z

Cleaned up the PR a bit so we can focus on the actual changes. In the future, please keep in mind the following for PRs:

One PR should implement a single feature
The PR shouldn't contain changes not related to the feature it implements
Preferably don't start rewriting unrelated parts of the codebase along different coding conventions, instead start a discussion. In this PR specifically, I mean adding static typing annotations to variables inside function scopes.
Even though Godot modifies it, you can just not commit the Brawler scene 🙂

Thanks!

TheYellowArchitect · 2024-10-11T08:04:30Z

Even though Godot modifies it, you can just not commit the Brawler scene

I apologize for this slipping through, it pops up in each PR I make by default, I gotta manually exclude this, and it seems in this PR I missed it.
Btw, which godot version do you use to edit netfox? I use 4.1.0

The PR shouldn't contain changes not related to the feature it implements
One PR should implement a single feature
commit-message: "remove input diff fragment"

This PR still contains the input changes which are unrelated to diff states. I will move them to a different PR. Freely close that input refactor PR.

Btw on the removal of static typing, I gotta say its harder to read this PR without types, because you cant tell if a property variable is PropertyEntry or String

TheYellowArchitect · 2024-10-11T08:07:41Z

addons/netfox/properties/property-entry.gd

@@ -28,6 +27,5 @@ static func parse(root: Node, path: String) -> PropertyEntry:
 var result = PropertyEntry.new()
 result.node = root.get_node(NodePath(path))
 result.property = path.erase(0, path.find(":") + 1)
- result.type = typeof(result.get_value())


Should I make a PR for caching the type of the property entry?

Not yet, that would need a separate discussion on why that would be beneficial.

TheYellowArchitect · 2024-10-11T08:09:04Z

addons/netfox/properties/property-snapshot.gd

 for property in properties:
 result[property.to_string()] = property.get_value()
 result.make_read_only()
 return result

 static func apply(properties: Dictionary, cache: PropertyCache):
 for property in properties:
- var pe: PropertyEntry = cache.get_entry(property)
+ var pe = cache.get_entry(property)


If there is no static typing here, at least a rename is required of pe to property_entry

Sure, but either in a different PR, or later in this PR's lifecycle

TheYellowArchitect · 2024-10-11T08:10:46Z

addons/netfox/rollback/rollback-synchronizer.gd

- var earliest: int = buffer.keys().min()
- var latest: int = buffer.keys().max()
+ var earliest = buffer.keys().min()
+ var latest = buffer.keys().max()


Static typing in places such as this help greatly in readability. Without the int (hinting to tick), if you land suddenly on this codeblock, you dont know what buffer.keys() can be, could be variant in the reader's head

As discussed earlier, convention is to not use static typing inside methods. Renaming variables to be more expressive of what they do is a good idea though.

The latter can be done either in a separate PR intended for readability, or later in this PR's lifecycle.

Did them in seperate PRs: #306 and #305

TheYellowArchitect · 2024-10-12T11:08:08Z

I removed the input refactor so this PR is exclusively on state diffs. I found a weird bug though on Godot 4.1 (haven't tested other versions)

It sends state ticks in order (e.g. 45, 46, 47) but sometimes they arrive out of order (e.g. 45, 47, 46) which shouldn't happen because the RPC is "unreliable_ordered". I want to test this with other godot versions because until Godot 4.4 dev3 some RPC bugs are fixed, and this may include this.

Nonetheless, this made me realize that even if Godot doesn't bug, packet loss and packets being shuffled are a possibility, so it's good to cover those edge scenarios. So I made a condition block for this scenario (see the for picked_tick in range(_latest_state_tick, tick))

The above works and has no bugs but there is still that rare jittering bug where you run 0 ping but there is a jittering at some player once in 10 boots or something (which existed before this PR), which I suspect is caused by this "unreliable_ordered" failure. I will open an issue on this once I confirm its root and can reproduce it consistently.

Anyway, I left the debug comments and stuff inside, turning this into a draft for now until I remove the debug comments and look deeper into this bug (e.g. test other godot versions)
Despite it being in draft, it works as intended btw, so test freely.

I will open a PR for the input refactor later, gotta do tasks irl will take a few days.

…oss.

TheYellowArchitect · 2024-10-13T09:35:08Z

Yup, checkedout to master, opened Godot 4.3, this bug existed prior to this PR and will continue existing with this PR.
Sends tick 93, 95, 94.
Or see the following:

So I pause this PR to investigate this bug #300

elementbound · 2024-10-15T19:59:30Z

addons/netfox/rollback/rollback-synchronizer.gd

@@ -148,27 +151,63 @@ func _process_tick(tick: int):

 func _record_tick(tick: int):
 # Broadcast state we own
- if not _auth_state_props.is_empty():
- var broadcast = {}
+ if not _auth_state_props.is_empty(): #This is always the server, as the server owns all avatars


Comment is misleading, this is not a requirement. Though uncommon, other peers can own state too, and this is determined per node. So I'd prefer not to specify this, even as comment, in netfox core.

elementbound · 2024-10-15T20:05:16Z

addons/netfox/rollback/rollback-synchronizer.gd

+ var diff_state_to_broadcast: Dictionary = {}
+ for picked_property_path in full_state_to_broadcast:
+ #If previous tick doesnt have a property, this means its a new property added in runtime, so we must add it.
+ if (_states[tick - 1].has(picked_property_path) == false):


Please create a dedicated variable to the previous state and add a comment that it will always have a value due to the check on line 169.

elementbound · 2024-10-15T20:07:13Z

addons/netfox/rollback/rollback-synchronizer.gd

+ diff_state_to_broadcast[picked_property] = full_state_to_broadcast[picked_property]
+
+ for picked_peer_id in multiplayer.get_peers():
+ if (NetworkTime._synced_clients.has(picked_peer_id) == false):


What's the motivation behind this check?

NetworkTime.tick is not synced yet for the client, so sending states can cause issues because before the sync for example the client can be tick 54 then jump to tick 60 and then start incrementing the tick by 1, so there is a skip in states

For a practical example, in the game im using netfox for, when a client joins, it syncs to the exact NetworkTime.tick of the server

This check can be removed, it shouldn't cause a bug because to un-synced clients, diff states are never sent, only full states, but I would need to test first before removing this from draft

Ah right, and if the state jumps from 54 to 60, we can't do diff states because we don't have tick 59, right? If so, I'd refer to my other comment on the approach. Ideally we shouldn't need a contiguous state history.

elementbound · 2024-10-15T20:08:56Z

addons/netfox/rollback/rollback-synchronizer.gd

 var _earliest_input: int

+var _sent_full_state_to_peer_ids: Array[int] = []


If you plan to use this as a set of peers, I'd recommend a dictionary instead of an array.

elementbound · 2024-10-15T20:09:21Z

addons/netfox/rollback/rollback-synchronizer.gd

+ _logger.warning("At tick %s skipped peer id %s" % [tick, picked_peer_id])
+ continue
+
+ if (_sent_full_state_to_peer_ids.count(picked_peer_id) > 1):


When would this trigger?

This was a remnant from debugging, I will remove it 👍

elementbound · 2024-10-15T20:50:35Z

addons/netfox/rollback/rollback-synchronizer.gd

+ properties_to_include.append(picked_property_path)
+
+ #Set property values which are different from previous state sent
+ for picked_property in properties_to_include:


Why have this loop separately? Why not just add things to the diff state as we go in the previous loop?

Good catch. It was a remnant from the olde PackedByteArray serialization, where I first got the properties to include then serialized them in a different loop.

I will add that line to the above/identical loop 👍

elementbound · 2024-10-15T20:51:21Z

addons/netfox/rollback/rollback-synchronizer.gd

+ else:
+ var properties_to_include: Array[String] = []
+ var diff_state_to_broadcast: Dictionary = {}
+ for picked_property_path in full_state_to_broadcast:


I think it would be best to create a PropertySnapshot.diff(a, b) method instead, and call that here.

Is this still true for the new commit, aka this code?

elementbound · 2024-10-15T20:53:56Z

addons/netfox/rollback/rollback-synchronizer.gd

+ if (sanitized.has(picked_property_path) == false):
+ if (_states.has(tick - 1)):
+ if (_states[tick-1].has(picked_property_path)):
+ sanitized[picked_property_path] = _states[tick - 1][picked_property_path]


Why not just PropertySnapshot.merge the incoming state and the previous tick's state?

I will look into that function, I didn't know it existed

elementbound · 2024-10-15T20:55:21Z

addons/netfox/rollback/rollback-synchronizer.gd

+ _logger.error("Received a state without any properties from %s, but diff states which should make it possible, are disabled!" % multiplayer.get_remote_sender_id())
+ breakpoint
+
+func set_received_state(received_state: Dictionary, tick: int, sender_id: int):


I get that a piece of logic was extracted here, but I don't think set_received_state is the best name. Let's get back to this once the rest of the comments are addressed.

I agree. Let's find a proper name once I push a commit, and have addressed the comments 👍

elementbound · 2024-10-15T21:06:34Z

addons/netfox/rollback/rollback-synchronizer.gd

+ set_received_state(received_state, tick, multiplayer.get_remote_sender_id())
+ elif (NetworkRollback.enable_state_diffs):
+ if (_states.has(tick - 1)):
+ _states[tick] = _states[tick - 1]


Why do this? At this point, _states[tick] should already be set by set_received_state, which this then overrides fully.

Because if a diff contains 0 properties (that means there was no change in the properties from previous tick), the new state must simply be a duplicate of the previous one.

elementbound · 2024-10-15T21:17:05Z

Ok so, I have a few big concerns regarding this implementation. The way I understand it:

Host starts sending full states
Client receives full state, ack's
Host starts sending diff states compared to previous state

Let me know if I missed something.

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the original issue.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

TheYellowArchitect · 2024-10-16T10:06:47Z

The way I understand it:

Host starts sending full states

Client receives full state, ack's

Host starts sending diff states compared to previous state

Correct overview 👍

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

It is true, but it should correct itself once the diff changes again. For example, for position and rotation since they change so often, losing a tick doesn't matter because it will be updated in another diff tick. But to play devil's advocate and support your argument, if the different property is movement_speed in forest-brawl, that happens only once via powerup and expires in like 10 seconds, so that is 10 seconds of a property mismatch.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the #157 (comment).

The issue above, from what I understand, says that a diff should reference a tick with full state, but there are no full states here once diff states begin.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

This is one of the 2 solutions to this problem. The other solution being akin to input_redundancy where diff'd properties are sent for the following ticks when they change, so if a property changes at tick 52, it sends the delta diff as expected, but at tick 53 it again sends its property value that it has at tick 53, and the same happens for tick 54. However at tick 55+ it doesn't send the property at all if it hasn't changed. So for the above example of tick 52,53,54, its 3 ticks of let's call it state_redundancy :P

The first solution is a hybrid of full state and diff state, and the second solution is doubling down on the diff state. There are pros and cons for both, for example, the second solution is ideal if you have 200+ dropped items in a game world whose position doesn't change, so they simply don't send their position (or state) and they consume no bandwidth.

For its implementation, I would add a var counter: int either to PropertyEntry or in an Array[int] which detects how many times it is sent after it has changed. So once it changes, it adds 1 per _record_tick and it does an if check with state_redundancy and if its equal, it doesn't send it again. And ofc it resets to 0 once the property changes value.

Which solution would you like? I can code both, and either solves the problem.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

I will push a commit today or tomorrow, give me a confirmation if you want me to split in 2 RPCs. Personally I don't mind.

Thank you for the detailed code review. I honestly didn't expect it, and I feel kinda guilty that I have paused the commit. Its just 2 codeblocks I have to debug/confirm they work as intended and I have prioritized other tasks irl, I didn't expect such rigorous review.

I will push the above commit since its just 2 codeblocks (ignore it freely) which smoothens the packet loss, then I will do a followup commit which addresses all the code review comments.

TheYellowArchitect · 2024-10-16T11:38:41Z

I pushed, packet loss is fixed/addressed. It is good enough to remove from draft, though I keep the debug comments until everything is determined to be perfect. I addressed most of the coding review comments in these commits.

I will test this with the latest master merges and probably push a commit tomorrow because of that forloop which duplicates states, see that comment.
It works, but it may cause jittering so I will test it extensively later to confirm.

@elementbound I request to confirm with the current code, whether to split the _submit_state RPC to _submit_full_state and _submit_diff_state, and also which of the 2 solutions to apply for the packet loss on a diff property (as mentioned in the above comment)

After the above confirmations, I will implement them on this PR.

TheYellowArchitect · 2024-10-16T11:45:51Z

addons/netfox/rollback/rollback-synchronizer.gd

+ # Also fixes packet loss (e.g. a state missing)
+ for picked_tick in range(_latest_state_tick, received_tick):
+ #Check for the very first latest_state_tick which doesn't have a corresponding state
+ if (_states.has(picked_tick)):


This if check is required because I found a bug with _latest_state_tick which exists prior to this PR as well, even with the starting order fix of #284 and I will open an issue for that, but until then this if check saves the day (consider it a hack until that issue is resolved)

To synopsize the issue: _latest_state_tick is initialized inside process_settings() or ready(), but wherever, its value is always NetworkTime.tick - 1. I changed this to NetworkTime.tick as it makes more sense. But even this value is untrue because the latest state is set only if we are the authority (after _record_tick) or after _submit_state if we are not the authority. So it should have the value -1 instead of NetworkTime.tick and let this variable tick value be set after its actual state is set.

By setting the value to -1 as it should be, causes some other problems, so yeah I opted for this if check 😅

…e review comments

TheYellowArchitect and others added 8 commits August 22, 2024 15:03

Changed RPC strings to static functions (faster performance, also com…

753f44b

…piler hits error if function doesnt exist, so typos are caught with static RPC functions)

Restore string projectile ids

7a85290

further restore for string ids

bd49333

removed get player id

3fbe20a

Restored brawler

4ff2351

version bump

bed38a5

Added static typing.

caa9ac0

Merged with static typing

dc2dd72

TheYellowArchitect mentioned this pull request Sep 14, 2024

Add diff state compression #264

Closed

TheYellowArchitect commented Sep 14, 2024

View reviewed changes

addons/netfox/properties/property-entry.gd Outdated Show resolved Hide resolved

TheYellowArchitect commented Sep 14, 2024

View reviewed changes

addons/netfox/rollback/rollback-synchronizer.gd Outdated Show resolved Hide resolved

TheYellowArchitect added 2 commits September 14, 2024 19:46

Added diff states in the 2nd way (all states are sent UNRELIABLE ORDE…

a824e71

…RED) regardless of full or diff. The server always starts by sending full states. Once the client receives any (full) state, he sends an ack to the server to start sending diffs.

Merge branch 'main' into diff-states-no-serialization

a5cf403

Merge branch 'main' into diff-states-no-serialization

b83b1e5

elementbound added 2 commits October 10, 2024 22:22

revert typing changes

c602b3e

fxs

676ff45

remove input diff fragment

95a3509

TheYellowArchitect commented Oct 11, 2024

View reviewed changes

TheYellowArchitect marked this pull request as draft October 12, 2024 11:09

Removed input refactor and added fix for shuffled packet and packet l…

82d06d3

…oss.

TheYellowArchitect mentioned this pull request Oct 13, 2024

Some State Packets are not received #300

Closed

elementbound reviewed Oct 15, 2024

View reviewed changes

TheYellowArchitect marked this pull request as ready for review October 16, 2024 11:34

TheYellowArchitect commented Oct 16, 2024

View reviewed changes

TheYellowArchitect added 3 commits October 16, 2024 16:56

Fixed a bug with packet loss, and added feedback from some of the cod…

6b74ef6

…e review comments

fix for packet loss states

f6b1705

restore inputs

8f37419

		var _earliest_input: int

		var _sent_full_state_to_peer_ids: Array[int] = []

Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem. #278

Are you sure you want to change the base?

Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem. #278

Conversation

TheYellowArchitect commented Sep 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheYellowArchitect Oct 16, 2024 • edited Loading

Choose a reason for hiding this comment

olivatooo commented Oct 2, 2024

elementbound commented Oct 9, 2024

elementbound commented Oct 10, 2024 • edited Loading

TheYellowArchitect commented Oct 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheYellowArchitect Oct 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheYellowArchitect commented Oct 12, 2024 • edited Loading

TheYellowArchitect commented Oct 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheYellowArchitect Oct 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheYellowArchitect Oct 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elementbound commented Oct 15, 2024

TheYellowArchitect commented Oct 16, 2024 • edited Loading

TheYellowArchitect commented Oct 16, 2024 • edited Loading

TheYellowArchitect Oct 16, 2024 • edited Loading

Choose a reason for hiding this comment

TheYellowArchitect commented Sep 14, 2024 •

edited

Loading

TheYellowArchitect Oct 16, 2024 •

edited

Loading

elementbound commented Oct 10, 2024 •

edited

Loading

TheYellowArchitect commented Oct 11, 2024 •

edited

Loading

TheYellowArchitect Oct 11, 2024 •

edited

Loading

TheYellowArchitect commented Oct 12, 2024 •

edited

Loading

TheYellowArchitect commented Oct 13, 2024 •

edited

Loading

TheYellowArchitect Oct 16, 2024 •

edited

Loading

TheYellowArchitect Oct 16, 2024 •

edited

Loading

TheYellowArchitect commented Oct 16, 2024 •

edited

Loading

TheYellowArchitect commented Oct 16, 2024 •

edited

Loading

TheYellowArchitect Oct 16, 2024 •

edited

Loading