Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem. #278

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

TheYellowArchitect
Copy link
Contributor

@TheYellowArchitect TheYellowArchitect commented Sep 14, 2024

All states are sent UNRELIABLE_ORDERED regardless if they are a full state (contains all properties) or a diff state (contains only the properties with new values)
Once the client receives any state, he sends a RELIABLE rpc to the server that he has indeed received a full state, and so it can start receiving diff states. Once this RELIABLE rpc is received by the server, all future states sent to that specific peer are diff states, instead of full states. Diff states are states whose property/values included are only those which have changed from the previous tick.

This implementation also solves the diff states problem mentioned in #264 (this is the 2nd way, and I think its the cleanest and most performant)

I have tested it and found no issues.

Solves #157
As for the bandwidth it saves, see forest-brawl example: "Displaceable:mass", "Displaceable:impulse", ":speed" which change only once via powerup, so 3/5 states are skipped with diff states. Assuming the variables were of same size, with diff states, there is 60% less bandwidth.

Requires #277 which is just static typing additions (so this PR is easy to read, instead of having double the size)

get:
return ProjectSettings.get_setting("netfox/rollback/enable_state_diffs", true)
set(v):
enable_state_diffs = v
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the setter needless? I am not familiar with Godot's setters/getters.
Also are project settings' values cached on runtime, or does it search project settings' values each time I get()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for this flag to be writable during runtime? I'd just go with a push_error message.

ProjectSettings.get_setting boils down to a Node.get call under the hood, so I'd say it's fine as it is. But if you want to be proactive about perf, feel free to introduce a variable with an underscore prefix that's initialized in _ready and use that.

Copy link
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it push_error as you suggested. As for the second part of this, I opened #304

…RED) regardless of full or diff. The server always starts by sending full states. Once the client receives any (full) state, he sends an ack to the server to start sending diffs.
@olivatooo
Copy link

I really hope the author reads this

@elementbound
Copy link
Contributor

@olivatooo getting there 🙂

@elementbound
Copy link
Contributor

elementbound commented Oct 10, 2024

Cleaned up the PR a bit so we can focus on the actual changes. In the future, please keep in mind the following for PRs:

  • One PR should implement a single feature
  • The PR shouldn't contain changes not related to the feature it implements
  • Preferably don't start rewriting unrelated parts of the codebase along different coding conventions, instead start a discussion. In this PR specifically, I mean adding static typing annotations to variables inside function scopes.
  • Even though Godot modifies it, you can just not commit the Brawler scene 🙂

Thanks!

@TheYellowArchitect
Copy link
Contributor Author

TheYellowArchitect commented Oct 11, 2024

Even though Godot modifies it, you can just not commit the Brawler scene

I apologize for this slipping through, it pops up in each PR I make by default, I gotta manually exclude this, and it seems in this PR I missed it.
Btw, which godot version do you use to edit netfox? I use 4.1.0

The PR shouldn't contain changes not related to the feature it implements
One PR should implement a single feature
commit-message: "remove input diff fragment"

This PR still contains the input changes which are unrelated to diff states. I will move them to a different PR. Freely close that input refactor PR.


Btw on the removal of static typing, I gotta say its harder to read this PR without types, because you cant tell if a property variable is PropertyEntry or String

@@ -28,6 +27,5 @@ static func parse(root: Node, path: String) -> PropertyEntry:
var result = PropertyEntry.new()
result.node = root.get_node(NodePath(path))
result.property = path.erase(0, path.find(":") + 1)
result.type = typeof(result.get_value())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make a PR for caching the type of the property entry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, that would need a separate discussion on why that would be beneficial.

for property in properties:
result[property.to_string()] = property.get_value()
result.make_read_only()
return result

static func apply(properties: Dictionary, cache: PropertyCache):
for property in properties:
var pe: PropertyEntry = cache.get_entry(property)
var pe = cache.get_entry(property)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no static typing here, at least a rename is required of pe to property_entry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but either in a different PR, or later in this PR's lifecycle

var earliest: int = buffer.keys().min()
var latest: int = buffer.keys().max()
var earliest = buffer.keys().min()
var latest = buffer.keys().max()
Copy link
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static typing in places such as this help greatly in readability. Without the int (hinting to tick), if you land suddenly on this codeblock, you dont know what buffer.keys() can be, could be variant in the reader's head

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier, convention is to not use static typing inside methods. Renaming variables to be more expressive of what they do is a good idea though.

The latter can be done either in a separate PR intended for readability, or later in this PR's lifecycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did them in seperate PRs: #306 and #305

@TheYellowArchitect
Copy link
Contributor Author

TheYellowArchitect commented Oct 12, 2024

I removed the input refactor so this PR is exclusively on state diffs. I found a weird bug though on Godot 4.1 (haven't tested other versions)

It sends state ticks in order (e.g. 45, 46, 47) but sometimes they arrive out of order (e.g. 45, 47, 46) which shouldn't happen because the RPC is "unreliable_ordered". I want to test this with other godot versions because until Godot 4.4 dev3 some RPC bugs are fixed, and this may include this.

Nonetheless, this made me realize that even if Godot doesn't bug, packet loss and packets being shuffled are a possibility, so it's good to cover those edge scenarios. So I made a condition block for this scenario (see the for picked_tick in range(_latest_state_tick, tick))

The above works and has no bugs but there is still that rare jittering bug where you run 0 ping but there is a jittering at some player once in 10 boots or something (which existed before this PR), which I suspect is caused by this "unreliable_ordered" failure. I will open an issue on this once I confirm its root and can reproduce it consistently.

Anyway, I left the debug comments and stuff inside, turning this into a draft for now until I remove the debug comments and look deeper into this bug (e.g. test other godot versions)
Despite it being in draft, it works as intended btw, so test freely.

I will open a PR for the input refactor later, gotta do tasks irl will take a few days.

@TheYellowArchitect TheYellowArchitect marked this pull request as draft October 12, 2024 11:09
@TheYellowArchitect
Copy link
Contributor Author

TheYellowArchitect commented Oct 13, 2024

netfox-send-state-tick-shuffle
Yup, checkedout to master, opened Godot 4.3, this bug existed prior to this PR and will continue existing with this PR.
Sends tick 93, 95, 94.
Or see the following:
netfox-non-linear-state-receiving

So I pause this PR to investigate this bug #300

@@ -148,27 +151,63 @@ func _process_tick(tick: int):

func _record_tick(tick: int):
# Broadcast state we own
if not _auth_state_props.is_empty():
var broadcast = {}
if not _auth_state_props.is_empty(): #This is always the server, as the server owns all avatars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is misleading, this is not a requirement. Though uncommon, other peers can own state too, and this is determined per node. So I'd prefer not to specify this, even as comment, in netfox core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

var diff_state_to_broadcast: Dictionary = {}
for picked_property_path in full_state_to_broadcast:
#If previous tick doesnt have a property, this means its a new property added in runtime, so we must add it.
if (_states[tick - 1].has(picked_property_path) == false):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a dedicated variable to the previous state and add a comment that it will always have a value due to the check on line 169.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

diff_state_to_broadcast[picked_property] = full_state_to_broadcast[picked_property]

for picked_peer_id in multiplayer.get_peers():
if (NetworkTime._synced_clients.has(picked_peer_id) == false):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation behind this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkTime.tick is not synced yet for the client, so sending states can cause issues because before the sync for example the client can be tick 54 then jump to tick 60 and then start incrementing the tick by 1, so there is a skip in states

For a practical example, in the game im using netfox for, when a client joins, it syncs to the exact NetworkTime.tick of the server

This check can be removed, it shouldn't cause a bug because to un-synced clients, diff states are never sent, only full states, but I would need to test first before removing this from draft

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, and if the state jumps from 54 to 60, we can't do diff states because we don't have tick 59, right? If so, I'd refer to my other comment on the approach. Ideally we shouldn't need a contiguous state history.

var _earliest_input: int

var _sent_full_state_to_peer_ids: Array[int] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan to use this as a set of peers, I'd recommend a dictionary instead of an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

_logger.warning("At tick %s skipped peer id %s" % [tick, picked_peer_id])
continue

if (_sent_full_state_to_peer_ids.count(picked_peer_id) > 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this trigger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a remnant from debugging, I will remove it 👍

Copy link
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

properties_to_include.append(picked_property_path)

#Set property values which are different from previous state sent
for picked_property in properties_to_include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this loop separately? Why not just add things to the diff state as we go in the previous loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It was a remnant from the olde PackedByteArray serialization, where I first got the properties to include then serialized them in a different loop.

I will add that line to the above/identical loop 👍

else:
var properties_to_include: Array[String] = []
var diff_state_to_broadcast: Dictionary = {}
for picked_property_path in full_state_to_broadcast:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to create a PropertySnapshot.diff(a, b) method instead, and call that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true for the new commit, aka this code?

if (sanitized.has(picked_property_path) == false):
if (_states.has(tick - 1)):
if (_states[tick-1].has(picked_property_path)):
sanitized[picked_property_path] = _states[tick - 1][picked_property_path]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just PropertySnapshot.merge the incoming state and the previous tick's state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into that function, I didn't know it existed

_logger.error("Received a state without any properties from %s, but diff states which should make it possible, are disabled!" % multiplayer.get_remote_sender_id())
breakpoint

func set_received_state(received_state: Dictionary, tick: int, sender_id: int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that a piece of logic was extracted here, but I don't think set_received_state is the best name. Let's get back to this once the rest of the comments are addressed.

Copy link
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Let's find a proper name once I push a commit, and have addressed the comments 👍

set_received_state(received_state, tick, multiplayer.get_remote_sender_id())
elif (NetworkRollback.enable_state_diffs):
if (_states.has(tick - 1)):
_states[tick] = _states[tick - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this? At this point, _states[tick] should already be set by set_received_state, which this then overrides fully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if a diff contains 0 properties (that means there was no change in the properties from previous tick), the new state must simply be a duplicate of the previous one.

@elementbound
Copy link
Contributor

Ok so, I have a few big concerns regarding this implementation. The way I understand it:

  1. Host starts sending full states
  2. Client receives full state, ack's
  3. Host starts sending diff states compared to previous state

Let me know if I missed something.

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the original issue.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

@TheYellowArchitect
Copy link
Contributor Author

TheYellowArchitect commented Oct 16, 2024

The way I understand it:

  1. Host starts sending full states
  2. Client receives full state, ack's
  3. Host starts sending diff states compared to previous state

Correct overview 👍

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

It is true, but it should correct itself once the diff changes again. For example, for position and rotation since they change so often, losing a tick doesn't matter because it will be updated in another diff tick. But to play devil's advocate and support your argument, if the different property is movement_speed in forest-brawl, that happens only once via powerup and expires in like 10 seconds, so that is 10 seconds of a property mismatch.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the #157 (comment).

The issue above, from what I understand, says that a diff should reference a tick with full state, but there are no full states here once diff states begin.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

This is one of the 2 solutions to this problem. The other solution being akin to input_redundancy where diff'd properties are sent for the following ticks when they change, so if a property changes at tick 52, it sends the delta diff as expected, but at tick 53 it again sends its property value that it has at tick 53, and the same happens for tick 54. However at tick 55+ it doesn't send the property at all if it hasn't changed. So for the above example of tick 52,53,54, its 3 ticks of let's call it state_redundancy :P

The first solution is a hybrid of full state and diff state, and the second solution is doubling down on the diff state. There are pros and cons for both, for example, the second solution is ideal if you have 200+ dropped items in a game world whose position doesn't change, so they simply don't send their position (or state) and they consume no bandwidth.

For its implementation, I would add a var counter: int either to PropertyEntry or in an Array[int] which detects how many times it is sent after it has changed. So once it changes, it adds 1 per _record_tick and it does an if check with state_redundancy and if its equal, it doesn't send it again. And ofc it resets to 0 once the property changes value.

Which solution would you like? I can code both, and either solves the problem.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

I will push a commit today or tomorrow, give me a confirmation if you want me to split in 2 RPCs. Personally I don't mind.


Thank you for the detailed code review. I honestly didn't expect it, and I feel kinda guilty that I have paused the commit. Its just 2 codeblocks I have to debug/confirm they work as intended and I have prioritized other tasks irl, I didn't expect such rigorous review.

I will push the above commit since its just 2 codeblocks (ignore it freely) which smoothens the packet loss, then I will do a followup commit which addresses all the code review comments.

@TheYellowArchitect TheYellowArchitect marked this pull request as ready for review October 16, 2024 11:34
@TheYellowArchitect
Copy link
Contributor Author

TheYellowArchitect commented Oct 16, 2024

I pushed, packet loss is fixed/addressed. It is good enough to remove from draft, though I keep the debug comments until everything is determined to be perfect. I addressed most of the coding review comments in these commits.

I will test this with the latest master merges and probably push a commit tomorrow because of that forloop which duplicates states, see that comment.
It works, but it may cause jittering so I will test it extensively later to confirm.


@elementbound I request to confirm with the current code, whether to split the _submit_state RPC to _submit_full_state and _submit_diff_state, and also which of the 2 solutions to apply for the packet loss on a diff property (as mentioned in the above comment)

After the above confirmations, I will implement them on this PR.

# Also fixes packet loss (e.g. a state missing)
for picked_tick in range(_latest_state_tick, received_tick):
#Check for the very first latest_state_tick which doesn't have a corresponding state
if (_states.has(picked_tick)):
Copy link
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if check is required because I found a bug with _latest_state_tick which exists prior to this PR as well, even with the starting order fix of #284 and I will open an issue for that, but until then this if check saves the day (consider it a hack until that issue is resolved)

To synopsize the issue: _latest_state_tick is initialized inside process_settings() or ready(), but wherever, its value is always NetworkTime.tick - 1. I changed this to NetworkTime.tick as it makes more sense. But even this value is untrue because the latest state is set only if we are the authority (after _record_tick) or after _submit_state if we are not the authority. So it should have the value -1 instead of NetworkTime.tick and let this variable tick value be set after its actual state is set.

By setting the value to -1 as it should be, causes some other problems, so yeah I opted for this if check 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants