Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove unhelpful logging about events without age params. #3861

Closed
wants to merge 1 commit into from

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Nov 7, 2023

the conditional was unnecessary, the logline wasn't wrapped, and it creates literally tens of thousands of warn-level logspam on my account


This change is marked as an internal change (Task), so will not be included in the changelog.

the conditional was unnecessary, the logline wasn't wrapped,
and it creates literally tens of thousands of warn-level logspam
on my account
@ara4n ara4n requested a review from a team as a code owner November 7, 2023 23:29
@ara4n ara4n added the T-Task Tasks for the team like planning label Nov 7, 2023
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am led to wonder why you are seeing events without an age in the first place... It seems like the right fix here might be to change that rather than to keep calling a method that, to quote its documentation, "should never be used".

The presence of the conditional is odd though.

@ara4n
Copy link
Member Author

ara4n commented Nov 8, 2023

i assume they are just old events?

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume they are just old events?

For obvious reasons, synapse injects age at the point the event is served to you over the CSAPI. So being old wouldn't make much difference.

Having dug a little deeper, it looks like this codepath is being taken for every single typing notification. (Obligatory mention of matrix-org/matrix-spec#897). And the new fallback and logging was introduced in #3854.

Whatever else we decide to do here, we need to fix the fact that we are calling a function that "should never be used" for every single typing notification (and probably presence too). Ripping out logging that was added only two days ago is just plastering over the real problem here.

/cc @toger5

@richvdh
Copy link
Member

richvdh commented Nov 9, 2023

a function that "should never be used" for every single typing notification (and probably presence too)

oh, also read receipts

@toger5
Copy link
Contributor

toger5 commented Nov 9, 2023

The intention for adding this was that I observed numerous events without age (especially state events where we relied on the age)

I was hoping to confirm that this was just a rare issue I ran into because of some reason. So I indeed expected this log to almost never be called.

What is the rational behind the age property? Is it acceptable to not be included in events? In what conditions is that acceptable.

The fact, that this code path is always called makes sense I guess.
For every event we want to know this.localTimestamp. Hence we call this.getAge() ?? this.getFallbackAge() so whenever getAge is undefined this is called.

I expected age == undefined to be undesired behaviour.

@toger5
Copy link
Contributor

toger5 commented Nov 9, 2023

In case it is per spec to not include the age for specific events, this (just removing the warining) indeed seems to be the right fix. The codepath still makes sense since it updates the localTimestamp to be now - origin_server_ts instaed of now which is the better approximation.

For context this is what was changed:
https://github.com/matrix-org/matrix-js-sdk/pull/3854/files#diff-abae49c10d0eab16ff7e75b4c4b02c67c22d2e97cd23610589086395b7800242L394-L396

from:

this.localTimestamp = Date.now() - (this.getAge() ?? 0);

to

this.localTimestamp = Date.now() - (this.getAge() ?? this.fallbackAge());

so the overhead should not be severe. instead of doing one call to now() and one to this.getAge() we do two calls to now() one call to getAge() and one to getTs(). So (except for the printing i dont see that this is a significant performance issue considering all the computations that are happening when an event is received.

Copy link
Contributor

@toger5 toger5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not sending an age for events is acceptable and does not imply other issues removing the warning makes sense.
For call events the age is somewhat important. So another solution would be to change the condition to. this.AGE_DEPENENT_EVENTS.includes(this.eventType())

and make AGE_DEPENDANT_EVENTS a list of event types where a missing age implies potential client issues.

(m.call.member would be one of those events)

@richvdh
Copy link
Member

richvdh commented Nov 9, 2023

What is the rational behind the age property? Is it acceptable to not be included in events? In what conditions is that acceptable.

As I said: typing, presence, and read-receipts do not have an age property. It is debatable whether any of those count as "events" but for whatever reason, the MatrixEvent constructor is being called for those objects.

I really don't think we should merge this as is. Fundamentally the fact that we need to rip out this logging means that we are expecting to call code that is currently documented as "should never be used". That is extremely misleading to anyone else attempting to read the code in future.

At the very least, we should update the documentation on the function to better describe the circumstances under which it is used. However, now that I look even closer, I have more questions about the original PR:

  • Typing, presence and read-receipts also lack an origin_server_ts, so this is going to set localTimestamp to NaN for such events, which seems significantly worse than the current Date.now().
  • Although the spec makes age an optional property, in practice I would strongly expect it to be present for all "real" events served over the C-S API by Synapse. If you're seeing something different, I think we should probably find out why that is.
  • Maybe less important, but tbh I find the structure of the code a bit odd - in particular, why do we have two copies of Date.now() - (this.getAge() ?? this.fallbackAge()) rather than a single function that returns the right answer?

It's also worth noting, that even where age is present, it is known to be fundamentally broken. To be honest I have never really understood why age is more reliable than just using origin_server_ts in the first place, but the fix is pretty easy, so if you have a useful usecase for a working age I would recommend letting the backend team know about it!

@toger5
Copy link
Contributor

toger5 commented Nov 9, 2023

Typing, presence and read-receipts also lack an origin_server_ts, so this is going to set localTimestamp to NaN for such events, which seems significantly worse than the current Date.now().

That sounds like it would be a good filter to solve the fallback.

private fallbackAge(): number {
        if (!this.getTs()) {
            // In case there is no origin_server_ts there is no way to do a fallback if getAge() is returning undefined 
            return 0;
        }
        if (!this.getAge()) {
            logger.warn(
                "`unsigned.age` for an event that has an `origing_server_ts` not available. `now - origin_server_ts` is used as a fallback. If the device clock is not correct issues might occur.",
            );
        }
        return Math.max(Date.now() - this.getTs(), 0);
    }

Although the spec makes age an optional property, in practice I would strongly expect it to be present for all "real" events served over the C-S API by Synapse. If you're seeing something different, I think we should probably find out why that is.

This sounds like it is of interest to have warnings in the console in case there is an issue. With the condition that we only even try all this if there is a origin_server_ts the logging should also not be all over the place anymore if I undestand you correctly.

Maybe less important, but tbh I find the structure of the code a bit odd - in particular, why do we have two copies of Date.now() - (this.getAge() ?? this.fallbackAge()) rather than a single function that returns the right answer?

I see. The reason I was implementing it this way is to not change this.getAge() I was not sure if there is code where we check on/rely on it beeing undefined. So I wanted to explicitly add the fallback only where we used this.getAge() ?? 0.

I agree that:

Date.now() - (this.getAge() ?? this.fallbackAge())
// if getAge == undefined this equals
Date.now() - Math.max(Date.now() - this.getTs(), 0);
// which in most cases is
Date.now() - Date.now() + this.getTs()

is kinda stupid...

It though it makes it more readable however. The alternative would be to have a fallback localTimestamp() that would just be getTs(). But then we would have this code:

this.localTimestamp = this.getAge() !== undefined ? Date.now() - this.getAge() : (this.getTs() !== undefined ? this.getTs() : 0);

// this could be made more readable however:
const localTimeStampFromAge = this.getAge() !== undefined ? Date.now() - this.getAge() : undefined;
const localTimeStampFromOriginServerTs = this.getTs();
this.localTimestamp = localTimeStampFromAge ?? localTimeStampFromOriginServerTs ?? Date.now();

@richvdh
Copy link
Member

richvdh commented Nov 9, 2023

closing this PR in favour of #3870. Let's debate a correct implementation of the age fallback elsewhere.

@richvdh richvdh closed this Nov 9, 2023
@ara4n
Copy link
Member Author

ara4n commented Nov 10, 2023

To be honest I have never really understood why age is more reliable than just using origin_server_ts in the first place,

I perpetrated age in the first place. The reason for it (in 2014 or whever) was to have a measurement of event staleness which didn't rely on the handset's clock being correct (as otherwise it won't be able to compare origin_server_ts sensibly), and the use case was to decide when to ignore stale 1:1 VoIP signalling events, to try to stop spuriously apps ringing when they do an initial/incr sync and see old VoIP events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-Task Tasks for the team like planning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants