Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEP-10: Accept challenge transactions through clock drift #3802

Closed
masterhung0112 opened this issue Aug 4, 2021 · 15 comments · Fixed by #3824
Closed

SEP-10: Accept challenge transactions through clock drift #3802

masterhung0112 opened this issue Aug 4, 2021 · 15 comments · Fixed by #3824
Assignees
Labels

Comments

@masterhung0112
Copy link

masterhung0112 commented Aug 4, 2021

What version are you using?

I am using version github.com/stellar/go v0.0.0-20210803084429-90666abbac1d

What did you do?

When I called txnbuild.ReadChallengeTx, I encountered the following error

transaction is not within range of the specified timebounds (currentTime=1628051970, MinTime=1628052011, MaxTime=1628052911)

What did you expect to see?

May you help to apply the same fix to stellar/go?

stellar/js-stellar-sdk@cdeb65c

@JakeUrban
Copy link
Contributor

cc @leighmcculloch @morleyzhi

@JakeUrban
Copy link
Contributor

JakeUrban commented Aug 9, 2021

Hi @masterhung0112, yes I can. Although looking at the change made to the JS SDK, I'm not sure why we're applying the grace period to the MaxTime. The diff between the client and server clocks is only a problem when the client needs to read the transaction returned by the server. The anchor may expect and even require that the SDK raise an error if the currentTime is past MaxTime.

@masterhung0112
Copy link
Author

Hi @JakeUrban, you can fix in any way you think it's best for stellar/go. Thank you very much.
Probably there's no need to apply grace period to the MaxTime.

@morleyzhi
Copy link

@JakeUrban I applied the grace period in both directions of the timeout because the client's clock can be wrong in either direction. And even with the grace period implemented, in production, we get sporadic instances of people still failing the timeout challenges, though we don't track which direction they're wrong.

@JakeUrban
Copy link
Contributor

@morleyzhi I read up on clock drift a bit and it sounds like 5 minutes could easily be too small of a grace period depending on when the client was last synchronized, hardware, temperature, and other factors. I would recommend tracking how often MinTime timeout errors occur so we can get a sense of whether or not 5 minutes is enough.

If a MaxTime timeout error occurs, then the client is within 5 minutes of the true expiration time and would have to refresh it's SEP-10 token soon anyway. Is this how you handle the errors in Vibrant? I'm assuming most anchors set their tokens to be valid for a least an hour (our reference server uses 24 hours), so having to refresh a token 5 minutes early wouldn't meaningfully increase the frequency of having to get a fresh JWT.

I'm going to move forward with only applying the grace period to the MinTime for the Go SDK, but I'll hold off on merging until Leigh gets back in case I'm missing something.

@ire-and-curses
Copy link
Member

I'm uneasy with two of our SDK implementations doing different things in the case of maxTime drift. It seems to me they should be consistent. I guess there are two arguments made here:

  1. The client expects no grace period for max time
  2. Max time grace period in practice won't help much

I'm having trouble seeing the downside to adding the grace period to maxTime. If the downside is significant, we should also fix the JS SDK though.

@leighmcculloch thoughts?

@JakeUrban
Copy link
Contributor

JakeUrban commented Aug 16, 2021

The client expects no grace period for max time

Just to clarify, the server is the one that may expect no grace period. The challenge could be expired by the time it's sent back to the server, but if the grace period is applied to the max time the server may still validate it.

I agree the SDK implementations should be consistent.

@leighmcculloch
Copy link
Member

There is some prior discussion here too: https://github.com/stellar/stellar-protocol/pull/1007/files#r674358006.

Something we discussed there is that it's becoming more common for JWT implementations to not validate the iat value due to clock drift and the value having less value than expiration. It's also not uncommon for JWT libraries to add a grace period to their checks. For example the JWT library we use in Go projects has a 1 minute grace period by default (https://github.com/square/go-jose/blob/v2/jwt/validation.go#L59). The challenge min time is very similar to the iat value in JWTs, so I think the grace period on the min time is being very conservative.

I agree it is unnecessary to add clock drift to the max time if it is big, but on the same note, it isn't material to add it when it is large. If the challenge validity is purposefully small, such as 15 minutes for single message sessions, which I think is what some SEP-30 servers use, then adding clock drift is worthwhile to both. 15 minutes is what we specify in the SEP-10 doc itself as a default expiration.

For connected devices like phones and servers I would think a drift of more than 5 minutes would be rare, but I don't have data. Servers don't need much grace and that's the primary use of the min/max time to scope validity at the server. The client should validate the min/max time to make sure it isn't signing something for the distant future as part of some complex attack, but we could use larger grace periods on clients.

Maybe it is worth making the grace period configurable, and making recommendations to use larger values on client, and smaller on server. We can default to small and make it overridable on clients.

What do you think?

@morleyzhi
Copy link

morleyzhi commented Aug 16, 2021

I just ran the numbers: it looks like, among Vibrant users, something like 0.85% of users experience the transaction timeout error, even with the grace period in both directions, and that is mostly Android users (but the vast majority of our users are Android, so we might not have enough iOS samples).

@ire-and-curses
Copy link
Member

ire-and-curses commented Aug 16, 2021

weird - I would naively expect most Android devices to ntp sync automatically. 5 minutes is a massive drift if you are NTP-syncing.

@leighmcculloch
Copy link
Member

@morleyzhi Are you able to gleam what sort of grace period would help in those cases? It's possible there is no reasonable grace period we could set for those users. For example, if a phone has a wrong timezone set it can be many hours off even if it is displaying a correct HH:MM to the user. It may not be possible to help those cases without simply disabling time validation on clients all together.

@morleyzhi
Copy link

@leighmcculloch I don't think the timezone setting would matter because your UNIX timestamp doesn't care about timezone.

About our whole question: based on our metrics, of the 36 people who experienced a timestamp error, 58% (21 people) went on to eventually succeed.

In diving into a handful of users' event streams, it looks like many of them would fail, only to succeed a few seconds later. So I have no idea what the real story is! There doesn't seem to be a great difference between the event time and the processing time (which I'm assuming is local and Mixpanel server time, respectively). It also varies across SEP-10 servers. So I'm not sure of the full picture here, and unfortunately we need to make a release to add more diagnostics to get a bigger picture.

@leighmcculloch
Copy link
Member

leighmcculloch commented Aug 16, 2021

I don't think the timezone setting would matter because your UNIX timestamp doesn't care about timezone.

It can. How it can happen is the user has the wrong time zone set and manually changes their time to appear accurate. The resulting UTC time is wrong. I doubt this would be a common problem, but a previous product I worked on saw this happen with users after we released something that had a time dependent component.

only to succeed a few seconds later

This would suggest their time differences are very close to the expiry time + grace period, which is odd. Maybe something else is going on here that's unrelated?


@morleyzhi What do you think about making the grace period configurable in the JS SDK, and setting it higher in clients?

@JakeUrban
Copy link
Contributor

I like the idea of parameterizing the grace period, but adding that is a breaking change for languages that don't support defaults for optional parameters, including Go. I'm happy to make the change if thats the route we want to take though.

@leighmcculloch
Copy link
Member

leighmcculloch commented Aug 18, 2021

In the go sdk which is primarily used in server side applications there less value in making it configurable. In sdks used in servers and clients, like the js sdk, there's value. Your #3824 pr is targeting a major version though, so it's fine to change the function signature if we want to go sdk to be configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants