Message list loads forever when some messages have reactions #4156

Closed
pohutukawa opened this issue Jun 15, 2020 · 22 comments

@pohutukawa

pohutukawa commented Jun 15, 2020

Unless I visit a stream with only a handful of messages, the mobile UI never loads the topic/message list. It stays on the stylised "message list loading" screen (fancy spinner). To me, this seems possibly related to #4033, but it may be different (more severe?).

In our environment pretty much all users experience this on the mobile Zulip client, on both Android and iOS. The web and desktop clients all work fully as expected, with a very responsive UI.

Side note: After an issue with the RabbitMQ queue handling on the server (issue zulip/zulip#14862), we resorted to upgrading the server to the then-current Git master (beginning of May).

@gnprice
Member

gnprice commented Jun 16, 2020

Thanks @pohutukawa for the report! That is a pretty bad experience.

One thing reported on #4033 was this workaround:

Restarting the app (force quit and reload) fixes it immediately 100% of the time

Does that fix it for you? That's still a bad experience -- maybe less terrible but still bad -- but it's potentially an informative clue for us in debugging it.

@gnprice changed the title from "most streams never load in mobile UI (Android and iOS)" to "Message list loads forever, except when very few messages" on Jun 16, 2020
@pohutukawa
Author

Thanks @gnprice for taking this on. Yes, I have tried the force quit & reload, but it didn't work.

I've also deleted all the cache & data of the app and re-connected to our Zulip server: Same thing, nothing loads.

If I can help with gathering insights, I'm happy to provide them to the best of my ability.

@pohutukawa
Author

@gnprice any further input on this? Anything we can do to provide further data to debug the problem?

@chrisbobbe
Contributor

chrisbobbe commented Jun 24, 2020

The loading view you're seeing is this one, right; gray squares and lines covering most of the screen?

[image: screenshot of the loading placeholder view]

Or is it a circular spinner, or something else?

@pohutukawa
Author

@chrisbobbe Yes, it's the "stripey one" in the screen shot above.

@pohutukawa
Author

Ping.
Is there any help I can provide for further insights on debugging? The mobile Zulip client has been completely unusable for quite a while now. People are starting to scream to go back to Slack again, which I'd like to avoid.

@gnprice
Member

gnprice commented Jul 1, 2020

@pohutukawa Thanks for your persistence with this!

I think in order to get helpful logs for debugging this, we'll need to add some new code. I hope to have a beta version out soon with some code for that. If you haven't already, one step you can take now is to join the beta:
https://github.com/zulip/zulip-mobile/#using-the-beta
(Right now that has the same version as we've released for everyone, but it'll get each new version sooner.)

A few other ideas that might turn up an interesting clue:

  • Would you make an account on our community server chat.zulip.org, and try connecting to that from the mobile app?
    • That'll help confirm whether the relevant variable that's triggering this bug is something about your Zulip server, vs. something about your network or some other variable common to you and your colleagues.
  • Can you think of anything unusual about either the configuration of your Zulip server, or the way you've been using it? Perfectly fine if not; but if you can, that could potentially give a lead on a way to reproduce the issue.

@chrisbobbe
Contributor

Oh, here's another idea: Do the affected streams all seem to have large files (e.g., images) in them?

chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this issue Jul 1, 2020
In steady state, these changes should be reverted or structured a
bit better; it feels a bit haphazard.

If the placeholders are still visible after 10 seconds, send a log
to Sentry with all the events the WebView has received, with
timestamps of receipt. Redact any `auth` fields by replacing the
value with "redacted", and redact any `content` fields by extracting
just the opening tag for the "message-loading" div so we can see if
it has the "hidden" class.

But we want to debug zulip#4156 ASAP,

[1]: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/.23M4156.20Message.20List.20placeholders/near/921700
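
For readers following along, the logging described in that commit message can be sketched roughly as below. This is only an illustration in Python, not the app's actual code (the mobile app itself is not written in Python). The names `redact`, `ReceivedEventLog`, and `report_if_stuck` are hypothetical, the events are assumed to be plain nested dicts, and the "message-loading" div is assumed to be identified by an `id` attribute. Instead of calling Sentry, the sketch just returns the payload that would be sent.

```python
import re
import time
from typing import Any, Callable, Dict, List, Optional

# Assumption: the placeholder div looks like <div id="message-loading" ...>.
# The pattern captures only its opening tag, so the class list stays visible.
_LOADING_OPEN_TAG = re.compile(r'<div\b[^>]*\bid="message-loading"[^>]*>')


def redact(value: Any) -> Any:
    """Return a copy of `value` with `auth` and `content` fields redacted."""
    if isinstance(value, dict):
        cleaned: Dict[str, Any] = {}
        for key, inner in value.items():
            if key == "auth":
                # Never log credentials.
                cleaned[key] = "redacted"
            elif key == "content" and isinstance(inner, str):
                # Keep only the opening tag of the placeholder div, if present.
                match = _LOADING_OPEN_TAG.search(inner)
                cleaned[key] = match.group(0) if match else "redacted"
            else:
                cleaned[key] = redact(inner)
        return cleaned
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


class ReceivedEventLog:
    """Collects redacted events together with the time each was received."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at = clock()
        self._entries: List[Dict[str, Any]] = []

    def record(self, event: Dict[str, Any]) -> None:
        self._entries.append(
            {"received_at": self._clock(), "event": redact(event)}
        )

    def report_if_stuck(
        self, placeholders_visible: bool, timeout_s: float = 10.0
    ) -> Optional[List[Dict[str, Any]]]:
        """Return the log payload if placeholders outlived the timeout."""
        if not placeholders_visible:
            return None
        if self._clock() - self._started_at < timeout_s:
            return None
        return list(self._entries)
```

Keeping the opening tag rather than dropping `content` entirely matters here: whether the div carries the "hidden" class is exactly the clue the log is meant to capture.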
@chrisbobbe
Contributor

chrisbobbe commented Jul 3, 2020

Here's an issue we haven't thought about in a while, but it's still open, and one of its symptoms is the gray bars showing indefinitely: #3281

IIUC, the theory there is that we fail silently and aren't retrying at all, and this particularly affects people with weak Internet connections by causing this symptom. Sounds very plausible, at least on the surface.

@pohutukawa Is there anything consistent among you and your colleagues about your Internet connections, e.g., everyone's always on cellular data? (We should definitely fix this for users at all Internet speeds, but it might be a helpful debugging clue.)

@pohutukawa
Author

@gnprice:

A few other ideas that might turn up an interesting clue:

* Would you make an account on our community server chat.zulip.org, and try connecting to that from the mobile app?
  
  * That'll help confirm whether the relevant variable that's triggering this bug is something about your Zulip server, vs. something about your network or some other variable common to you and your colleagues.

OK, I have just tried to connect to chat.zulip.org using my mobile client, and everything seems to work fine. Thus I'd expect that it might be an issue with an assumption on something the mobile client does from (our version of) the server vs. what the web/desktop clients do.

* Can you think of anything unusual about either the configuration of your Zulip server, or the way you've been using it? Perfectly fine if not; but if you can, that could potentially give a lead on a way to reproduce the issue.

If you look at my original entry in here, I did at one point upgrade our server to Git master due to a problem with the RabbitMQ queues getting the server repeatedly overloaded and stuck. So we're not quite in sync with published releases, even though on desktop/web clients everything works as expected.

@chrisbobbe:
We're here on a very reliable work WiFi network during working hours. And the problem persists no matter what network (mobile, company WiFi, home WiFi) we're on. And that's across all Android and iOS users.

@pohutukawa
Author

Oh, here's another idea: Do the affected streams all seem to have large files (e.g., images) in them?

Likely. Screen shots are very commonly used (unfortunately even where a text copy/paste would suffice). So I'd imagine the answer to your question is "yes likely".

gnprice pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Jul 6, 2020
In steady state, these changes should be reverted or structured a
bit better; it feels a bit haphazard.

If the placeholders are still visible after 10 seconds, send a log
to Sentry with all the events the WebView has received, with
timestamps of receipt. Redact any `auth` fields by replacing the
value with "redacted", and redact any `content` fields by extracting
just the opening tag for the "message-loading" div so we can see if
it has the "hidden" class.

But we want to debug zulip#4156 ASAP,

[1]: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/.23M4156.20Message.20List.20placeholders/near/921700
@gnprice
Member

gnprice commented Jul 7, 2020

OK, I have just tried to connect to chat.zulip.org using my mobile client, and everything seems to work fine. Thus I'd expect that it might be an issue with an assumption on something the mobile client does from (our version of) the server vs. what the web/desktop clients do.

[...] And the problem persists no matter what network (mobile, company WiFi, home WiFi) we're on. And that's across all Android and iOS users.

@pohutukawa Thanks. I have the same inference from that as you do -- it seems like the variable that triggers this bug is clearly something about your server, or perhaps about the data on it.

Meanwhile, today I sent a new release to beta which adds some logging that should help us debug this: version 26.30.153. Please try out the beta:
https://github.com/zulip/zulip-mobile/#using-the-beta
and check that you get the new version, and then play around some to exercise this issue. (On Android you should get the new version immediately; on iOS there's always some lag, but if all goes well with Apple it should be available in the next 12-24 hours.)

@pohutukawa
Author

@gnprice Thanks for that. Just joined the beta and installed the new version (Android, 26.30.153). I have force killed and started the mobile Zulip client to make sure the new version is running. So far the behaviour is still identical (getting the "stripey" wait UI while loading without a change).

If you let me know how to access such logs to track things down further, I'd be glad to exercise this.

Cheers for the assistance!

@sentry-io

sentry-io bot commented Jul 8, 2020

Sentry issue: ZULIP-MOBILE-3SR

@sentry-io

sentry-io bot commented Jul 8, 2020

Sentry issue: ZULIP-MOBILE-3SQ

@pohutukawa
Author

What is this sentry stuff up there? I don't seem to have any credentials to do anything with the links.

@chrisbobbe
Contributor

chrisbobbe commented Jul 8, 2020

We use Sentry for logging issues encountered in the app. Greg and I have access, and we've been debugging here. 🙂

One thing we've noticed is that there's something odd in the shape of the data representing reactions to messages, in the response your server gives to GET /messages.
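
As a rough illustration of how such a shape problem might be spotted, here is a minimal Python sketch that scans message objects (as parsed from a GET /messages response) and reports reactions that don't look as expected. The thread does not say exactly what was odd about the data, so the set of expected keys in `ASSUMED_REACTION_KEYS` is an assumption chosen for illustration, and `find_malformed_reactions` is a hypothetical helper, not part of Zulip.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Assumption for illustration only: keys each reaction object should carry.
ASSUMED_REACTION_KEYS = ("emoji_name", "emoji_code", "reaction_type")


def find_malformed_reactions(
    messages: Iterable[Dict[str, Any]],
    required_keys: Sequence[str] = ASSUMED_REACTION_KEYS,
) -> List[Tuple[Any, str]]:
    """Return (message id, description) pairs for suspicious reaction data."""
    problems: List[Tuple[Any, str]] = []
    for message in messages:
        message_id = message.get("id")
        # A message without a "reactions" field is treated as having none.
        reactions = message.get("reactions", [])
        if not isinstance(reactions, list):
            kind = type(reactions).__name__
            problems.append((message_id, f"reactions is a {kind}, not a list"))
            continue
        for index, reaction in enumerate(reactions):
            if not isinstance(reaction, dict):
                problems.append((message_id, f"reaction {index} is not an object"))
                continue
            missing = [key for key in required_keys if key not in reaction]
            if missing:
                problems.append((message_id, f"reaction {index} is missing {missing}"))
    return problems
```

Run against a saved response body, an empty result would suggest the reactions at least have the assumed keys; it would not prove the data is otherwise well-formed.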

Do you know the Git commit ID of the server that's been running? There was a bug in handling that data, which was introduced in zulip/zulip@2a4c62a (committed May 19), and fixed in zulip/zulip@a53daa6 (committed May 21).

You did say, at #4156 (comment), that you upgraded the server in the beginning of May, though, hmm. So you may never have been affected by this particular bug.

Are you seeing weird things happening with reactions in the web app? The discussion that exposed the bug, on May 21st, was here.

@gnprice
Member

gnprice commented Jul 9, 2020

@pohutukawa We've done a bit more debugging, and I'm now pretty confident that that server bug is the one you're running into here.

I recommend you upgrade the server to a current version. There's now a release candidate for Zulip Server 3.0, which perhaps has a somewhat reduced risk of having a really obnoxious bug like the one you unfortunately ran into with the version from Git you're currently using.

Thanks again for your persistence in reporting and helping debug this issue. I think there are some things we can change that will make the effects of an issue like this one less painful in the future, and we plan to make those changes.

@pohutukawa
Author

I've just followed your curl command mentioned in the other issue thread, and this is the version I'm currently on:

"zulip_version": "2.2-dev-2329-g0af2f9d838"

I'll upgrade to the 3.0 RC next week and see how it goes.

Cheers!

@gnprice
Member

gnprice commented Jul 9, 2020

"zulip_version": "2.2-dev-2329-g0af2f9d838"

OK, and that commit is indeed in the range @chrisbobbe identified at #4156 (comment) between when that server bug was introduced and fixed! So your server is indeed currently affected by that bug. When the bug was live on our development server chat.zulip.org, it was observed to cause these same symptoms.
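
For anyone wanting to repeat this check on their own server, the commit can be read out of the version string. The sketch below assumes the string follows the `git describe` layout of `<base>-<commits since base>-g<abbreviated commit>`, which the `g` prefix here suggests; `parse_described_version` and `DescribedVersion` are hypothetical names used only for this example.

```python
import re
from typing import NamedTuple, Optional


class DescribedVersion(NamedTuple):
    base: str                # e.g. "2.2-dev"
    commits_since_base: int  # e.g. 2329
    abbrev_commit: str       # e.g. "0af2f9d838"


# Assumed layout: <base>-<count>-g<hex commit>, as produced by `git describe`.
_DESCRIBE_PATTERN = re.compile(
    r"^(?P<base>.+)-(?P<count>\d+)-g(?P<commit>[0-9a-f]+)$"
)


def parse_described_version(version: str) -> Optional[DescribedVersion]:
    """Split a describe-style version string, or return None if it differs."""
    match = _DESCRIBE_PATTERN.match(version)
    if match is None:
        # Plain release versions such as "3.0" carry no commit information.
        return None
    return DescribedVersion(
        base=match["base"],
        commits_since_base=int(match["count"]),
        abbrev_commit=match["commit"],
    )


if __name__ == "__main__":
    print(parse_described_version("2.2-dev-2329-g0af2f9d838"))
```

With the abbreviated commit in hand and a checkout of zulip/zulip, `git merge-base --is-ancestor <older> <newer>` is one way to test whether the server's commit falls after the commit that introduced the bug and before the one that fixed it.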

Closing this issue now because I believe we've completed debugging the issue, and the code is fixed in master. As I mentioned above, there are things we can change that will mitigate future issues like this one, and I've filed #4186 for the biggest of those. Do let us know how the upgrade goes, and if you run into new problems please let us know about those too.

@gnprice gnprice closed this as completed Jul 9, 2020
@gnprice changed the title from "Message list loads forever, except when very few messages" to "Message list loads forever when some messages have reactions" on Jul 9, 2020
@pohutukawa
Author

Just to confirm: Upgraded to the 3.0 version, and things are working again.

Having said that, there was an issue where I had to do a Git upgrade to the 3.0 tag, as the "latest" tarball would lead to an error in the upgrade. There is also another issue: I wasn't able to use the data export or the stream_stats operation before the upgrade. But I'll create another ticket for that.

Awesome @gnprice, much appreciated and thanks for an awesome open source software experience!

@gnprice
Member

gnprice commented Jul 23, 2020

Excellent. Thanks @pohutukawa for confirming, and for being an awesome bug reporter!

Please do file that upgrade issue too (in zulip/zulip) -- we'll probably do a 3.1 release within the next couple of weeks, and this sort of upgrade issue is the perfect kind of thing to fix for that release.
