stream stopped after: Operational disconnect #462

Closed
edsu opened this issue May 17, 2021 · 7 comments

@edsu
Member

edsu commented May 17, 2021

I was running a twarc2 stream for a few days and noticed it terminated with this error message and failed to reconnect:

{
  "errors": [
    {
      "title": "operational-disconnect",
      "disconnect_type": "OperationalDisconnect",
      "detail": "This stream has been disconnected for operational reasons.",
      "type": "https://api.twitter.com/2/problems/operational-disconnect"
    }
  ],
  "__twarc": {
    "url": "https://api.twitter.com/2/tweets/search/stream?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type",
    "version": "2.0.12",
    "retrieved_at": "2021-05-14T15:12:06+00:00"
  }
}
@SamHames
Contributor

That error message seems to be associated with clients that are consuming the data too slowly: https://developer.twitter.com/en/support/twitter-api/error-troubleshooting#operational-disconnect

Out of interest, were you looking at anything likely to be especially bursty or high volume?

The easier thing to do would be to check for that specific error in the streaming code and reconnect automatically. If it comes up frequently, we might need to look at moving the streaming endpoint processing to its own background thread, but that's a bit more involved.
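
A minimal sketch of the "check for that specific error and reconnect" idea follows. It is not the code from twarc; `connect` is a hypothetical stand-in for whatever opens the HTTP stream and returns its lines, and `max_reconnects` is an illustrative cap. The only thing taken from the report above is the `disconnect_type` value in the error payload.

```python
import json
import logging
from typing import Callable, Iterable, Iterator

log = logging.getLogger("stream-sketch")


def is_operational_disconnect(payload: dict) -> bool:
    """Return True if the payload carries an operational-disconnect error."""
    return any(
        err.get("disconnect_type") == "OperationalDisconnect"
        for err in payload.get("errors", [])
    )


def stream_with_reconnect(
    connect: Callable[[], Iterable[str]], max_reconnects: int = 3
) -> Iterator[dict]:
    """Yield parsed payloads, reopening the stream after a disconnect error.

    `connect` is a hypothetical callable that opens the stream and returns
    an iterable of raw JSON lines.
    """
    reconnects = 0
    while True:
        disconnected = False
        for line in connect():
            if not line.strip():
                continue  # skip blank lines
            payload = json.loads(line)
            if is_operational_disconnect(payload):
                log.warning("operational disconnect; reconnecting")
                disconnected = True
                break
            yield payload
        # Stop if the stream ended for any other reason, or the cap is hit.
        if not disconnected or reconnects >= max_reconnects:
            return
        reconnects += 1
```

A real client would probably also wait between attempts (for example with exponential backoff) rather than reconnecting immediately.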

@edsu
Member Author

edsu commented May 24, 2021

It might have been high volume. It does seem like an easy fix for now would be to log it and reconnect.

@SamHames
Contributor

SamHames commented May 24, 2021

This is explicitly handled in WIP form on #468.

One curious thing to me is why we need to handle this case specially - from reading the code I would have thought the request to the API would be closed on the Twitter side or time out on the client side, which would just lead to an immediate attempt to reconnect...

@igorbrigadir
Contributor

As an aside, this is an excellent idea for recovering from stream failures: https://twittercommunity.com/t/filtered-stream-request-breaks-in-5-min-intervals/153926/9?u=igorbrigadir

For a really robust system, one where you never miss any data, you should do the following whenever you are forced to reconnect:

  • Record the ID of the last tweet received before the interruption.
  • Record the ID of the first tweet received after the interruption.
  • Spawn a new thread that uses search to read in tweets with IDs between the “before” and “after” IDs.
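
The three steps above can be sketched roughly as follows. This is an illustration only, not something twarc provides: `GapBackfiller`, `search`, and `sink` are hypothetical names, and `search(before_id, after_id)` is assumed to return the tweets whose IDs fall between the two given IDs.

```python
import threading
from typing import Callable, Iterable, List, Optional


class GapBackfiller:
    """Track tweet IDs around a reconnect and backfill the gap in a thread."""

    def __init__(
        self,
        search: Callable[[str, str], Iterable[dict]],
        sink: Callable[[dict], None],
    ) -> None:
        self.search = search  # hypothetical: tweets between two IDs
        self.sink = sink      # hypothetical: stores one recovered tweet
        self.last_id: Optional[str] = None
        self.before_id: Optional[str] = None
        self.threads: List[threading.Thread] = []

    def on_disconnect(self) -> None:
        # Step 1: remember the last ID seen before the interruption.
        self.before_id = self.last_id

    def on_tweet(self, tweet: dict) -> None:
        tweet_id = tweet["id"]
        if self.before_id is not None:
            # Step 2: this is the first tweet after the interruption.
            before, self.before_id = self.before_id, None
            # Step 3: fetch the missing range in a background thread.
            worker = threading.Thread(
                target=self._backfill, args=(before, tweet_id)
            )
            worker.start()
            self.threads.append(worker)
        self.last_id = tweet_id

    def _backfill(self, before: str, after: str) -> None:
        for tweet in self.search(before, after):
            self.sink(tweet)
```

The streaming loop would call `on_tweet` for every tweet and `on_disconnect` whenever it is forced to reconnect; running the search in a separate thread keeps the backfill from slowing down consumption of the live stream.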

@edsu
Member Author

edsu commented Jun 12, 2021

I think this would be nice to have (as an option). I think logging the error and reconnecting is a good start.

@edsu
Member Author

edsu commented Jun 12, 2021

> One curious thing to me is why we need to handle this case specially - from reading the code I would have thought the request to the API would be closed on the Twitter side or time out on the client side, which would just lead to an immediate attempt to reconnect...

I agree, I would have thought the connection drop would have caused twarc to reconnect. But perhaps it is closing in a new way that the decorators don't cover? At the moment I think twarc2.get is only protected against connection reset and timeout exceptions. Can't the server cleanly close the connection without causing an exception?
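
To illustrate the question, here is a small sketch of a loop that reconnects whether the stream ends with an exception or simply runs out of lines. It is an assumption-laden illustration, not twarc's decorator code: `connect` and `max_connections` are hypothetical, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions the HTTP library actually raises.

```python
from typing import Callable, Iterable, Iterator


def lines_across_connections(
    connect: Callable[[], Iterable[str]], max_connections: int
) -> Iterator[str]:
    """Yield lines across connections, reconnecting however a stream ends."""
    for attempt in range(1, max_connections + 1):
        try:
            for line in connect():
                yield line
        except (ConnectionError, TimeoutError) as exc:
            # Noisy ending: the case an exception-based retry would catch.
            print(f"connection {attempt} failed: {exc!r}")
        else:
            # Quiet ending: the server closed the stream and the loop
            # finished without raising, so nothing would trigger a retry
            # unless this case is handled explicitly.
            print(f"connection {attempt} closed cleanly")
```

If a clean close really is what happens here, a retry mechanism that only reacts to exceptions would return normally at the end of the inner loop, which would match the behaviour described in the original report.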

I propose we merge the fix for now and add the idea for retrieving lost tweets as an enhancement.

@edsu
Member Author

edsu commented Jun 12, 2021

This fix was released in v2.1.6.

edsu closed this as completed Jun 12, 2021