
We are getting an HTTP2::Error::StreamLimitExceeded exception on connection.push_async(push) #29

Closed
amitsaxena opened this issue Sep 30, 2016 · 14 comments

@amitsaxena

We were using the async method to send notifications, and while sending them to around 40K devices on a single connection we got the HTTP2::Error::StreamLimitExceeded exception.

Can anyone explain what this error means? Is there a limit to the amount of data I can send on an HTTP/2 connection? Some of the notifications did go out, and the rest failed.

We were on version 0.10.0.

@ostinelli
Owner

As the error states, you are sending too many streams: you have exceeded the maximum number of concurrent streams specified by the peer. Think of it as the HTTP/2 equivalent of an HTTP/1.1 429 Too Many Requests response.

Also, I highly recommend upgrading to the stable 1.0.1 version.

@amitsaxena
Author

@ostinelli we have a single connection and were sending pushes over it using push_async (as described in the docs). Should we add some kind of pause in between (say, every 1000 pushes)? What's a good approach for sending notifications to a large number of devices?

Thanks for the help, and the gem!

@ostinelli
Owner

ostinelli commented Sep 30, 2016

Try to understand when you hit a limit. This will give you better clarity on your strategy. I've seen a limit hit around 40k concurrent streams on a single connection, but I haven't looked for official documentation from Apple regarding it.

You can either send pushes in smaller batches and pause between them (as you suggest), or increase the connection pool that you use so that your pushes are spread across more connections, and each connection is less likely to hit the limit.
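
A minimal sketch of the batch-and-pause approach, assuming the async API shown in the Apnotic README (prepare_push, push_async, join); device_tokens and handle_response are hypothetical placeholders, and the batch size and sleep interval are illustrative, not recommendations:

```ruby
require "apnotic"

connection = Apnotic::Connection.new(cert_path: "apns_certificate.pem")

device_tokens.each_slice(1_000) do |batch|   # device_tokens is hypothetical
  batch.each do |token|
    notification       = Apnotic::Notification.new(token)
    notification.alert = "Hello"

    push = connection.prepare_push(notification)
    push.on(:response) { |response| handle_response(response) } # your own callback
    connection.push_async(push)
  end

  connection.join # drain in-flight streams before starting the next batch
  sleep 2         # pause between batches
end

connection.close
```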

@amitsaxena
Author

We saw this error again today in production while sending to just 1000 devices. We chunk our devices into sets of 1K each and sleep for 2 seconds after each chunk, and we still got this error for the first 1K devices.

Maybe having some kind of fail-safe in Apnotic (reconnect on this exception) would be a good idea; otherwise async push is almost unusable if it fails for such a small number of device tokens. As a temporary workaround we now send to 100 devices at a time (with a 2 s sleep), but this has resulted in very long-running Sidekiq jobs.

Does anyone else use async push in production? Are sync pushes the recommended way?

@ostinelli ostinelli reopened this Nov 18, 2016
@ostinelli
Owner

ostinelli commented Nov 18, 2016

Maybe having some kind of fail-safe in Apnotic (reconnect on this exception) would be a good idea

This raises other questions though.

Would retries be at:

  • Apnotic level, i.e. only for certain responses (like the one you're suggesting here).
  • NetHttp2 level, i.e. also covering the case where the connection fails.

If the latter, what should we do with requests that may (or may not) have sent data but have not received a response back? Should we consider them incomplete and retry? Wouldn't this, under certain conditions, perform the same request twice, even for non-idempotent requests?

@benubois
Collaborator

This is somewhat related to #28. I don't understand HTTP/2 well enough to know if this makes sense.

As I understand it, a stream is supposed to be "complete" at a certain point, maybe when both the request and the response have been sent along with an END_STREAM flag. At that point the stream should be closed, reducing the count of open streams. I'm not sure if this happens automatically or if the client needs to send RST_STREAM.

If net-http2 kept track of MAX_CONCURRENT_STREAMS and, once that number was hit, blocked until a stream closed or the server sent a new MAX_CONCURRENT_STREAMS value, it would never exceed the limit.

It sounds like this is what golang is considering (or perhaps already does): open a new connection when MAX_CONCURRENT_STREAMS is exceeded.
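
A sketch of the client-side cap described above: block before opening a new stream once a fixed number are in flight. MAX_IN_FLIGHT is a guessed stand-in for the server's MAX_CONCURRENT_STREAMS (ideally net-http2 would read the real value from the server's SETTINGS frame), and device_tokens is a hypothetical placeholder:

```ruby
require "apnotic"

MAX_IN_FLIGHT = 500 # assumption, not Apple's documented value

mutex     = Mutex.new
slot_free = ConditionVariable.new
in_flight = 0

connection = Apnotic::Connection.new(cert_path: "apns_certificate.pem")

device_tokens.each do |token|
  mutex.synchronize do
    slot_free.wait(mutex) while in_flight >= MAX_IN_FLIGHT
    in_flight += 1
  end

  push = connection.prepare_push(Apnotic::Notification.new(token))
  push.on(:response) do |_response|
    # Stream is done: free a slot and wake one waiting sender.
    mutex.synchronize do
      in_flight -= 1
      slot_free.signal
    end
  end
  connection.push_async(push)
end

connection.join
connection.close
```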

@idyll

idyll commented Nov 21, 2016

I am not sure that HTTP2::Error::StreamLimitExceeded is a bad thing. The http-2 gem keeps track of MAX_CONCURRENT_STREAMS and throws that exception to ensure Apnotic always stays within the max.

We deliver 30-40K pushes every hour (most on the hour) and we just allow the exception to trigger the Sidekiq retry logic. I don't think I have seen more than a dozen retries per batch. Using Sidekiq in that way keeps the notification "queue" or "buffer" outside of Apnotic. I think this is a good thing, because you'd need complicated logic to track notification state in any internal buffer so that you'd know whether a notification was actually sent in the event of a crash or error.
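
A sketch of that flow; DeviceToken, pending_for, and mark_delivered! are hypothetical bookkeeping, and the point is that a retry only re-sends tokens not yet marked delivered:

```ruby
require "sidekiq"
require "apnotic"

class PushBatchWorker
  include Sidekiq::Worker
  sidekiq_options retry: 10

  def perform(batch_id)
    connection = Apnotic::Connection.new(cert_path: "apns_certificate.pem")

    # Hypothetical: returns only tokens not yet marked as delivered, so a
    # Sidekiq retry resumes where the error occurred.
    DeviceToken.pending_for(batch_id).each do |token|
      push = connection.prepare_push(Apnotic::Notification.new(token.value))
      push.on(:response) { |_response| token.mark_delivered! }
      # If StreamLimitExceeded is raised here, it bubbles up and Sidekiq
      # retries the whole job against the remaining pending tokens.
      connection.push_async(push)
    end

    connection.join
  ensure
    connection&.close
  end
end
```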

Opening a new connection when you hit MAX_CONCURRENT_STREAMS will be an issue for JWT-style push, because MAX_CONCURRENT_STREAMS starts out at 1 and only increases after the first push. I know this isn't an immediate problem, but it needs to be in the back of our minds.

@amitsaxena
Author

amitsaxena commented Nov 21, 2016

We deliver 30-40K pushes every hour (most on the hour) and we just allow the exception to trigger the Sidekiq retry logic

@idyll the problem with that is multiple notifications being sent to the same device. We use async push: open a connection, send a push to all registered devices, and then close the connection. So at the point when the exception is encountered, the push has already been sent to quite a few devices.

@ostinelli yes, the delta (pushes that were sent before the exception but whose responses haven't been received yet) is tricky. I believe it will be the same at the NetHttp2 level, unless there is a way to wait for all the responses before raising the HTTP2::Error::StreamLimitExceeded exception, with no new requests sent in the meantime.

@idyll

idyll commented Nov 21, 2016

@amitsaxena your setup sounds different from mine. We track things in a way that lets us know which messages were already delivered via the async push, so resumption starts where the error occurred.

@ostinelli
Owner

That's how we do it too.

@amitsaxena
Author

amitsaxena commented Nov 22, 2016

@idyll would you mind sharing your setup? We cannot use a pool of persistent connections (as described in the docs) because we need to send notifications to different apps (hundreds of them); that's why we open a connection, send the notifications, and then close it.

If nothing else works, we might use sync pushes. They might be slow, but I believe they will be reliable, given that they send one notification at a time, so we would not hit this error.
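
A minimal sketch of what that synchronous path would look like; device_tokens and log_failure are hypothetical, and push blocks until APNs responds, so only one stream is ever open at a time:

```ruby
require "apnotic"

connection = Apnotic::Connection.new(cert_path: "apns_certificate.pem")

device_tokens.each do |token|
  notification       = Apnotic::Notification.new(token)
  notification.alert = "Hello"

  response = connection.push(notification) # blocks; may be nil on timeout
  log_failure(token, response) unless response&.ok?
end

connection.close
```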

@idyll

idyll commented Nov 22, 2016

We cannot use a pool of persistent connections (as described in the docs) because we need to send notifications to different apps (hundreds of them)

I do this with pools. I create a hash of pools where the key is the "topic" (by which I mean the topic of your push certificate). When a notification is created, I store the topic, the push token, and the content of the notification in Redis. Sidekiq workers grab the notifications and deliver them.
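
A sketch of that layout, assuming the Apnotic::ConnectionPool API from the README; CERTS and deliver are hypothetical, and pools are created lazily so apps that never push never open a connection:

```ruby
require "apnotic"

CERTS = { "com.example.app1" => "certs/app1.pem" } # hypothetical topic => cert map

# Lazily build one small pool per topic. Hash.new with a block is not
# thread-safe; guard pool creation with a mutex if workers run concurrently.
POOLS = Hash.new do |pools, topic|
  pools[topic] = Apnotic::ConnectionPool.new(
    { cert_path: CERTS.fetch(topic) },
    size: 2
  )
end

def deliver(topic, token, alert)
  POOLS[topic].with do |connection|
    notification       = Apnotic::Notification.new(token)
    notification.topic = topic
    notification.alert = alert
    connection.push(notification)
  end
end
```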

It's super important to keep these workers away from your DB if you want to achieve any sort of volume, so push all the data you need to send the notification into Redis, and never hit the DB again from the workers.

If a notification isn't deliverable there's a separate queue to handle archiving the device.

@amitsaxena
Author

amitsaxena commented Nov 28, 2016

Interesting approach! It's just not practical for us to keep persistent connections, given that the total number of apps is high and the majority of them rarely use the push notification service.

@ostinelli
Owner

Closing due to inactivity. If additional input is needed, we can re-open.
