ISP (Internet Service Provider) blocked a domain, or a keyword in the request/response?

Squirrels attacked the data center?

[quote,Rich Miller,"Surviving Electric Squirrels and UPS Failures, 2012, Data Center Knowledge"]
--
"A frying squirrel took out half of our Santa Clara data center two years back," Christian said, noting squirrels' propensity to interact with electrical equipment, with unfortunate results.
If you enter “squirrel outage” in either Google News or Google web search, you'll find a lengthy record of both recent and historic incidents of squirrels causing local power outages.
--

Ships trash undersea cables by dropping anchor right on The Internet?!

// TODO Stock photo liven it up a bit? https://www.istockphoto.com/photo/underwater-fiber-optic-cable-on-ocean-floor-gm1362710800-434533439

== Defensive Code

All of these problems are going to cause the happy path to get messy.

Let's harden our code one step at a time.

[sidebar]
Inspiration for these code examples was taken from Umar Hansa's brilliant article https://web.dev/fetch-api-error-handling/[Implement error handling when using the Fetch API].

It's important to make sure that no single part of any client
application _requires_ a connection to leave that state. I have often
seen a client application submit a form, hide the form it just
submitted, and then fail to make the connection; because it was
expecting a positive or negative JSON response in a specific structure
to decide whether to show the form again or progress, it ends up with a
blank screen.
[source,js]
----
include::code/ch06-connection-problems/02-catch-fetch-errors.js[]
----

This little change solves some of these problems. Any sort of connection failure, whether it's a refused connection (no internet, server down, etc.), a dropped connection (failed partway through), or a certificate error, should be caught by that first exception.

Whatever happens, it will log something to the user console, and return early. You could imagine this code doing something clever to update the user interface, but for now we're keeping it simple.
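
As a rough sketch of the same pattern (the endpoint and the handling here are illustrative assumptions, not the contents of the included file), it might look a bit like this:

[source,js]
----
// Rough sketch: connection-level failures (refused connections, dropped
// connections, certificate errors) reject the fetch promise outright.
async function fetchUsers() {
  try {
    return await fetch("https://api.example.com/users");
  } catch (error) {
    // fetch throws a TypeError for network-level failures.
    console.error("Connection failed:", error.message);
    return null;
  }
}
----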

This is a step in the right direction, but once we've eventually got a response there are a lot of other things that can go wrong. What if the response is randomly HTML instead of JSON? Or it's weirdly invalid JSON?

[source,js]
----
include::code/ch06-connection-problems/03-catch-json-errors.js[]
----

Great! Now when the API randomly squirts some unexpected HTML error at you, the function will just return an empty array, and there is an error logged that the developers can go digging into.
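
Again as a rough sketch (the endpoint and the empty-array fallback are illustrative assumptions), handling a body that turns out not to be JSON might look like this:

[source,js]
----
// Rough sketch: even with a response in hand, the body might be HTML or
// truncated JSON, so parsing needs its own try/catch.
async function fetchUserList() {
  let response;
  try {
    response = await fetch("https://api.example.com/users");
  } catch (error) {
    console.error("Connection failed:", error.message);
    return [];
  }

  try {
    return await response.json();
  } catch (error) {
    // A SyntaxError here usually means the body was not valid JSON.
    console.error("Could not parse response as JSON:", error.message);
    return [];
  }
}
----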

Another step in the right direction, but this still assumes we actually get a response in a reasonable timeframe.

What if you've been waiting for *thirty seconds*?

What if you've been waiting for *two minutes*?

We will deep dive into timeouts later in the book, but a really helpful bit of quick defensive coding is making sure your application isn't spending two minutes doing absolutely nothing for a request that normally takes less than half a second.

[source,js]
----
include::code/ch06-connection-problems/03-catch-json-errors.js[]
----
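
As a rough sketch (the five second limit and the function name are illustrative assumptions), a timeout built with `AbortController` might look like this:

[source,js]
----
// Rough sketch: give up on any request that takes longer than five
// seconds, instead of letting it hang indefinitely.
async function fetchWithTimeout(url, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    return await fetch(url, { signal: controller.signal });
  } catch (error) {
    if (error.name === "AbortError") {
      console.error(`Request to ${url} timed out after ${timeoutMs}ms`);
    }
    throw error;
  } finally {
    clearTimeout(timer);
  }
}
----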

== Simulating Network Nonsense

Most of the time you are developing against an API that works just fine, which means you cannot easily test these complicated unhappy paths.

To simulate the sort of nonsense you are coding to defend against, take a look at https://github.com/Shopify/toxiproxy[Toxiproxy] by Shopify.
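
For a taste of how that works (the ports and proxy name here are illustrative, and the flags are as per the Toxiproxy README, so treat this as a sketch), you can stand up a proxy and inject a second of latency into every connection going through it:

[source]
----
$ toxiproxy-cli create -l localhost:26379 -u localhost:6379 flaky_redis
$ toxiproxy-cli toxic add -t latency -a latency=1000 flaky_redis
----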

== Rate Limiting

Another common situation to run into is rate limiting, which is basically the
API telling your API client to calm down a bit, and slow down how many requests
are being made. The most basic rate limiting strategy is
often "clients can only send X requests per second."

Many APIs implement rate limiting to ensure relative stability when
unexpected things happen. If for some reason one client causes a spike

processes handling 5 requests per second each, but you get the idea.
If this process was being implemented in NodeJS, you could use
https://www.npmjs.com/package/bottleneck[Bottleneck].

[source,js]
----
const Bottleneck = require("bottleneck");

// Never more than 5 requests running at a time.
const limiter = new Bottleneck({ maxConcurrent: 5 });

const fetchPokemon = id => {
  return fetch(`https://pokeapi.co/api/v2/pokemon/${id}`)
    .then(response => response.json());
};

limiter.schedule(fetchPokemon, id).then(result => {
  /* ... */
})
----

Ruby users who are already using tools like Sidekiq can add plugins like
https://github.com/sensortower/sidekiq-throttled[Sidekiq::Throttled], or

the API might lower its limits for some reason.
=== Am I Being Rate Limited?

The appropriate HTTP status code for rate limiting has been argued over
about as much as "tabs" versus "spaces", but there is a clear winner now;
https://tools.ietf.org/html/rfc6585[RFC 6585] defines it as HTTP 429.

.http.cat meme for HTTP 429
image::images/429.jpg[Lots of cats,500,align="center"]

Twitter's API existed for a few years before this standard, and they chose
"420 - Enhance Your Calm". Twitter has since dropped 420 and got on board with
the standard 429. Unfortunately some APIs replicated the 420 at the time and
have not yet switched over to the standard, so you might see either a 429 or
this outdated copycat.

.http.cat meme for HTTP 420
image::images/420.jpg[Cat chewing on a cannabis leaf,500,align="center"]

Google also got a little "creative" with their status code utilization. For a
long time they were using 403 for their rate limiting, but I don't know if they
are still doing that. Bitbucket are still using 403 in their Server REST API.

// TODO Confirm if google are still doing that.

[quote,REST Resources Provided By: Bitbucket Server,https://docs.atlassian.com/bitbucket-server/rest/5.12.3/bitbucket-rest.html]
____
Actions are usually "forbidden" if they involve breaching the licensed user limit of the server, or degrading the authenticated user's permission level. See the individual resource documentation for more details.
____

GitHub v3 API has a 403 rate limit too:

[source]
----
HTTP/1.1 403 Forbidden
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1377013266

{
  "message": "API rate limit exceeded for xxx.xxx.xxx.xxx. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
  "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}
----

Getting a 429 (or a 420) is a clear indication that a rate limit has
been hit, and a 403 combined with an error code, or maybe some HTTP
headers can also be a thing to check for.
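
A rough sketch of that detection logic (assuming the GitHub-style `X-RateLimit-Remaining` header shown above) could look like this:

[source,js]
----
// Rough sketch: 429 and the legacy 420 are definite rate limits, and a
// 403 with an exhausted X-RateLimit-Remaining header is a likely one.
function isRateLimited(response) {
  if (response.status === 429 || response.status === 420) {
    return true;
  }
  return response.status === 403 &&
    response.headers.get("X-RateLimit-Remaining") === "0";
}
----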

Either way, when you're sure it's a rate limit error, you can move onto the next
step: figuring out how long to wait before trying again.

There are three main ways a server might communicate retry logic to you.

==== Retry-After Header

The Retry-After header is a handy standard way to communicate "this didn't work
now, but it might work if you retry in <the future>".

[source]
----
Retry-After: <http-date>
Retry-After: <delay-seconds>
----

The logic for how it works is defined in https://tools.ietf.org/html/rfc6585[RFC 6585] (the same RFC that introduced HTTP 429), but basically it might look a bit like this:

[source]
----
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{
  "error": {
    "message": "API rate limit exceeded for xxx.xxx.xxx.xxx.",
    "link": "https://developer.example.com/#rate-limiting"
  }
}
----

You might also see a `Retry-After` showing you an HTTP date:

[source]
----
Retry-After: Sat, 15 Apr 2023 07:28:00 GMT
----

Same idea, it's just saying "please don't come back before this time".

By checking for these errors, you can catch and retry (or re-queue)
requests that have failed. If that is not an option try sleeping for a
bit to calm workers down.


WARNING: Make sure your sleep does not block your background
processes from handling other jobs. This can happen in languages where
sleep blocks the whole process, and that process is running multiple
types of job on the same thread. Don't back up your whole system with an
overzealous sleep!

Some HTTP clients like Faraday are
https://github.com/lostisland/faraday/pull/773[aware of Retry-After] and use it
to power their built-in retry logic, but other HTTP clients might need some
training.

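As a rough sketch (the retry cap and the function name are illustrative assumptions), that training might look like this:

[source,js]
----
// Rough sketch: on a 429, honor Retry-After (seconds or an HTTP date)
// before retrying, giving up after a few attempts.
async function fetchWithRetry(url, retriesLeft = 3) {
  const response = await fetch(url);

  if (response.status !== 429 || retriesLeft === 0) {
    return response;
  }

  const retryAfter = response.headers.get("Retry-After") || "1";
  const seconds = Number(retryAfter);
  const waitMs = Number.isNaN(seconds)
    ? new Date(retryAfter).getTime() - Date.now()
    : seconds * 1000;

  await new Promise(resolve => setTimeout(resolve, Math.max(waitMs, 0)));
  return fetchWithRetry(url, retriesLeft - 1);
}
----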


==== Proprietary Headers

Some APIs like GitHub v3 use proprietary headers, all beginning with
`X-RateLimit-`. These are not at all standard (you can tell by the
`X-`), and could be very different from whatever API you are working
with.

Successful requests to GitHub will show how many requests are
remaining, so keep an eye on those and try to avoid making
requests if the remaining amount on the last response was 0.

[source]
----
$ curl -i https://api.github.com/users/octocat
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 56
X-RateLimit-Reset: 1372700873
----

You can use a shared key (maybe in Redis or similar) to track that, and
have it expire at the reset time provided as
http://en.wikipedia.org/wiki/Unix_time[Unix time] in `X-RateLimit-Reset`.
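
A rough sketch of that shared-key approach (assuming https://github.com/luin/ioredis[ioredis] and the GitHub-style headers above; the key name is made up) could look like this:

[source,js]
----
// Rough sketch: stash the remaining quota in Redis so every worker can
// see it, and let the key expire when the rate limit window resets.
const Redis = require("ioredis");
const redis = new Redis();

async function recordRateLimit(response) {
  const remaining = response.headers.get("X-RateLimit-Remaining");
  const reset = Number(response.headers.get("X-RateLimit-Reset"));
  const ttl = Math.max(reset - Math.floor(Date.now() / 1000), 1);
  await redis.set("github:ratelimit:remaining", remaining, "EX", ttl);
}

async function okToMakeRequest() {
  const remaining = await redis.get("github:ratelimit:remaining");
  return remaining === null || Number(remaining) > 0;
}
----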


==== RateLimit Headers (Standard Draft)

The benefit of the proprietary headers is that you get a lot more information to work with, letting you know you are approaching a limit so you can pre-emptively back off, instead of standing on the rake and having to respond after being hit round the head.

There's an IETF RFC draft called https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/[RateLimit header fields for HTTP] that aims to give you the best of both worlds, and maybe you'll run into something that resembles this in the distant future of 2024 or 2025.

[source]
----
RateLimit-Limit: 100
RateLimit-Remaining: 50
RateLimit-Reset: 50
----

This says there is a limit of 100 requests in the quota, the client has 50 remaining, and it will reset in 50 seconds. Handy!
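
If an API you use does adopt the draft, reading these headers is simple enough. A rough sketch (header names as per the draft, which may still change):

[source,js]
----
// Rough sketch: pull the draft RateLimit headers off a response.
function parseRateLimit(response) {
  return {
    limit: Number(response.headers.get("RateLimit-Limit")),
    remaining: Number(response.headers.get("RateLimit-Remaining")),
    resetSeconds: Number(response.headers.get("RateLimit-Reset")),
  };
}
----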