Improve upload and download API docs #2955

willkg · 2024-07-03T17:23:39Z

This improves the HTTP status codes, adds response headers where applicable, and adds a first-pass section on improving symbol upload success rates.

This should help users who are using the download and upload APIs know what to do in certain status code situations that were previously undocumented.

This specifies the response headers so we can write systemtests for them.

This gives people writing symbol upload jobs a set of things they can look at to improve the likelihood that those jobs finish successfully. We can hone this section over time.

For whoever reviews this, feel free to merge it if it's approved. Thank you!

willkg · 2024-07-03T17:25:32Z

docs/download.rst

+   :resheader Content-Length: length in bytes for the file
+   :resheader Content-Encoding: (optional) content encoding for the file; note
+       that ``.sym`` files are compressed even though the file extension doesn't
+       indicate that


I think these are the critical response headers. I'm a little fuzzy on HTTP 302s, though. Does this sound right? Are there other headers we want to include here?

Only the Location header is in the response from Tecken. The other headers are part of the response from S3. We should make that clear in this list.

The 302 response from Tecken will have these headers:

Location – the redirect URL

Debug-Time – the time if Debug:true was in the request. (I'm unsure we really want this to be part of the documented interface)

Content-Length: 0 – a redirect response doesn't have a body. We don't need to document this.

Content-Type: text/html – Django seems to be putting this there by default. It doesn't really make sense.

There's no Content-Encoding header in the Tecken response. There's no content, so specifying an encoding would be pointless.

If we follow the redirect, we talk directly to S3. The response includes these headers:

Content-Encoding: gzip – Present if the file is compressed. In general, the file should only be returned with this content encoding if the client indicated that it can handle it by sending Accept-Encoding: gzip. However, S3 will always return it if the file was uploaded with this encoding. (GCS actually decompresses the file on the server side if the client doesn't send Accept-Encoding: gzip.) A content encoding is only used for transport and should be transparent to the end user. You get the "actual" content after decoding. For example, you can navigate to https://s3.us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/xpcshell/4C4C44B655553144A1B2DAA18738D74B0/xpcshell.sym in Firefox, and you will see the text content. The file is still stored and transmitted compressed, but the Content-Encoding header tells Firefox that the file should be decompressed first. In curl, you can use curl --compressed to get the same behaviour; curl will list all the content encodings it supports in the Accept-Encoding header, and if the response indicates that one of these encodings was used, the content will be decoded accordingly.

Content-Length: The length of the response body. If the body is compressed, it's the size of the compressed body.

Content-Type: The content type of the response after decompressing it. Will be text/plain for symbol files.

I'm not actually sure whether this response makes things clearer or more confusing, but I'll send it anyway. The important point is that we should make clear what headers are in Tecken's response, and what headers are in S3's response, if we want to list the latter as well.
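To illustrate the point above about Content-Encoding and Content-Length, here is a minimal Python sketch. It assumes a made-up symbol file body and a hand-built header dictionary rather than a real response from Tecken or S3, and `decode_body` is a hypothetical helper, not part of any client library.

```python
import gzip

# Hypothetical symbol file body; the line content is made up for illustration.
sym_text = "PUBLIC 1000 0 example_symbol\n" * 200

# The file is stored and transmitted compressed.
stored_body = gzip.compress(sym_text.encode("utf-8"))

# Hand-built headers mirroring the ones described above for a compressed file.
response_headers = {
    "Content-Encoding": "gzip",
    # Content-Length is the size of the compressed body, not the decoded text.
    "Content-Length": str(len(stored_body)),
    # Content-Type describes the content after decoding.
    "Content-Type": "text/plain",
}


def decode_body(headers, body):
    """Undo the content encoding named in the headers, if there is one."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


decoded = decode_body(response_headers, stored_body)
print(len(stored_body), len(decoded))
```

The two printed sizes differ: the first matches Content-Length, and the second is the size of the text a user actually sees after decoding.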

I think the fact that it's coming from AWS is an implementation detail that isn't relevant to the docs.

However, I do see your point about which headers come in which response. I'll have to think about how to document that clearly. I'll think about this more when I get back from PTO.

We also need to think about the subtle difference in behaviour between S3 and GCS. Maybe we can add the Accept-Encoding: gzip header at the load balancer level in GCS to make sure the behaviour is the same, but I'll have to look into it.

I commented on the GCS implementation bug about this.

smarnach · 2024-07-03T19:16:14Z

docs/download.rst

   :statuscode 500: sleep for a bit and retry; if retrying doesn't work, then please
       file a bug report
-   :statuscode 503: sleep for a bit and retry
+   :statuscode 502: sleep for a bit and retry


Why 502 instead of 503?

I checked the logs and we don't really emit 503s as far as I can tell. We do emit 502s, though.

The 503s are emitted by the ELB. We don't have load balancer logs for Tecken, so we can't see 503s in the logs. I expect 503s to be far more common than 502s, but I don't think we have any way to verify that.

Why do you expect 503s to be more common than 502s for Tecken?

I got these status codes mixed up. 504s are probably more common than 502s. The ELB will emit a 504 if the backends are overloaded, or when requests take too long. 503s can happen as well if all instances are unhealthy, which sometimes happens for Tecken.

We actually do have some numbers to estimate how common 5xx responses from the ELB are in general, compared to 5xx responses from the backend, since the ELB stores aggregate numbers for these two classes of 5xx responses. I would have expected that the vast majority of 5xx responses come from the ELB, but it turns out it's only a slight majority – about 60% come from the ELB and 40% from the backend.

biancadanforth · 2024-07-15T17:55:13Z

docs/download.rst

+   :statuscode 429: your request has been rate-limited; sleep for a bit and retry
+   :statuscode 500: there's an error with the server; sleep for a bit and
+       retry; if retrying doesn't work, then please file a bug report
+   :statuscode 502: sleep for a bit and retry


I checked the last week of logs for Tecken prod for download requests, and this list looks good to me (i.e. I couldn't find any other status codes to include).

The logs are from nginx, so they only include requests that make it to nginx. If the ELB can't reach the backend, it will reply with a 503 or 504, and these requests won't be in the logs. It's possible to enable ELB logs as well, but we never did that for a variety of reasons. In GCP, we'll also have load balancer logs, so we'll be able to see all the requests.
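As a rough client-side illustration of the "sleep for a bit and retry" advice for the status codes discussed in this review (429, 500, 502, 503, 504), here is a minimal Python sketch. The exponential backoff, the attempt limit, and the `fetch_with_retry` helper are all assumptions made for illustration; the docs only say to sleep and retry, and nothing here is Tecken's own client code.

```python
import time

# Status codes the reviewed docs describe as "sleep for a bit and retry".
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_retry(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch is any zero-argument callable that returns an HTTP status code.
    The delay doubles after each retryable response (an assumed policy).
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch()
        if status not in RETRYABLE:
            return status
        # Only sleep if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


# Usage with a fake request that fails twice before succeeding.
responses = iter([503, 504, 200])
delays = []
result = fetch_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real job would leave the default and, per the docs, file a bug report if a 500 persists after retrying.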

biancadanforth · 2024-07-15T18:11:01Z

docs/upload.rst

+       retry; if retrying doesn't work, then please file a bug report
+   :statuscode 502: sleep for a bit and retry
+   :statuscode 504: the request is taking too long to complete; sleep for a bit
+       and retry


I also found 29 occurrences of a 499 response code (0.3% of all response codes) in the Tecken prod logs over the last week. Interestingly, this is a non-standard code (not in MDN), but nginx appears to use it to indicate that the client closed the connection before the server could send a response. I couldn't find this in nginx's docs, but a web search turned up a bunch of articles about it.

That's not a response code clients will ever see, though. It's never actually sent by nginx, since the client already closed the connection. It's only recorded in the logs. We don't need to document this since it's not visible from the outside.

willkg · 2024-07-18T17:44:26Z

Update: I really appreciate all the comments and thoughts here. Thank you!

I want to address the comments, but probably won't have time to do that until next week. I also think that while updating docs is important, the discussion this kicked off was invaluable and several issues were created to fix tests and other things. Actually updating the docs is less important than that work was.

willkg · 2024-08-14T15:16:11Z

^^^ That updates the PR to factor in the comments from everyone. Does this match what we've got in AWS and what we're building in GCP now? Is it clear for users?

This fixes the location redirect download API docs by moving the separate request into a new section. This adds status code items for 502, 503, and 504 where missing.

willkg · 2024-10-30T00:38:06Z

@smarnach , @biancadanforth: ^^^ does that look ok?

biancadanforth

LGTM! Had one question about one of the statements.

biancadanforth · 2024-10-31T15:27:45Z

docs/upload.rst

+If you find your job is getting HTTP 429s or 504s frequently or it doesn't seem
+like symbol uploads are being completed, try these tips:


Should 504s be 502s, 503s and 504s per the previous discussion and additions to the HTTP codes? IIUC the reasons for these codes could affect both the download and upload APIs.

willkg · 2024-11-01T11:52:32Z

^^^ That clarifies the situations where the tips are helpful.

willkg · 2024-11-01T11:52:40Z

Thank you!

willkg requested a review from a team as a code owner July 3, 2024 17:23

willkg force-pushed the improve-docs branch from f98f10f to 528787e Compare July 3, 2024 17:58

smarnach reviewed Jul 3, 2024

biancadanforth reviewed Jul 15, 2024

willkg force-pushed the improve-docs branch from 528787e to fb6a901 Compare August 14, 2024 15:15

willkg added 2 commits October 29, 2024 20:37

Address PR comments

1013d9b

This fixes the location redirect download API docs by moving the separate request into a new section. This adds status code items for 502, 503, and 504 where missing.

willkg force-pushed the improve-docs branch from fb6a901 to 1013d9b Compare October 30, 2024 00:37

biancadanforth approved these changes Oct 31, 2024

Clarify situations where tips would be helpful

17f1f2a

willkg added this pull request to the merge queue Nov 1, 2024

Merged via the queue into main with commit 99a1898 Nov 1, 2024
2 checks passed

willkg deleted the improve-docs branch November 1, 2024 11:56
