-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add /api/manifest endpoint #101
Conversation
Staging instance deployed by Travis CI! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me, but I'm not yet very confident with reviewing Go code in details. @mdittmer , could you take a look?
if err = json.Unmarshal(*queryResults["items"], &issues); err != nil { | ||
return nil, err | ||
} | ||
if len(issues) < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to assert len(issues)==1 and perhaps log all anomalies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions should be for things that the programmer thinks is impossible, but logging which requests don't find a manifest SGTM. Except, do we have any server side logs at all currently? The less we log the less we need to think about PII and that kind of thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AppEngine has built-in logging support. We can just use the standard logging facility provided by the language, and all the logs will go to Google Cloud Stackdriver which has a fancy UI for filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding PII, AppEngine logs requests already. We won't and can't add extra PII if we log some error messages.
webapp/api_handlers.go
Outdated
@@ -18,6 +20,7 @@ import ( | |||
"golang.org/x/net/context" | |||
"google.golang.org/appengine" | |||
"google.golang.org/appengine/datastore" | |||
"google.golang.org/appengine/urlfetch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, if we decide to go ahead with #107 , we won't be using urlfetch
anymore. urlfetch
isn't used extensively in this PR yet, but I still suggest creating a wrapper for urlfetch
so that we can switch to generic network I/O later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will save that pain for another day.
testRunKindName := "TestRun" | ||
passRateMetadataKindName := metrics.GetDatastoreKindName( | ||
metrics.PassRateMetadata{}) | ||
failuresMetadataKindName := metrics.GetDatastoreKindName( | ||
metrics.FailuresMetadata{}) | ||
|
||
log.Print("Adding local (empty) secrets...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not very hard to hit the GitHub API quota. Could this try to read a token from a file, an environment variable, or something, and print a warning if none exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't manage to exceed it in all of my trials, so I think I'll wait until I see a problem before I solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, perhaps the 403s one easily gets with a few fast reloads of https://foolip.github.io/ecosystem-infra-rotation/ are a different kind of problem that you don't get with just a single request, maybe this: https://developer.github.com/v3/#abuse-rate-limits
// ParseSHAParam parses and validates the 'sha' param for the request. | ||
// It returns "latest" by default (and in error cases). | ||
// ParseSHAParam parses and validates the 'sha' param for the request, | ||
// cropping it to 10 chars. It returns "latest" by default. (and in error cases). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the cropping introduced? It looks like the longest ambiguous shas in wpt are 7 characters long (93d383f, d0b03af, e54fe8f) and it's fairly unlikely we will have a collision, but using full hashes everywhere internally and only accepting shorter ones as input seems like a good idea. But maybe this is necessary because 10-char hashes are parts of the filenames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Luke's aware of #49 , but many existing code doesn't take full SHAs so I assume this is a transient solution. At least this endpoint now takes a full SHA (despite chopping it internally), which is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: This PR doesn't introduce the cropping, it bypasses it (since we want to use the full hash here).
See #46 for the temporary 'fix' as Hexcles describes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... there was already a runParam[:10]
before and this introduces a ParseSHAParamFull
. Sorry, didn't finish reading before reacting :/
webapp/api_handlers.go
Outdated
ctx := appengine.NewContext(r) | ||
if manifest, err := getManifestForSHA(ctx, sha); err != nil { | ||
http.Error(w, err.Error(), http.StatusNotFound) | ||
} else if manifest == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting both manifest
and err
be nil seems like an odd API, should getManifestForSHA
return an error at its final line instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if err = json.Unmarshal(*queryResults["items"], &issues); err != nil { | ||
return nil, err | ||
} | ||
if len(issues) < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions should be for things that the programmer thinks is impossible, but logging which requests don't find a manifest SGTM. Except, do we have any server side logs at all currently? The less we log the less we need to think about PII and that kind of thing.
if len(assets) < 1 { | ||
return nil, fmt.Errorf("No assets found for release %s", releaseTag) | ||
} | ||
// Get (and unzip) the asset with name "MANIFEST-{sha}.json.gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to unzip it, or could we set the transfer encoding and just pass along the data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We played with this a bit. The short answer is that we can't, unless (potentially) we use some dirty hacks.
Basically, GitHub responds a 302 for the download URL to a long AWS S3 URL that has a bunch of query parameters to tell S3 to force a browser download, which means S3 will disregard the Accept-Encoding
in request headers, and always return the raw octet stream of the file (which happens to be a gz here).
If we absolutely want to to do it, I guess we can intercept 302 and modify the query parameters, but I guess we don't :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to unzip it anyway, once we get to dumping it in the cache and allowing path filtering, etc.
webapp/api_handlers.go
Outdated
} | ||
} | ||
|
||
func getManifestForSHA(ctx context.Context, sha string) (manifest []byte, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some integration tests for this API? Do we have any tests that just hit the API endpoints, as opposed to invoking getManifestForSHA
as we'd do here?
And... what do we do about GitHub API reliability in such tests? Mock out the GitHub API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, somewhat. IMHO abstracting to a client
and asserting that we fetch a URL is as far as we should bother testing, otherwise you're writing tests that really only test how well you write your mocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks!
I tried this: for tag in `git tag --list 'merge_pr_1*'`; do
curl "https://manifest-api-dot-wptdashboard.appspot.com/api/manifest?sha=$(git rev-parse $tag)" > MANIFEST.$tag.json
done That's 520 tags/files. 32 of them didn't return a manifest, which is a bit much for comfort. These are the ones:
What do we do with this unreliability? Backfill releases for existing tags? Put in place some kind of monitoring going forward? |
Well, abstractly speaking, the API is an HTTP endpoint for finding the manifest for a SHA. Whether or not there actually was one available is a reliability issue with a separate service (the service for tagging PRs). Please file an issue on the WPT repo so we can discuss there :) |
Poking you on web-platform-tests/wpt#10572 |
// ParseSHAParam parses and validates the 'sha' param for the request. | ||
// It returns "latest" by default (and in error cases). | ||
// ParseSHAParam parses and validates the 'sha' param for the request, | ||
// cropping it to 10 chars. It returns "latest" by default. (and in error cases). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... there was already a runParam[:10]
before and this introduces a ParseSHAParamFull
. Sorry, didn't finish reading before reacting :/
webapp/api_handlers.go
Outdated
} | ||
} | ||
|
||
func getManifestForSHA(ctx context.Context, sha string) (manifest []byte, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks!
}, | ||
releasesJSON["assets"].([]object)[0], | ||
} | ||
client.Responses[gitHubReleaseURL("merge_pr_123")] = unsafeMarshal(releasesJSON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't affect the test, but checked that web-platform-tests/wpt#123 was merged very long ago so that we won't be creating a real tag called merge_pr_123 :)
Description
WIP - request for feedback
Adds an endpoint for getting a manifest for a given SHA.
To achieve this, it does:
merge_pr_[PR number]
MANIFEST-[sha].json.gz
@gsnedders @jgraham if there is a better way to do this (without a local git clone) I'd be happy to know about it
@tabatkins - Haven't done
latest
behaviour yet, needs to be given a SHA param, but it's running here if you want to play around and point out any major flaws. AppEngine seems to behave nicely if you allow gzip encoding, but defaults to raw JSON @ ~20MB.https://manifest-api-dot-wptdashboard.appspot.com/api/manifest?sha=ae34fb2b39f8b2cf9c3989a566ab938cde45dec0
Context
There are some UI features (e.g. linking to w3c-test.org) which require manifest context, so we'd like to eventually cache manifests by SHA and support lookup of manifest info for a single test file.