Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add /api/manifest endpoint #101

Merged
merged 6 commits into from
May 7, 2018
Merged

Add /api/manifest endpoint #101

merged 6 commits into from
May 7, 2018

Conversation

lukebjerring
Copy link
Contributor

Description

WIP - request for feedback

Adds an endpoint for getting a manifest for a given SHA.
To achieve this, it does:

  • GitHub search for the PR associated with the SHA
  • GitHub fetch for the release by tag-name merge_pr_[PR number]
  • GitHub download of the release's asset with the name MANIFEST-[sha].json.gz

@gsnedders @jgraham if there is a better way to do this (without a local git clone) I'd be happy to know about it

@tabatkins - Haven't done latest behaviour yet, needs to be given a SHA param, but it's running here if you want to play around and point out any major flaws. AppEngine seems to behave nicely if you allow gzip encoding, but defaults to raw JSON @ ~20MB.

https://manifest-api-dot-wptdashboard.appspot.com/api/manifest?sha=ae34fb2b39f8b2cf9c3989a566ab938cde45dec0

Context

There are some UI features (e.g. linking to w3c-test.org) which require manifest context, so we'd like to eventually cache manifests by SHA and support lookup of manifest info for a single test file.

@lukebjerring lukebjerring requested a review from foolip April 27, 2018 16:43
@lukebjerring
Copy link
Contributor Author

Staging instance deployed by Travis CI!
Running at https://manifest-api-dot-wptdashboard.appspot.com

@lukebjerring lukebjerring requested a review from Hexcles May 2, 2018 18:19
Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me, but I'm not yet very confident with reviewing Go code in details. @mdittmer , could you take a look?

if err = json.Unmarshal(*queryResults["items"], &issues); err != nil {
return nil, err
}
if len(issues) < 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to assert len(issues)==1 and perhaps log all anomalies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions should be for things that the programmer thinks is impossible, but logging which requests don't find a manifest SGTM. Except, do we have any server side logs at all currently? The less we log the less we need to think about PII and that kind of thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppEngine has built-in logging support. We can just use the standard logging facility provided by the language, and all the logs will go to Google Cloud Stackdriver which has a fancy UI for filtering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding PII, AppEngine logs requests already. We won't and can't add extra PII if we log some error messages.

@@ -18,6 +20,7 @@ import (
"golang.org/x/net/context"
"google.golang.org/appengine"
"google.golang.org/appengine/datastore"
"google.golang.org/appengine/urlfetch"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, if we decide to go ahead with #107 , we won't be using urlfetch anymore. urlfetch isn't used extensively in this PR yet, but I still suggest creating a wrapper for urlfetch so that we can switch to generic network I/O later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will save that pain for another day.

@lukebjerring lukebjerring requested a review from mdittmer May 2, 2018 22:43
testRunKindName := "TestRun"
passRateMetadataKindName := metrics.GetDatastoreKindName(
metrics.PassRateMetadata{})
failuresMetadataKindName := metrics.GetDatastoreKindName(
metrics.FailuresMetadata{})

log.Print("Adding local (empty) secrets...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very hard to hit the GitHub API quota. Could this try to read a token from a file, an environment variable, or something, and print a warning if none exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't manage to exceed it in all of my trials, so I think I'll wait until I see a problem before I solve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, perhaps the 403s one easily gets with a few fast reloads of https://foolip.github.io/ecosystem-infra-rotation/ are a different kind of problem that you don't get with just a single request, maybe this: https://developer.github.com/v3/#abuse-rate-limits

// ParseSHAParam parses and validates the 'sha' param for the request.
// It returns "latest" by default (and in error cases).
// ParseSHAParam parses and validates the 'sha' param for the request,
// cropping it to 10 chars. It returns "latest" by default. (and in error cases).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the cropping introduced? It looks like the longest ambiguous shas in wpt are 7 characters long (93d383f, d0b03af, e54fe8f) and it's fairly unlikely we will have a collision, but using full hashes everywhere internally and only accepting shorter ones as input seems like a good idea. But maybe this is necessary because 10-char hashes are parts of the filenames?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Luke's aware of #49 , but many existing code doesn't take full SHAs so I assume this is a transient solution. At least this endpoint now takes a full SHA (despite chopping it internally), which is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This PR doesn't introduce the cropping, it bypasses it (since we want to use the full hash here).

See #46 for the temporary 'fix' as Hexcles describes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... there was already a runParam[:10] before and this introduces a ParseSHAParamFull. Sorry, didn't finish reading before reacting :/

ctx := appengine.NewContext(r)
if manifest, err := getManifestForSHA(ctx, sha); err != nil {
http.Error(w, err.Error(), http.StatusNotFound)
} else if manifest == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting both manifest and err be nil seems like an odd API, should getManifestForSHA return an error at its final line instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if err = json.Unmarshal(*queryResults["items"], &issues); err != nil {
return nil, err
}
if len(issues) < 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions should be for things that the programmer thinks is impossible, but logging which requests don't find a manifest SGTM. Except, do we have any server side logs at all currently? The less we log the less we need to think about PII and that kind of thing.

if len(assets) < 1 {
return nil, fmt.Errorf("No assets found for release %s", releaseTag)
}
// Get (and unzip) the asset with name "MANIFEST-{sha}.json.gz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to unzip it, or could we set the transfer encoding and just pass along the data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We played with this a bit. The short answer is that we can't, unless (potentially) we use some dirty hacks.

Basically, GitHub responds a 302 for the download URL to a long AWS S3 URL that has a bunch of query parameters to tell S3 to force a browser download, which means S3 will disregard the Accept-Encoding in request headers, and always return the raw octet stream of the file (which happens to be a gz here).

If we absolutely want to to do it, I guess we can intercept 302 and modify the query parameters, but I guess we don't :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to unzip it anyway, once we get to dumping it in the cache and allowing path filtering, etc.

}
}

func getManifestForSHA(ctx context.Context, sha string) (manifest []byte, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some integration tests for this API? Do we have any tests that just hit the API endpoints, as opposed to invoking getManifestForSHA as we'd do here?

And... what do we do about GitHub API reliability in such tests? Mock out the GitHub API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, somewhat. IMHO abstracting to a client and asserting that we fetch a URL is as far as we should bother testing, otherwise you're writing tests that really only test how well you write your mocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks!

@foolip
Copy link
Member

foolip commented May 3, 2018

I tried this:

for tag in `git tag --list 'merge_pr_1*'`; do
  curl "https://manifest-api-dot-wptdashboard.appspot.com/api/manifest?sha=$(git rev-parse $tag)" > MANIFEST.$tag.json
done

That's 520 tags/files. 32 of them didn't return a manifest, which is a bit much for comfort. These are the ones:

merge_pr_10005
merge_pr_10014
merge_pr_10113
merge_pr_10158
merge_pr_10159
merge_pr_10164
merge_pr_10180
merge_pr_10191
merge_pr_10194
merge_pr_10202
merge_pr_10205
merge_pr_10215
merge_pr_10216
merge_pr_10218
merge_pr_10219
merge_pr_10222
merge_pr_10224
merge_pr_10235
merge_pr_10237
merge_pr_10238
merge_pr_10240
merge_pr_10247
merge_pr_10267
merge_pr_10268
merge_pr_10276
merge_pr_10278
merge_pr_10280
merge_pr_10286
merge_pr_10294
merge_pr_10445
merge_pr_10605
merge_pr_10663

What do we do with this unreliability? Backfill releases for existing tags? Put in place some kind of monitoring going forward?

@lukebjerring
Copy link
Contributor Author

Well, abstractly speaking, the API is an HTTP endpoint for finding the manifest for a SHA.

Whether or not there actually was one available is a reliability issue with a separate service (the service for tagging PRs). Please file an issue on the WPT repo so we can discuss there :)

@foolip
Copy link
Member

foolip commented May 5, 2018

Whether or not there actually was one available is a reliability issue with a separate service (the service for tagging PRs). Please file an issue on the WPT repo so we can discuss there :)

Poking you on web-platform-tests/wpt#10572

// ParseSHAParam parses and validates the 'sha' param for the request.
// It returns "latest" by default (and in error cases).
// ParseSHAParam parses and validates the 'sha' param for the request,
// cropping it to 10 chars. It returns "latest" by default. (and in error cases).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... there was already a runParam[:10] before and this introduces a ParseSHAParamFull. Sorry, didn't finish reading before reacting :/

}
}

func getManifestForSHA(ctx context.Context, sha string) (manifest []byte, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks!

},
releasesJSON["assets"].([]object)[0],
}
client.Responses[gitHubReleaseURL("merge_pr_123")] = unsafeMarshal(releasesJSON)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't affect the test, but checked that web-platform-tests/wpt#123 was merged very long ago so that we won't be creating a real tag called merge_pr_123 :)

@lukebjerring lukebjerring merged commit 5608442 into master May 7, 2018
@lukebjerring lukebjerring deleted the manifest-api branch May 7, 2018 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants