support multiple shas #88

Merged
merged 24 commits into from
Oct 21, 2020

Contributor

@jmelis jmelis commented Oct 15, 2020

Part of https://issues.redhat.com/browse/APPSRE-2608

This is a large PR: it introduces a major feature and changes the internal mechanics of the server.

For consistent access, qontract-server clients can pin a specific <sha> by using this url: /graphqlsha/<sha>. Currently, when the data is reloaded the sha changes as well, and the server starts returning 409 to indicate a conflict: the client has to fetch the new sha.

This has a negative impact on the reconciliation times for the integrations that query this server. Every time the data is reloaded, the integration is interrupted and has to start afresh. When the data is changed very frequently, the state may never be reconciled.

The goal of this PR is to add support for multiple shas. Each time the data is reloaded, the new bundle is exposed on its new <sha> (with the same well-known url /graphqlsha/<sha> thus maintaining backwards compatibility), but previous shas should continue to work. Note that /graphql will consistently point at the latest bundle, which is useful for casual access to the server, not for integrations.
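As a rough illustration of the two access modes, the sketch below builds the request paths a client would use. The helper names are hypothetical, and the check that a sha is a 64-character hex digest is an assumption based on the /sha256 endpoint name and the sha that appears in the test output; only the two URL shapes come from this description.

```typescript
// Hypothetical helpers; only the URL shapes come from the PR description.
// Assumption: a bundle sha is a lowercase sha256 hex digest (64 characters).
const SHA256_HEX = /^[0-9a-f]{64}$/;

// Path for an integration that needs a consistent view of one bundle.
function pinnedGraphqlPath(sha: string): string {
  if (!SHA256_HEX.test(sha)) {
    throw new Error("not a sha256 hex digest: " + sha);
  }
  return "/graphqlsha/" + sha;
}

// Path for casual access; it always resolves to the latest bundle.
function latestGraphqlPath(): string {
  return "/graphql";
}
```

An integration would resolve the sha once at startup and reuse the pinned path for every query in that run.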

Cache Invalidation

To keep memory usage bounded, shas expire after a certain amount of time:

  • When the data is loaded for the first time, the expiration time is set to 20 minutes in the future (can be overridden with the BUNDLE_SHA_TTL environment variable).
  • Each time a specific sha is queried, its expiration is reset to BUNDLE_SHA_TTL in the future. This means that a sha can be kept available indefinitely by querying it before its BUNDLE_SHA_TTL has elapsed.
  • The latest sha, which is the one pointed at by /graphql, will never expire.
  • Shas are only expired when /reload is queried.
  • If /reload is called and there is no new data available, then no shas will be expired.
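The rules above can be modelled with a small in-memory cache. This is a minimal sketch under stated assumptions, not the server's actual code: the class name, method names, and the explicit `now` parameter (used instead of reading the clock, to keep the example deterministic) are all hypothetical.

```typescript
// Hypothetical in-memory model of the expiration rules listed above.
// Names and structure are illustrative, not taken from the server's source.
const DEFAULT_TTL_MS = 20 * 60 * 1000; // 20 minutes, the documented default

interface Entry<T> {
  bundle: T;
  expiresAt: number; // epoch milliseconds
}

class ShaCache<T> {
  private entries: Record<string, Entry<T> | undefined> = {};
  private latestSha: string | undefined = undefined;
  private ttlMs: number;

  constructor(ttlMs: number = DEFAULT_TTL_MS) {
    this.ttlMs = ttlMs;
  }

  // Models /reload. Returns false when the sha is unchanged (no new data),
  // in which case nothing is expired.
  reload(sha: string, bundle: T, now: number): boolean {
    if (sha === this.latestSha) {
      return false;
    }
    this.entries[sha] = { bundle: bundle, expiresAt: now + this.ttlMs };
    this.latestSha = sha;
    // Expiration only happens here, and it never removes the latest sha.
    for (const key of Object.keys(this.entries)) {
      const entry = this.entries[key];
      if (key !== sha && entry !== undefined && entry.expiresAt <= now) {
        delete this.entries[key];
      }
    }
    return true;
  }

  // Models a query against /graphqlsha/<sha>: a hit refreshes the TTL.
  get(sha: string, now: number): T | undefined {
    const entry = this.entries[sha];
    if (entry === undefined) {
      return undefined; // unknown sha, or one that was already removed
    }
    entry.expiresAt = now + this.ttlMs;
    return entry.bundle;
  }

  // Shas currently held in the cache.
  shas(): string[] {
    return Object.keys(this.entries);
  }
}
```

Note that a sha whose TTL has elapsed stays usable until the next reload that actually brings new data, which matches the last two rules in the list.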

Method changes

  • GET /sha256: (existed previously) Now it specifically points to the latest bundle. No backwards compatibility conflicts.
  • GET /git-commit: (existed previously) Now it specifically points to the latest bundle. No backwards compatibility conflicts.
  • GET /git-commit/:sha: (new method) It is now possible to obtain the git-commit for a specific sha.
  • GET /cache: (new method) Prints the cache state.

Metrics

The metrics have been refactored to a separate file metrics.ts.

There are also new metrics introduced:

# HELP qontract_server_router_stack Number of layers in the router stack
# TYPE qontract_server_router_stack gauge

# HELP qontract_server_cache_bundle Number of shas cached by the application in the bundle object
# TYPE qontract_server_cache_bundle gauge

# HELP qontract_server_cache_bundle_cache Number of shas cached by the application in the bundleCache object
# TYPE qontract_server_cache_bundle_cache gauge

These new metrics are useful in order to ensure that there are no memory leaks. Note that qontract_server_cache_bundle and qontract_server_cache_bundle_cache should always coincide in number.
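For readers unfamiliar with the output shape, the following sketch renders the two cache gauges by hand in the Prometheus text exposition format. The function names and the two plain objects passed in are hypothetical stand-ins for the bundle and bundleCache objects; the real implementation in metrics.ts presumably relies on a metrics library rather than string building.

```typescript
// Hand-rolled rendering of one gauge in the Prometheus text exposition format.
function renderGauge(name: string, help: string, value: number): string {
  return [
    "# HELP " + name + " " + help,
    "# TYPE " + name + " gauge",
    name + " " + String(value),
  ].join("\n");
}

// Hypothetical stand-ins for the two caches: plain objects keyed by sha.
function renderCacheGauges(
  bundle: Record<string, unknown>,
  bundleCache: Record<string, unknown>
): string {
  return [
    renderGauge(
      "qontract_server_cache_bundle",
      "Number of shas cached by the application in the bundle object",
      Object.keys(bundle).length
    ),
    renderGauge(
      "qontract_server_cache_bundle_cache",
      "Number of shas cached by the application in the bundleCache object",
      Object.keys(bundleCache).length
    ),
  ].join("\n\n") + "\n";
}
```

If the two reported values ever diverge, one of the caches is holding a sha that the other has already dropped, which is the leak these gauges are meant to expose.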

Breaking changes

There are no breaking changes. This PR preserves backwards compatibility. However, there are two minor changes that are worth mentioning.

  • The server no longer sends 409 under any circumstances. 409 indicates a conflict, which no longer applies because the shas continue to be available. If the client queries a non-existing sha, or an expired one, it will receive 404.

  • GET /graphql, which was undocumented and merely a corner case of the poorly defined middleware that existed previously, will no longer redirect to POST /graphql. This was never the intention, so it has been removed. The tests, however, have been rewritten to properly send POST data instead of performing GET queries, as they were incorrectly doing before.
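On the client side, the switch from 409 to 404 could be handled with a small status-to-action mapping like the one below. This is a hypothetical sketch: the function name and the action labels are illustrative, and treating 409 the same way as 404 is an assumption made so that a client keeps working against servers that predate this PR.

```typescript
// Hypothetical client-side decision table; not part of qontract-server.
type NextStep = "use-response" | "fetch-latest-sha-and-restart" | "give-up";

function nextStepForStatus(status: number): NextStep {
  if (status >= 200 && status < 300) {
    return "use-response";
  }
  // 404: the pinned sha is unknown or has expired (behaviour after this PR).
  // 409: the data was reloaded (behaviour before this PR).
  if (status === 404 || status === 409) {
    return "fetch-latest-sha-and-restart";
  }
  return "give-up";
}
```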

Testing

The functionality of the previously existing tests has remained untouched (except for renaming the clusters directory to schemas, which is the proper name).

A new set of tests has been included in this PR, covering the multiple-sha functionality as well as bundle sha expiration and invalidation:

  multishas
    ✓ serves a basic graphql query (38ms)
    ✓ serves a basic graphql query using GET
Skipping reload, data already loaded
    ✓ reloads and works
reloaded
    ✓ loads new data and works
    ✓ access via sha1
    ✓ access via sha2
Removing expired bundle: 038f5149906c198069b08285bfba5a955531780793055a64a2bebc4920385279
reloaded
    ✓ removes expired bundle
    ✓ access via sha2
    ✓ access via sha3

Things to look out for

  • Memory leaks. An important part of this PR resides in the logic that removes the expired shas. There is a new set of tests that verify that data is properly purged when the bundle shas are removed. However, this needs to be carefully reviewed.

  • Every time a new bundle is added, it is mounted into the express server with this directive: app.use(serverMiddleware);. This is an express framework method call that pushes the middleware onto the internal router stack. The challenge is that there is no documented or supported way to remove a middleware instance, which is precisely what we want to do when bundles expire. What we have done in this case is access the private router stack (app._router.stack) and splice it to remove the relevant middleware. This is somewhat dangerous and may break if the express framework is upgraded (regardless of whether it is a major or minor update).

  • The ApolloServer package has been upgraded. This is because the schema hot-reload internals have changed. This specifically affects the GraphQL inline fragments. However, this was a minor version upgrade, and it didn't affect the tests, so it is of minor importance.
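The router-stack splicing mentioned in the second point can be sketched on a plain array. The `Layer` type below is a minimal stand-in, and the assumption that each entry of app._router.stack exposes the mounted function as `handle` describes private Express 4 internals that may change; the function name is hypothetical.

```typescript
// Minimal stand-in for the entries Express 4 keeps in app._router.stack.
// Assumption: each layer exposes the mounted middleware as `handle`.
type Handler = (...args: unknown[]) => void;

interface Layer {
  handle: Handler;
}

// Removes the first layer that wraps `handler`. Returns true if one was removed.
function removeMiddleware(stack: Layer[], handler: Handler): boolean {
  for (let i = 0; i < stack.length; i++) {
    if (stack[i].handle === handler) {
      stack.splice(i, 1); // mutate in place, as the router holds this array
      return true;
    }
  }
  return false;
}
```

Matching by function identity means the server has to keep a reference to each bundle's middleware for as long as the bundle is cached, so that the same reference can be looked up at expiration time.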

EDIT: Reimplemented GET /graphql -> POST /graphql

@jmelis jmelis marked this pull request as ready for review October 16, 2020 17:31
Contributor

@maorfr maorfr left a comment

from just reading the description i understand that you thought of every scenario i could think of.
this is amazing work.

LGTM

Contributor

maorfr commented Oct 20, 2020

qontract-reconcile PR: app-sre/qontract-reconcile#1135

Contributor

@rporres rporres left a comment

This is an impressive job. A few comments and suggestions here and there to the best of my knowledge (which is extremely limited in the js world)

One thing I don't particularly like is breaking changes... Especially since we're doing GETs with query params, e.g. https://github.com/openshift/aws-account-operator/pull/493/files, and it makes sense that we're able to do a select with a query string via GET (on top of the POST support, I'd do both)

As a request, I'd add part of this great and very helpful description in the documentation of the repository :)

Review threads (all resolved): src/metrics.ts (5, outdated), src/server.ts (4, one outdated), test/multishas/multishas.test.ts (1).

Contributor Author

jmelis commented Oct 20, 2020

> One thing I don't particularly like is breaking changes... Especially since we're doing GETs with query params, e.g. https://github.com/openshift/aws-account-operator/pull/493/files, and it makes sense that we're able to do a select with a query string via GET (on top of the POST support, I'd do both)

Reimplemented and updated PR description.

Review thread (resolved): src/metrics.ts (outdated).

Contributor

rporres commented Oct 20, 2020

And don't hate me too much, but please convert the relevant bits of the description of this PR into proper documentation.

Contributor Author

jmelis commented Oct 21, 2020

@rporres I have updated the README.md, it has been neglected so thanks for pointing it out.

Contributor

@rporres rporres left a comment

Great stuff

3 participants