Migrations failing to apply with opaque error message #2953

Closed
brandonmp opened this issue Sep 28, 2019 · 7 comments
Labels
c/cli Related to CLI

Comments

@brandonmp

Attempting to apply migrations via the CLI to our database, but we hit this error:

FATA[0031] apply failed: invalid character '<' looking for beginning of value

We have several identical hasura/postgres deployments, and migrations apply without incident on each deployment except for this one (this one's a new deployment with no migrations currently applied, fwiw).

I deleted the latter half of the migrations directory & applied the remaining migrations successfully, so my assumption is there's some problem within one of my migration files.

I also tried resetting the migration via the steps here. This actually worked (!), BUT we have a bunch of data migrations (eg populating enum tables) in our migrations, and the steps outlined in the article don't seem to capture those migrations.

hasura is version beta.4 (same as CLI), attempted all this on both windows and bash-for-windows.

@0x777 0x777 added the c/cli Related to CLI label Sep 28, 2019
@0x777
Member

0x777 commented Sep 28, 2019

@brandonmp Is there a gateway in front of graphql-engine? It is probably returning a 504 with an HTML page, and the CLI is trying to parse that page as JSON, hence the error. @shahidhk Is it possible to have a special case for dealing with gateway error codes?
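
To illustrate the diagnosis above, here is a minimal shell sketch of the kind of check a client could make before treating a response as JSON. It is not the CLI's actual behaviour: the function name `response_is_suspect` and the choice of 502/503/504 as "gateway" codes are assumptions made for this example.

```shell
# Hypothetical helper: succeeds (exit 0) when a response probably came from a
# gateway or proxy rather than from the API itself, judging only by the HTTP
# status code and the Content-Type header.
response_is_suspect() {
  local status=$1 content_type=$2

  # Typical gateway and proxy failure codes (an assumption for this sketch).
  case "$status" in
    502|503|504) return 0 ;;
  esac

  # Anything that is not declared as JSON should not go to a JSON parser.
  case "$content_type" in
    application/json*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example call with values as a gateway timeout page would report them.
if response_is_suspect 504 "text/html"; then
  echo "response looks like a gateway error page, not JSON"
fi
```

With curl, the two inputs can be read from a real response using `--write-out '%{http_code} %{content_type}'`. A quick request may not reproduce the problem, though, since the timeout only appears on long-running calls.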

@jerryc8

jerryc8 commented Sep 28, 2019

Hi @shahidhk thanks for your comment. I did a detailed investigation of this and here is more info for you. Please note that we have about 200 migration files in the migrations folder.

Bug exhibits following behaviour:

  • The bug seems to be a timing issue
  • When applying roughly 26 or more migration files (i.e., --up 26), this error occurs (Heroku PostgreSQL and Hasura)
  • The threshold of 26 varies with network conditions and the database instance. When @brandonmp tried it with a different cloud provider, even 100 files did not trigger the bug, but 200 files did
  • When this error occurs, running migrate status immediately after the migrate apply will show a similar error
  • However, after a minute or so, running migrate status will show that all migrations are fully completed
  • Bug does not happen for local Docker-contained instances, probably because there is no network latency for the timing bug to exhibit itself (both psql and hasura are inside docker-compose)
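
The "error now, success a minute later" behaviour in the list above suggests polling rather than trusting the first result. A minimal sketch, assuming a hypothetical helper named `retry_until_ok` (not part of the Hasura CLI) that reruns a command until it succeeds or the attempts run out:

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
retry_until_ok() {
  local attempts=$1 delay=$2
  shift 2
  local n=1
  while [ "$n" -le "$attempts" ]; do
    # Stop as soon as the command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Pause before the next attempt, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example (values are illustrative): check the status up to 6 times, 20 s apart.
#   retry_until_ok 6 20 hasura migrate status
```

This only hides the symptom on the client side. The migrations still run to completion on the server, as the observations above indicate.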

Platforms and Versions

  • Bug is reproducible with Hasura command line on all platforms: Windows, MacOS, and Linux
  • Bug is reproducible with Hasura CLI tool at both v1.0.0-beta.4 and v1.0.0-beta.6
  • Bug is reproducible with Hasura server at both v1.0.0-beta.4 and v1.0.0-beta.6

Workaround

This script did work around the issue and there is no error when run:

$ for i in {1..20}; do date; time ./hasura-linux migrate apply --up 10; echo $?; sleep 60; done
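
The one-liner above can also be written as a small function that stops at the first failing batch instead of carrying on. This is a sketch under assumptions: `apply_in_batches` is a hypothetical name, and the batch count, batch size, and pause are the values from the workaround, not recommendations.

```shell
# Hypothetical helper: apply migrations in small batches with a pause between
# batches, so that no single HTTP request stays open long enough to time out.
# Usage: apply_in_batches <batches> <migrations-per-batch> <pause-seconds> <cli>
apply_in_batches() {
  local batches=$1 size=$2 pause=$3
  shift 3
  local i=1
  while [ "$i" -le "$batches" ]; do
    # "$@" is the CLI binary, e.g. ./hasura-linux; abort on the first failure.
    "$@" migrate apply --up "$size" || return 1
    # Pause between batches, but not after the final one.
    if [ "$i" -lt "$batches" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
}

# Equivalent to the workaround above: 20 batches of 10 migrations, 60 s apart.
#   apply_in_batches 20 10 60 ./hasura-linux
```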

@shahidhk
Member

> Bug does not happen for local Docker-contained instances, probably because there is no network latency for the timing bug to exhibit itself (both psql and hasura are inside docker-compose)

It is indeed a gateway throwing a timeout, or the connection getting terminated, since applying the migrations takes longer than the gateway allows.

What is the cloud provider and how are you running the database and Hasura? Are both in the same region?

The workaround script further validates that the problem is the gateway not handling long-lived HTTP connections.

@shahidhk
Member

We'll add better error handling on the CLI #2954

@brandonmp
Author

brandonmp commented Sep 28, 2019

Thanks for quick responses everyone

> What is the cloud provider and how are you running the database and Hasura? Are both in the same region?

GCP, both hasura and postgres (cloudsql maybe but not sure, can verify w/ our infra team if helpful) are in a kubernetes cluster with nginx in front of them.

@shahidhk
Member

What is the timeout on nginx? Can you try increasing it? How is nginx exposed? Using a loadbalancer or another Ingress?

@marionschleifer marionschleifer assigned rikinsk and unassigned shahidhk and rikinsk Nov 29, 2019
@marionschleifer
Contributor

Closing this issue, as the actionable item is tracked at #2954. If you'd like to add anything, feel free to re-open 🙂
