Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bionic inventory CKAN requests hang #1068

Closed
adborden opened this issue Oct 25, 2019 · 7 comments
Closed

Bionic inventory CKAN requests hang #1068

adborden opened this issue Oct 25, 2019 · 7 comments
Assignees
Labels
bug Software defect or bug component/inventory Inventory playbooks/roles

Comments

@adborden
Copy link
Contributor

adborden commented Oct 25, 2019

After putting Bionic into production for Inventory, some requests hang indefinitely. This manifests as uptrends alerts because some requests will timeout.

What we know so far...

  • The behavior is somewhat intermittent. Sometimes it works, sometimes not.
  • This did not seem to be an issue for staging, only production.
  • This looks similar to what @adborden was seeing on Catalog when testing bionic.
  • strace does not show a clear culprit.
  • The hang seems to happen within CKAN's plugin initialization phase.
  • We've verified connections to database (inventory and datastore), solr, and s3.

How to reproduce

  1. sudo -u www-data /usr/lib/ckan/bin/paster --plugin=ckan serve /etc/ckan/production.ini
  2. curl -v -L -k localhost:5000/api/action/status_show

Expected behavior

Request returns 200, JSON response of the server status

Actual behavior

Request hangs, timeouts after ~5 minutes.

@adborden
Copy link
Contributor Author

We could try disabling all the plugins, and re-enable them one by one.

@adborden adborden added bug Software defect or bug component/inventory Inventory playbooks/roles labels Oct 25, 2019
@adborden adborden self-assigned this Oct 25, 2019
@mogul
Copy link
Contributor

mogul commented Oct 25, 2019

I think what we're seeing is happening at the end of the plugin-initialization process. And again: Why do we only see this behavior on these two particular hosts, when the same plugins work on the staging hosts?

@adborden
Copy link
Contributor Author

adborden commented Dec 6, 2019

I think I'm seeing this behavior on staging Trusty now.

@adborden
Copy link
Contributor Author

@FuhuXia has spent some time on this and hasn't been able to reproduce it in any environment. We are going to go ahead with the deploy, checking along the way in case it comes back.

@mogul mogul added this to the Sprint 20200220 milestone Feb 20, 2020
@mogul mogul closed this as completed Feb 20, 2020
@adborden
Copy link
Contributor Author

Seeing this issue on production again.

@adborden adborden reopened this Mar 24, 2020
adborden added a commit that referenced this issue Mar 24, 2020
This reverts commit 7e2d211.

We've ran into #1068 again. Rolling
back.
@FuhuXia
Copy link
Member

FuhuXia commented Mar 24, 2020

When one Bionic hangs, it also brings down other Trusty instances. Trusty instance will take forever to load /dataset and eventually gives 500 error. The log shows it is related to postgres QueuePool limit.

Screenshot 2020-03-24 17 06 32

@adborden
Copy link
Contributor Author

Closing as a duplicate of #1375, we're at least talking about these as if they're the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Software defect or bug component/inventory Inventory playbooks/roles
Projects
None yet
Development

No branches or pull requests

3 participants