Sporadic error 502 & 503 #414
Comments
It happened to me too while testing a new feature (yesterday it happened quite frequently). After a couple of seconds the request succeeded. It feels like the back-end is unresponsive for a short period of time for whatever reason. @anlausch do you have an idea what causes this, and can you do something about it? |
I cannot reproduce this now. I would therefore suggest that you probe the system a bit more; maybe you will find a pattern, e.g. when exactly it happens and after which operations the system becomes unstable. In the logs I can see that at some point on Friday mongo was not reachable (weird), but I guess this was an exception, and that for a while there were heap space issues. I cannot trace these right now, which is why I would like to ask you to help with the inspection a bit while I am unavailable. |
Right now (13:29) I can't upload anything because of the 503 service unavailable error. |
I started a browse search (14:02) and now I get a 502 proxy error. |
At the moment (4 PM Tuesday) 6GB of the 8GB available RAM on the server are in use. Could it be just memory problems with running more and more background jobs for four instances of our system? |
17:00, Browsing in Production: Should I pass the question with the memory problems on to our IT? |
10:37h: 500 Internal Server Error while Browsing |
11:10h: Still the same error |
13:12 h, from a different computer: it works now. |
We are now using a different session store and I contacted @stweil for help. Let's see whether this improves the situation. |
@lgalke What you are assuming is in line with what I am currently thinking about the problem. |
IT reconfigured the server from 8 GiB RAM + 8 GiB swap = 16 GiB memory to 24 GiB RAM + 8 GiB swap = 32 GiB memory. I expect that this will improve responsiveness because of the larger disk cache, but not solve the problem. |
I just had the 503 service unavailable error again (I uploaded a chapter to the demo version). It immediately worked after that, but there seems to still be an issue. |
Extract from
Extract from
Both messages occur 2 times within milliseconds. Port 3102 is handled by this process:
The process refuses connections. @anlausch, the reason for that has to be looked for in the code. Is there corresponding information in |
Older error log files also contain lots of refused connections for the ports 3000, 3101 and 3102. |
I also see bursts of refused connections. In the following log extract the
Are there applications which open unnecessarily many connections? The current code uses a |
The front-end triggers as few requests as possible. For instance, when saving bounding boxes, we check whether the coordinates have actually changed and only issue save requests for the boxes that did. |
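To illustrate the change check described above, here is a minimal Python sketch. The front-end itself is not written in Python; the function name `changed_boxes` and the `(x, y, width, height)` tuples keyed by entry id are assumptions made purely for illustration.

```python
def changed_boxes(saved, edited):
    """Return only the boxes whose coordinates differ from the saved state.

    Both arguments map a (hypothetical) entry id to an (x, y, width, height) tuple.
    """
    return {
        entry_id: coords
        for entry_id, coords in edited.items()
        if saved.get(entry_id) != coords
    }


# Box "a" is untouched, box "b" was moved and resized by the user.
saved = {"a": (10, 20, 100, 15), "b": (10, 40, 100, 15)}
edited = {"a": (10, 20, 100, 15), "b": (12, 40, 98, 15)}

# Only the entries in to_save would need a save request.
to_save = changed_boxes(saved, edited)
```

With this kind of check, an edit session that changes nothing produces an empty dict and therefore no save requests at all.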
Status code 500 (internal server error) is related:
Here the JavaScript code accepted the connection, but terminated it before a full HTTP answer was sent. This can be caused by software bugs or by unexpected program exceptions like out of memory. |
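The two failure modes discussed in this thread (a process that refuses connections, and one that accepts a connection but closes it before answering) can be told apart from the client side. The following is only a sketch in Python, not the project's code; `probe`, `start_hangup_server` and `unused_port` are hypothetical helpers, and the throwaway listener merely stands in for a misbehaving back-end process.

```python
import http.client
import socket
import threading


def probe(port, host="127.0.0.1", timeout=5.0):
    """Classify one GET against host:port as 'answered', 'refused' or 'dropped'."""
    client = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        client.request("GET", "/")
        client.getresponse().read()
        return "answered"
    except ConnectionRefusedError:
        # Nothing is listening on the port.
        return "refused"
    except (ConnectionError, http.client.HTTPException):
        # The peer accepted the connection but hung up before a full answer.
        return "dropped"
    finally:
        client.close()


def start_hangup_server():
    """Start a listener that reads one request and closes without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def run():
        conn, _ = listener.accept()
        conn.recv(65536)  # consume the request, then hang up
        conn.close()
        listener.close()

    threading.Thread(target=run, daemon=True).start()
    return port


def unused_port():
    """Return a local port that was free a moment ago (nothing listens on it)."""
    probe_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe_sock.bind(("127.0.0.1", 0))
    port = probe_sock.getsockname()[1]
    probe_sock.close()
    return port
```

Running `probe` in a loop against one of the back-end ports while reproducing the problem could show which of the two cases dominates at a given moment.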
Even our oldest server log files already show the error pattern, so it started before Friday 2018-10-27:
|
Is there any news on this? |
I am currently working on reproducing the issues locally. No success on this so far. |
If it helps, during the webmeeting I got several 502 and 503 errors. |
I have now prepared a modified version of our development server, which should help to pinpoint the issue. Note that I will try to make this instance fail often, so don't be confused. |
It looks like the backend is down. |
There are literally no applications running anymore. I can restart them, but it is strange that they do not even appear in the deployment manager anymore. This has never happened before. Any idea what happened? |
@stweil Maybe it is best to restart the whole server from scratch? |
I restarted mongodb now. It had failed on Saturday because of the disk full condition:
|
All apps should be online again. |
I just checked real quick and either the system is down again or still. I didn't get a chance to check yesterday so I can't say for sure. |
The disk is full again, so mongodb terminated... :-( |
I'm now moving the log file to another server and will restart mongodb then. |
After removing the three largest log files there are now 37 GB available again. Apache2 and mongodb are up. @anlausch, please restart your services and test. |
Yesterday it took less than an hour to fill the additionally freed storage (~1 GB) again (rough estimate based on the log file). At this rate the logging will easily produce huge files and break the system often. We should discuss what logging is actually needed and how to do it. |
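As a rough worked example of that estimate, taking the figures from this thread at face value (37 GB free after the cleanup, roughly 1 GB written per hour) and assuming the growth rate stays constant, which is an assumption and not a measurement:

```python
def hours_until_full(free_gb, growth_gb_per_hour):
    """Estimate how long the free space lasts at a constant growth rate."""
    if growth_gb_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return free_gb / growth_gb_per_hour


# 37 GB free, about 1 GB of new log data per hour.
remaining_hours = hours_until_full(37, 1)
```

Under these assumptions the disk would be full again after about 37 hours, i.e. in well under two days.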
I've now updated the logging mechanism as well as the log level. This should do the job. If not, we can make further adaptations related to writing to disk. Also, I am still working on the memory leak, which seems to be related to a module we are using for parsing MARC. But this needs to be investigated further. |
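The actual change to the back-end logging is not shown in this thread. As a generic illustration of the idea (a higher log level plus a hard cap on what logging may write to disk), here is a Python sketch using the standard library's `RotatingFileHandler`; the file name `backend.log`, the helper names and the size limits are assumptions for illustration only.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_capped_logger(log_dir, max_bytes=1_000_000, backups=3,
                       level=logging.WARNING):
    """Create a logger whose files can never exceed max_bytes * (backups + 1)."""
    logger = logging.getLogger(f"capped.{log_dir}")
    logger.setLevel(level)    # messages below this level are dropped entirely
    logger.propagate = False  # keep records out of the root logger
    handler = RotatingFileHandler(
        os.path.join(log_dir, "backend.log"),  # hypothetical file name
        maxBytes=max_bytes,
        backupCount=backups,
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def disk_footprint(log_dir):
    """Total size in bytes of all files in log_dir."""
    return sum(os.path.getsize(os.path.join(log_dir, name))
               for name in os.listdir(log_dir))


# Small limits so the rotation is visible in a quick experiment.
demo_dir = tempfile.mkdtemp()
demo_log = make_capped_logger(demo_dir, max_bytes=2000, backups=2)
for i in range(1000):
    demo_log.warning("request %d failed", i)  # written to disk
    demo_log.debug("noise %d", i)             # filtered out by the level
```

However many messages are logged, the directory holds at most three files of under 2000 bytes each, so logging alone can no longer fill the disk.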
@anlausch, your modifications seem to work: |
I just deployed an attempt to fix the leak to dev. Please test extensively. |
I'm not sure if it's the same: The 500 error while adapting bounding boxes persists: [scan-inspector][Error] Error while deleting Entry: Http failure response for https://locdb.bib.uni-mannheim.de/locdb-dev/bibliographicEntries/5bc842f2352db22178ae8366: 500 Internal Server Erro |
I hadn't seen this one before: |
It happened again right away; both times the resource type was Proceedings Article. Object { msg: "REFERENCE TARGET SELECTED", title: "An architecture for the aggregation and analysis of scholarly usage data", entry_id: "5bc7238f4fb7d00dfefd9fcf", current_selected_ids: (2) […] } |
The 500 error is not related to this issue. The 400 error just says that no target was given for the commit. We need to check what causes these issues, but they are at least not related to the stability problems we had. :) |
Then I suggest moving @kleinann's report to a new issue. |
Right now I have this error again: EDIT: okay, now it works again |
Here is the corresponding error message on the server:
|
I have the 502 error again. I'm trying to upload a PDF to production:
|
Proxy errors were sometimes connected with issues in the interface to the OCR engine. Did the API change, since you recently updated the OCR engine, @rtahseen? |
I tried a different PDF and, curiously enough, that works. Maybe it is the PDF's fault?! |
@lgalke Updating the OCR engine cannot cause this error. Earlier, such errors were caused by the synchronous calling mechanism, but they are long gone for the Automatic Reference Extractor since it moved to a fully asynchronous calling mechanism. Judging by the error message, it appears to come from this URL: https://locdb.bib.uni-mannheim.de/locdb/saveResource |
Error 503 right now: |
It's up again, @stweil rebooted the system. |
Our colleagues in Tübingen report that their system is down. They get the 503 error as well. Can you look into this? |
A reboot didn't help because the system always shuts down again. |
Tübingen system is up now |
All systems (prod, dev, demo, tue) are down again. EDIT: it works again |
Since Friday we frequently get the 502 proxy error or the 503 service unavailable error. Is this because you are working on it or is it something we should be concerned about?