Error needLargeMem: Out of memory - request size 65568 bytes, errno: 12 #351

Open
corneliusroemer opened this issue Sep 18, 2023 · 13 comments

Comments

@AngieHinrichs
Contributor

Thanks for reporting, I'll take a look. It might take a while to fix (especially on the main site since there is a one to four week release cycle delay from the test site). In the meantime, if an option could be added to CoV-Spectrum to randomly downsample, to send UShER no more than 500 sequences (or 400 to be safe) that would help avoid the problem while still giving a good lineage overview (@chaoran-chen ?).
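A minimal sketch of the random downsampling suggested above, assuming the client holds a plain list of sequence identifiers before building the UShER request. The function name `downsample` and its parameters are hypothetical and are not taken from CoV-Spectrum's code.

```python
import random


def downsample(sequence_ids, limit=400, seed=None):
    """Return at most `limit` IDs, chosen uniformly at random without replacement.

    `limit=400` mirrors the "400 to be safe" figure from the comment above.
    `seed` is optional and only makes the selection reproducible.
    """
    ids = list(sequence_ids)
    if len(ids) <= limit:
        # Nothing to trim: keep the original order untouched.
        return ids
    rng = random.Random(seed)
    return rng.sample(ids, limit)
```

Sampling without replacement keeps every sequence equally likely to be chosen, so the lineage overview stays representative while the request size is capped.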

@chaoran-chen

Sure, I reduced it to 400. Please let me know if I should increase it again.

@AngieHinrichs
Contributor

Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

@corneliusroemer
Author

> Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

This query from above still pulls in 1000 😜 I managed to get it to work occasionally with 1000, I think, but better not overload your server :)

@corneliusroemer
Author

I just reran with limit=400 and now I got the following error (I actually remember seeing that "Cannot allocate memory, can't fork" message before, both yesterday and today):
[image: screenshot of the error message]

@AngieHinrichs
Contributor

It's possible that our server is getting a little overloaded. I'll look into it.

@chaoran-chen

@corneliusroemer, but the cov-spectrum website now generates links with limit=400, right?

@corneliusroemer
Author

corneliusroemer commented Sep 18, 2023

Yes it does @chaoran-chen, but I still get the error needLargeMem: Out of memory - request size 65568 bytes, errno: 12

@AngieHinrichs
Contributor

About a week ago we had to impose some stricter limits on the total amount of memory used by all threads of the apache web server, because sometimes too many high-memory requests were hitting us at once and crashing the machine. That may be happening here. I just watched top while trying Cornelius's request and while the hgPhyloPlace process got up to ~15GB, there was a Genome Browser process that got as high as 32GB! I'm tracing through the logs to see if I can figure out what kind of usage makes a Genome Browser process so big (and relatively slow).

[Also I could be a lot smarter about how I'm handling metadata, for SARS-CoV-2 it's enormous and I really don't need to be reading it all in. I should just read in an index, maybe try sqlite?]
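The indexed-lookup idea in the bracketed note could look roughly like the sketch below, which uses Python's standard `sqlite3` module. The table layout (`name`, `lineage`, `date`) and the helper names are assumptions made for illustration; they do not describe hgPhyloPlace's actual metadata format or code.

```python
import sqlite3


def build_metadata_db(rows, path=":memory:"):
    """Load (name, lineage, date) rows once into a table keyed by sample name.

    The PRIMARY KEY gives SQLite an index on `name`, so later lookups do not
    scan the whole table. The column set is hypothetical.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(name TEXT PRIMARY KEY, lineage TEXT, date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, names):
    """Fetch metadata only for the requested sample names.

    Assumes a small batch (a few hundred names); SQLite caps the number of
    bound parameters per statement, so larger batches would need chunking.
    """
    names = list(names)
    if not names:
        return {}
    placeholders = ",".join("?" * len(names))
    cursor = conn.execute(
        f"SELECT name, lineage, date FROM metadata WHERE name IN ({placeholders})",
        names,
    )
    return {name: (lineage, date) for name, lineage, date in cursor}
```

With a file-backed database, each request would read only the rows it needs instead of holding the full metadata table in memory.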

@corneliusroemer
Author

The failures keep happening stochastically even with covSpectrum only exporting 400 sequences now.

Overall memory seems to be tight at times: when a failure happens, it hits a lot of requests at once (I sometimes send 4 in parallel).

@AngieHinrichs
Contributor

Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2? If you need to run lots of these, maybe I can set you up with equivalent matUtils extract commands that you can run locally on full tree files?
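On the client side, backing off to two requests at a time can be done with a bounded worker pool. This is only a sketch: `submit_one` is a hypothetical callable standing in for whatever sends a single query to the server and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(submit_one, queries, max_parallel=2):
    """Call submit_one(query) for every query, at most `max_parallel` at once.

    Results come back in the same order as `queries`. The default of 2
    follows the "maybe back off to 2?" suggestion above.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(submit_one, queries))
```

The pool size is the only knob: remaining queries wait in the executor's queue until one of the two workers is free, so an existing batch workflow does not need to change beyond this wrapper.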

@corneliusroemer
Author

> Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2?

@AngieHinrichs If load is an issue then I can absolutely change my usage, though it comes at the cost of having to modify my established workflow. It appears that I am indeed single-handedly crashing (or rather thrashing) Usher when firing off some 5 requests in short succession. I definitely don't want to DDOS Usher, so yes, I shouldn't do that anymore now that I'm aware.

If you're interested in looking to work around this here are some things that might be worth considering:

  • it might be the number of requests passed to Usher that causes thrashing rather than their size, whether large or small (just anecdotal evidence/gut feeling)
  • how have you implemented the new memory limit? Is this effectively rate limiting by rejecting requests when more than X jobs are running before you get into memory issues or do you rate limit only once things start getting slow?
  • Unless I'm the only one getting those OOM messages (logs will tell whether it's just my IP ;) ) it might be nice to wrap them so that users know that it's not a bug but that Usher is under heavy load ("Usher is currently very busy, you might want to look at matUtils extract if you want to run this yourself...")
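Two of the ideas in the list above, rejecting work once more than X jobs are running and wrapping raw out-of-memory errors in a friendlier message, can be sketched as follows. This is an illustration under stated assumptions, not a description of how the UCSC server enforces its limits: `AdmissionGate`, `BusyError`, and `friendly_message` are hypothetical names, and the matched strings are the two error texts quoted earlier in this thread.

```python
import threading


class BusyError(Exception):
    """Raised when the server already runs the maximum number of jobs."""


class AdmissionGate:
    """Reject new jobs immediately once `max_jobs` are already running."""

    def __init__(self, max_jobs):
        self._slots = threading.BoundedSemaphore(max_jobs)

    def run(self, job):
        # Non-blocking acquire: fail fast instead of queueing under load.
        if not self._slots.acquire(blocking=False):
            raise BusyError("Usher is currently very busy, please retry later.")
        try:
            return job()
        finally:
            self._slots.release()


# Error fragments taken from the messages reported in this thread.
OOM_MARKERS = ("needLargeMem: Out of memory", "Cannot allocate memory")


def friendly_message(raw_error):
    """Translate known out-of-memory errors into a load notice."""
    if any(marker in raw_error for marker in OOM_MARKERS):
        return (
            "Usher is currently very busy. Please retry later, or consider "
            "running matUtils extract locally."
        )
    return raw_error
```

Rejecting early keeps memory use bounded by the number of admitted jobs, and the wrapped message tells users the failure reflects load rather than a bug.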

I'm absolutely willing to figure out how to use Usher locally. I should have already done so long ago - the reason I haven't is that until now the web server was good enough. As you know, I've used Taxonium with the trees, but stopped using that when Taxonium's relative lack of features made it less effective in my view than using Nextstrain/Auspice trees via web usher (cc @theosanderson in case you'd want to have some power user feedback on things that Taxonium is missing to make it as good or better than auspice not only on very large trees but also on tree sizes that auspice can handle).

I would love to have a look at using matUtils extract with you to see whether it might be easy to get up a local equivalent of what the web server does - that could be very useful to others as well, maybe as a tutorial on how to get started with matUtils.

@theosanderson
Contributor

@corneliusroemer yes, if you could remind me what your highest-priority Taxonium feature request(s) would be, that would be helpful. Mutation text without hovering? (Feel free to open an issue in the Taxonium repo; one issue that lists everything you want to mention would be fine.)
