Regrid hangs for ocean variable sea water salinity for standard 2x2 grid #724
Comments
Could it be this? #430 (comment)
could be but why is it cropping up here? I need to investigate moar... 🍺
The problem occurs if one of the input files is opened by _recipe.py (e.g. to extract the vertical levels to interpolate to) or by _data_finder.py (to find the start and end year, in the case of CMIP3 data), and is then opened again later in another process by the preprocessor.
Maybe you could attach the recipe so other people can try to reproduce the issue? I cannot find |
yeah sorry @bouweandela, been busy looking at other stuff in the meantime, here it is, I'll start poking around now too, on Jasmin 🍺
I know what the problem is - ESMF is trying to construct the regridder but it's not getting enough memory allocated, so it stays there in limbo (probably waiting for memory to be freed on the node), if you use
|
and I don't blame it - this would mean 37G of mem - absolute mega, but it really should not realize the whole of the data in this manner; after all, the reason I'm asking for level selection here is to shrink the data
neither regridding nor vertical interpolation is lazy at the moment: #674
ya, so we're a bit in the bogs since this will happen all the time when running with these sorts of variables on nodes with poor memory. BTW I just ran the ESMF regridding no problemo (the one both myself and @omeuriot were unable to run) after I reduced the time to three years and selected two levels, so it's not an inherent issue with its functionality - it's just really bad at telling you it needs more memory 😁
here - at the point of realizing the lazy data:
it's trying to move about 25G of lazy data to real, and my node has only 14G of available mem, the unloader should kick me out right away. The only way we can run such variables on Jasmin is on sci3, which is dreadfully slow and hammered by everybody and their dog. If we don't select levels or regions in advance, these sorts of recipes are impossible to run - @ledm do you think we can shrink the data somehow? Or not regrid it?
OK I managed to run the 2x2 ESMF regridding without level selection (all 75 levels in), it took 14min to run a single year's worth of monthly data (12 time points, 330x360 grid) - 99.9% of it was waiting on available memory (ran it on sci5 where you usually get about 14G of avail mem) -> this is pretty lamers 😁
I had a much closer look at this and found out where the actual delay is coming from - the esmpy regridder assembles the regrid instances per level into a list - if you have a ton of those levels that thing takes forever, I parallelized the list in
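For a rough sense of scale, the footprint of a fully realized array can be estimated from its shape alone. The sketch below is only illustrative: `realized_size_gib` is a hypothetical helper, not part of any library, and the element size (4 bytes for float32, 8 for float64) is an assumption, since the thread does not state the dtype. It also only counts the input array itself, not any temporary copies a regridder may make.

```python
from math import prod


def realized_size_gib(shape, itemsize=4):
    """Return the in-memory size, in GiB, of a dense array with this shape.

    itemsize is the number of bytes per element (assumed: 4 for float32).
    """
    return prod(shape) * itemsize / 2**30


# One year of monthly data on all 75 levels of the 330x360 grid
# (time, level, y, x), as described in the comment above.
one_year = (12, 75, 330, 360)

print(f"float32: {realized_size_gib(one_year):.2f} GiB")
print(f"float64: {realized_size_gib(one_year, itemsize=8):.2f} GiB")
```

Multiplying the time dimension by the number of years in the recipe shows how quickly a multi-decade, all-levels request outgrows a node with about 14G available.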
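A minimal sketch of the idea of building the per-level list concurrently is shown here. It is not the change referred to in the comment: `build_level_regridder` is a hypothetical stand-in for whatever constructs one regridder per vertical level, and a thread pool is just one possible mechanism (the comment does not say which one was used). Whether threads actually help depends on whether the underlying construction releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def build_level_regridder(level_index):
    """Hypothetical stand-in for constructing the regridder of one level."""
    # A real implementation would compute the regridding weights
    # for this level here; this placeholder only records the index.
    return {"level": level_index}


def build_regridders(n_levels, max_workers=4):
    """Build one regridder per level concurrently, keeping level order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of its inputs,
        # so the list index still matches the level index.
        return list(pool.map(build_level_regridder, range(n_levels)))


regridders = build_regridders(75)
```

Keeping the results in level order matters because the list is later applied level by level to the data.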
moved to #775
if you don't KeyboardInterrupt it'll just hang forever - I recall @schlunma @mattiarighi and @jvegasbsc have had similar issues? This one's pretty bad since this is a standard ocean recipe
recipe_diagnostic_transect.yml