
Add support for blocked and parallel reprojection in reproject_interp #214

Merged

astrofrog merged 4 commits into astropy:main on Sep 6, 2022

Conversation

AlistairSymonds
Contributor

@AlistairSymonds AlistairSymonds commented Jan 16, 2020

Performs reprojection for any non-HEALPix method by breaking the output space into blocks and iterating over them, reprojecting into one block of the output at a time. This means a large output space can be used while only one block has to be held in memory at a time.

Additionally, these output blocks can be processed in parallel to speed up any of the reprojection functions.

As mentioned, there is still some debug code in there which needs to be cleaned up (a commented-out print statement, imports used for profiling), but the functionality is there.
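
To make the idea concrete, here is a minimal sketch of the blocked loop for a 2D output. It is illustrative only: reproject_func stands in for whichever core reprojection function is being wrapped (its signature here mirrors the internal _reproject_full call quoted below), and the real helper in this PR handles the WCS slicing and the footprint more carefully.

import numpy as np
from itertools import product

def reproject_blocked_sketch(reproject_func, array_in, wcs_in, wcs_out,
                             shape_out, block_size):
    """Reproject into the output array one block at a time (sketch only)."""
    # The real code can write into a pre-allocated array_out instead of
    # allocating the full output here.
    output = np.zeros(shape_out)
    footprint = np.zeros(shape_out)
    for imin, jmin in product(range(0, shape_out[0], block_size[0]),
                              range(0, shape_out[1], block_size[1])):
        imax = min(imin + block_size[0], shape_out[0])
        jmax = min(jmin + block_size[1], shape_out[1])
        # Slice the output WCS down to this block (numpy axis order);
        # getting this slicing right is one of the fiddly parts of the PR.
        wcs_block = wcs_out[imin:imax, jmin:jmax]
        block, block_fp = reproject_func(array_in, wcs_in, wcs_block,
                                         shape_out=(imax - imin, jmax - jmin))
        output[imin:imax, jmin:jmax] = block
        footprint[imin:imax, jmin:jmax] = block_fp
    return output, footprint

Because the blocks are independent, they can also be farmed out to a process pool, which is what the parallel option does.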

@astrofrog
Member

Thanks for working on this! I've been taking a look at it and thinking about what would make sense from the point of view of the user API. In particular, I think we could simplify things a little if we don't expect this function to be called by the user directly, but instead have the user set e.g. chunk_size=(100, 100) (and optionally parallel=...) when calling reproject_interp and reproject_exact. Then in the high_level.py files, once the input has all been parsed and sanitized, e.g. here:

return _reproject_full(array_in, wcs_in, wcs_out, shape_out=shape_out, order=order,

we would check if chunk_size is set and, if so, call your helper function to run the core function over the chunks. This would mean not needing to do any parsing/validation in your function, and also a simple API for users. Would you have time to explore this? If not, I'm happy to take over the PR and add commits to it, so just let me know what is best for you :)

AlistairSymonds added a commit to AlistairSymonds/reproject that referenced this pull request Jan 18, 2020
Scaffold seamless wrapper insertion in reproj_interpolate as per pull request astropy#214
@AlistairSymonds
Contributor Author

AlistairSymonds commented Jan 18, 2020

I'm happy to look at it. Just making sure I understand correctly: you mean doing something like this?

# if either of these is not the default, a blocked method must be used
if block_size is not None or parallel is not False:
    # if parallel is set but block_size isn't, just divide the output
    # dimensions by the number of processes to get a block size?
    print("Hi, I'm handling the blocked case!")
else:
    return _reproject_full(array_in, wcs_in, wcs_out, shape_out=shape_out, order=order,
                           array_out=output_array, return_footprint=return_footprint)

https://github.com/AlistairSymonds/reproject/blob/b7ab13c800b71526e3b38af8e118fcdf3edd6798/reproject/interpolation/high_level.py#L95-L101

@astrofrog
Member

Yes exactly!

@AlistairSymonds
Contributor Author

AlistairSymonds commented Jan 19, 2020

Okay, that let me clean up the blocked_reproject function, but there are still some issues where the interfaces of the core functions don't quite match up; for instance, both the adaptive and interp functions have the order argument but the exact function does not.

I've only done it for reproject_interp so far, and it could be done in the same way for adaptive and exact, but there would still be the issue of the functions not quite matching up. You're probably better equipped than I am to suggest a solution, but I'm happy to implement it.

@AlistairSymonds
Contributor Author

@astrofrog any thoughts? It's cleaned up the issue a bit, but not entirely (if the functions took **kwargs, maybe it could be done more neatly?)

@astrofrog
Member

@AlistairSymonds - I'll look at this shortly, I'm very sorry for the delay!

@codecov

codecov bot commented Jan 20, 2021

Codecov Report

Merging #214 (85c9ffb) into main (89178da) will increase coverage by 0.35%.
The diff coverage is 98.57%.

@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
+ Coverage   94.34%   94.69%   +0.35%     
==========================================
  Files          23       23              
  Lines         725      792      +67     
==========================================
+ Hits          684      750      +66     
- Misses         41       42       +1     
Impacted Files                             Coverage Δ
reproject/utils.py                         87.75% <98.30%> (+6.85%) ⬆️
reproject/interpolation/high_level.py      100.00% <100.00%> (ø)


@AlistairSymonds
Contributor Author

Hey @astrofrog, I've found some free time up my sleeve and a need to reproject a large number of images, so I've gone through and cleaned up some of the issues left dangling, and all the default tests are passing.

There are two things remaining, I think:

  • obviously some new tests are needed for the parallel functionality. I was just going to take one of the examples from, say, the tutorial and compare the output of the blocked/parallel path against the existing single-thread, whole-image implementation. Any comments you want to add on that plan would be appreciated.

  • the round-trip tests are skipped unless sunpy is installed, and the CI doesn't run them; are they expected to be run, or are they deprecated/purposefully not a priority?

I've got a few other ideas in the back of my mind that should really help speed up reprojecting large areas of the sky at high resolution, like culling empty tiles, but that's the kind of thing that can probably wait until after this PR :)

@AlistairSymonds
Contributor Author

The round-trip tests are skipped unless sunpy is installed, and the CI doesn't run them; are they expected to be run, or are they deprecated/purposefully not a priority?

Apologies, it seems I was misunderstanding the Azure Pipelines reports; they did indeed run and pass once I properly updated my local tree to merge in the latest from this repo.

I also only just found the astropy top-level contributing guide; it's highly likely it has plenty of suggested changes too. (Might be worth having a contributing file in this repo that just points there, for dummies like me who aren't perfectly across astropy?)

@pllim
Member

pllim commented Jan 12, 2022

The CI is a little busted right now. It is unfortunate that this PR has been open for so long. We apologize for any inconvenience caused. Hopefully @astrofrog will come back to reviewing this soon...

Contributor

@keflavich keflavich left a comment

I have two really minor comments but this looks great to me. I'm testing it out on some cubes now.

# if parallel is set but block size isn't, we'll choose
# block size so each thread gets one block each
if parallel is not False and block_size is None:
    block_size = [dim // os.cpu_count() for dim in shape_out]
Contributor

This code block doesn't match the comment above, right? If we have 4 CPUs, say, and a 2D image, block_size will be 1/16th the image size, so each CPU will get 4 blocks. This isn't necessarily a bad thing, but it's also not obviously the right default.

Contributor Author

Ah yes, you're dead right, and it's probably worth doing big long strips that match the FITS memory layout anyway.
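
For reference, the strip-based default could be as simple as the following sketch (a hypothetical helper, assuming a 2D shape_out; strips span the full second axis, so each block is a contiguous run of rows in the output array):

import os

def default_strip_block_size(shape_out):
    """One full-width strip of the output per CPU (sketch only)."""
    block_size = list(shape_out)
    # Divide only the first array axis so each block is contiguous in memory.
    block_size[0] = max(1, shape_out[0] // os.cpu_count())
    return block_size

# e.g. with 8 CPUs: default_strip_block_size((2048, 4096)) -> [256, 4096]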

proc_pool = None
blocks_futures = []

if parallel or type(parallel) is int:
Contributor

I think we want and here? Otherwise parallel=0 will try to work with 0 workers

Contributor Author

This might need tightening up. I was imagining parallel=True as meaning "let reproject auto-determine the threading", but as you say, zero would lead to issues.
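
One possible way to tighten this up (a sketch only, with a hypothetical helper name, not the exact code in the PR) is to normalize parallel into an explicit worker count up front, so that True means "use all CPUs", a positive integer means that many workers, and False/0 fall back to the serial path:

import os

def _validate_parallel(parallel):
    """Return the number of worker processes, or None for the serial path (sketch)."""
    if parallel is True:
        return os.cpu_count()
    if parallel is False or parallel is None or parallel == 0:
        return None
    if isinstance(parallel, int) and parallel > 0:
        return parallel
    raise ValueError(f"parallel should be a boolean or a positive integer, got {parallel!r}")

The blocked helper would then only create a process pool when this returns a worker count.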

@keflavich
Contributor

One more aside - it would be nice to add some sort of progressbar support, but that can be a different Issue.

@AlistairSymonds
Contributor Author

Also, just quickly while there are more people looking at this: I'm gonna go over @keflavich's comments and make some updates if no one says otherwise.

Generally, if anyone has feedback on more high-level Python/astropy ways of doing things, I'll take it. I'm someone who's written a fair bit of Python at various points, but my main wheelhouse is Verilog and HW design, so I've probably missed something obvious.

Member

@astrofrog astrofrog left a comment

@AlistairSymonds - this is looking good so far! Please do go ahead and rebase (to hopefully fix the CI) and implement @keflavich's comments.

The only comment I have for now is a general one, which is that I think it would be good to change the terminology from 'block' to 'chunk' to match dask terminology - as I think we might want to add support for dask anyway in reproject in future (for input/output arrays) and it would be confusing to have two different terms.

I have more time available to work now so I will be more responsive here now and would like to make sure we can include this in the next release! 😄

@astrofrog
Member

Actually thinking about this a bit more, maybe keep it as 'block' for now and I can try and add full dask support as a follow-up PR and I will see if it actually makes sense to use one term over the other? (Dask itself uses both terms)

@AlistairSymonds
Contributor Author

Yeah, I've got no strong attachment to any term; there are lots of options that seem equally overloaded - tiles, chunks, blocks, bins.

And frick, I've completely beans'd up the rebase. I still had a master branch lying about in my fork instead of main and it's all gone awry - gonna fix that up, then address the comments for realsies this time instead of just talking about it.

AlistairSymonds added a commit to AlistairSymonds/reproject that referenced this pull request Aug 25, 2022
Scaffold seamless wrapper insertion in reproj_interpolate as per pull request astropy#214
@AlistairSymonds
Contributor Author

AlistairSymonds commented Aug 25, 2022

Okay, as nice as it would be to have the original 3-year-old pull request be accepted, the git history is turning into a nightmare - when I have time on the weekend I'm going to make a new dev branch and then manually port things across. (Is it obvious I've only used Perforce professionally? :P)

quick edit: I think I've fixed most of the mess - the commit log mess should all still be able to be squashed away before merging too, right?

AlistairSymonds added a commit to AlistairSymonds/reproject that referenced this pull request Aug 26, 2022
Quick and dirty implementation of blocked reproject_interp - issue astropy#37

Revert to original reproject source

Blocked wrapper that works with non-Healpix reprojection functions

broken out reprojrection from block generation in prep for multiprocessing

Completed functionality of block reproject wrapper for non-healpix methods

Formatting changes to match original source

Added memory usage instrumentation

Fix memory leak from storing futures in multiprocessing option

Fixed process pool args parsing and switched to dicts

Removed test code from blocked helper func Scaffold seamless wrapper insertion in reproj_interpolate as per pull request astropy#214

Remove errorenously added testing script

Integrated blocked reprojection in interpolate function

Removed profiling imports from utils

Formatting fixes

Formatting fixes PEP8 tox edition

Fixes for the blocked to match non-blocked behaviour

Fixes for wcsapi input testcases

Fix WCS slicing axis incorrectly swapped

Add naive tests for blocked and parallel reproj

codestyle fixes for blocked test

Fix issues blocked reprojection with footprint

Style fixes for return fp blocked fixes

Update blocked corner case test to be more useful

Revert "Squashed commit of the following:"

This reverts commit f554ce9.

Revert "Revert "Squashed commit of the following:""

This reverts commit fa384e4.

Manually re-add blocked code to reproj interp

Manually fix up blocked tests

Fix blocked tests to use get_pkg_data function

Fix codestyle issues

Address core comments made by @keflavich
@AlistairSymonds
Contributor Author

AlistairSymonds commented Aug 26, 2022

@astrofrog I'd say it's 99% clean; the only issue left seems to come from the test data used when I run with the oldestdeps config. It seems to be something to do with (un)pickling the WCS: a FITSFixedWarning is raised on unpickling when returning from the multiprocessing call of _block(), which causes pytest to fall apart.

The only related things I've found otherwise are the following:

astropy/astropy#12844

which came from it being identified as a bug here: astropy/astropy#12834

So indeed the lines causing the warning have been fixed in later astropy versions, and when running the test code outside of pytest we just get warnings and the np assert doesn't fire, e.g.:

(astro) alistair@alistair-VirtualBox:~/reproject$ cd reproject/interpolation/tests/
(astro) alistair@alistair-VirtualBox:~/reproject/reproject/interpolation/tests$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_core
>>> test_core.test_blocked_against_single(True, [100,100])
WARNING: FITSFixedWarning: The WCS transformation has more axes (2) than the image it is associated with (0) [astropy.wcs.wcs]
WARNING: FITSFixedWarning: The WCS transformation has more axes (2) than the image it is associated with (0) [astropy.wcs.wcs]
WARNING: FITSFixedWarning: The WCS transformation has more axes (2) than the image it is associated with (0) [astropy.wcs.wcs]
WARNING: FITSFixedWarning: The WCS transformation has more axes (2) than the image it is associated with (0) [astropy.wcs.wcs]
>>> exit()

Is this something worth working around, or should we just leave it as is since a newer version of astropy has fixed it?

@pllim
Member

pllim commented Aug 26, 2022

Since it has been fixed in astropy already, you could use the warnings module to ignore this warning for the test here. Thanks!

# https://github.com/astropy/astropy/pull/12844
# All the warning code should be removed when old astropy versions are no longer being used
import warnings
warnings.simplefilter('ignore', category=FITSFixedWarning)
Member

Does this ignore it for the whole test session? Would a context manager be safer?

Contributor Author

@AlistairSymonds AlistairSymonds Aug 27, 2022

I did a quick test by just adding the following function to the bottom of the test file, and it still failed correctly - so no, it doesn't seem to affect the whole pytest session.

def test_fitswarning():
    raise FITSFixedWarning

I assume each test gets its own new invocation of the Python interpreter so it works, but I couldn't find any concrete reference to that implementation detail or whether it should be relied upon.

So I do agree it would be better if a with warnings.catch_warnings() manager was used, since then it could be used to just wrap the function call with the new changes - I've done this locally and am just running some checks.
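
For what it's worth, the context-manager form could look roughly like the sketch below. It assumes the block_size/parallel keywords added in this PR, plus hypothetical data/wcs_in/wcs_out/shape_out fixtures and a precomputed single-threaded result_single to compare against.

import warnings

import numpy as np
from astropy.wcs import FITSFixedWarning
from reproject import reproject_interp

# Old astropy versions emit FITSFixedWarning when the WCS is unpickled in the
# worker processes (fixed upstream in astropy/astropy#12844), so silence it
# only around the parallel call rather than for the whole test session.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FITSFixedWarning)
    result_blocked, footprint_blocked = reproject_interp(
        (data, wcs_in), wcs_out, shape_out=shape_out,
        block_size=[100, 100], parallel=True)

np.testing.assert_allclose(result_blocked, result_single, equal_nan=True)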

@AlistairSymonds
Contributor Author

I think that's everything now? Once this is in I'm also happy to take a look at doing a similar thing for the adaptive and exact methods too - just tackling one problem at a time :D

Member

@pllim pllim left a comment

I don't think the \ continuation character is recommended.

If not None, a blocked projection will be performed, where the output space is reprojected one block at a time; this is useful for memory-limited scenarios such as dealing with very large arrays or high-resolution output spaces.
parallel : bool or int
Member

This input needs some explanation. Only via code diving do I know what setting it to an int really means.

Contributor Author

I'll go over this another time and re-comment, etc. - I was trying to emulate the parallel argument already used in the reproject_exact() function, but it's definitely been a while since I originally wrote this and looked closely at it.
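
For what it's worth, a more explicit version of the two entries might read something like this sketch (my wording, not necessarily the text that ended up in the PR):

block_size : tuple or None
    Size of blocks, in pixels of the *output* array, over which the
    reprojection is carried out. If not `None`, the output is reprojected
    one block at a time, which keeps memory usage bounded for very large
    output arrays. Defaults to `None` (reproject the whole output at once).
parallel : bool or int
    If `True`, reproject the output blocks in parallel using one process
    per CPU as reported by ``os.cpu_count()``; if a positive integer, use
    that many processes; if `False` (the default), process the blocks
    serially. Setting ``parallel`` without ``block_size`` picks a
    strip-shaped block size automatically.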

if parallel is not False and block_size is None:
    block_size = shape_out.copy()
    # each thread gets an equal sized strip of output area to process
    block_size[0] = shape_out[0] // os.cpu_count()
Member

Shouldn't block size be calculated by the number of actual requested cores?

And parallel=1 should equal parallel=False? I don't see that being handled here.

Contributor Author

@AlistairSymonds AlistairSymonds Aug 30, 2022

Block size is controllable separately to allow for memory-usage tuning; e.g. if you're reprojecting a field tens of degrees wide to a 0.5"/px scale, that just blows up memory usage. This was my original use case for doing this blocked functionality - everything possible to stop loading the entire input and output arrays into memory at once.

Additionally, there's also some existing code in the mosaicking reproject_and_coadd() function that does some culling at full-input-field granularity - I'm not sure if it's actually robust with all distortions/projections, but ideally a similar sort of empty-block culling could be used here in future too, allowing for more tuning.

One thing I've noticed I missed stating is that the block size is in terms of pixels in the output space; I will add that.

Happy to change the parallel=1 case too.
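
On the "actual requested cores" point, one possible adjustment (a sketch with a hypothetical helper name, not necessarily what the PR does) would be to size the strips from the requested worker count instead of always using os.cpu_count():

import os

def strip_block_size(shape_out, parallel):
    """Strip-shaped default block size, sized by the requested workers (sketch)."""
    # parallel=True means "use all CPUs"; a positive integer means that many workers.
    n_workers = os.cpu_count() if parallel is True else int(parallel)
    block_size = list(shape_out)
    block_size[0] = max(1, shape_out[0] // n_workers)
    return block_size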

Member

Part of me thinks parallel=1 should actually use the parallel infrastructure with one worker rather than being the same as False - it could be useful for debugging?

Contributor Author

That was my original thought too - or for when someone wants the maths to happen in another process so they can do GIL shenanigans in the main thread, or something? Not super attached either way, and it's not a big change to make; I've got tomorrow off work so I can address it along with the other stuff Lim mentioned if needed.

Member

Ok - I would suggest leaving the current behavior as-is.

@astrofrog
Member

@AlistairSymonds - if you can update the docstrings as @pllim mentioned and avoid the use of the continuation character, I'll go ahead and merge this. Thanks!

@astrofrog astrofrog changed the title Dev reproject block Add support for blocked and parallel reprojection in reproject_interp Sep 6, 2022
@astrofrog
Member

I've rebased this and will merge if the CI passes - thanks!

@astrofrog astrofrog merged commit 3f29f13 into astropy:main Sep 6, 2022
@astrofrog
Member

@AlistairSymonds - thanks for your work on this and sorry it took so long to merge!

@AlistairSymonds
Contributor Author

Cheers! Glad it happened in the end - I'll take a look at doing the same for the adaptive and exact methods now that a base has been put in.
