-
-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure reprojection routines work with large images #37
Comments
Processing arrays in chunks is a common issue ... I wonder if view-as-blocks would work efficiently here or if not we could figure out a utility that can be reused elsewhere. |
@cdeil - that looks like it could indeed help! |
I think we should figure out a solution for this. One easy way would be to implement this in the high level interpolation reprojection routine and produce different WCS objects (with updated CRPIX values) and separate shape_out objects for each chunk, and then call |
Here's a very basic form of chunking: import numpy as np
from astropy.extern import six
from reproject.utils import parse_input_data, parse_output_projection
from reproject.interpolation.core_celestial import _reproject_celestial
from reproject.interpolation.high_level import ORDER
def reproject_interp_chunk_2d(input_data, output_projection, shape_out=None, hdu_in=0,
order='bilinear', blocks=(1000, 1000)):
"""
For a 2D image, reproject in chunks
"""
array_in, wcs_in = parse_input_data(input_data, hdu_in=hdu_in)
wcs_out, shape_out = parse_output_projection(output_projection, shape_out=shape_out)
if isinstance(order, six.string_types):
order = ORDER[order]
# Create output arrays
array = np.zeros(shape_out, dtype=float)
footprint = np.zeros(shape_out, dtype=float)
for imin in range(0, array.shape[0], blocks[0]):
imax = min(imin + blocks[0], array.shape[0])
for jmin in range(0, array.shape[1], blocks[1]):
jmax = min(jmin + blocks[1], array.shape[1])
shape_out_sub = (imax - imin, jmax - jmin)
wcs_out_sub = wcs_out.deepcopy()
wcs_out_sub.wcs.crpix[0] -= jmin
wcs_out_sub.wcs.crpix[1] -= imin
array_sub, footprint_sub = _reproject_celestial(array_in, wcs_in, wcs_out_sub,
shape_out=shape_out_sub, order=order)
array[imin:imax, jmin:jmax] = array_sub
footprint[imin:imax, jmin:jmax] = footprint_sub
return array, footprint
if __name__ == '__main__':
from astropy.io import fits
hdu_in = fits.open('/Users/tom/Data/Images/2MASS_j.fits')[0]
hdu_out = fits.open('/Users/tom/Data/Images/MSX_E.fits')[0]
array, footprint = reproject_interp_chunk_2d(hdu_in, hdu_out.header, blocks=(1000, 1000))
fits.writeto('array.fits', array, overwrite=True)
fits.writeto('footprint.fits', footprint, overwrite=True) @mhardcastle - could you try this out to see if it helps alleviate your issues with the very large image? If this works, we can then implement this as an option in the proper |
Sorry for the delay in checking this. I can confirm that it does work and essentially solves the problem with the single large image. Parallelizing this is very easy with multiprocessing.Pool:
|
@mhardcastle - thanks for confirming this! I'll try and get a PR open with something like this, and maybe automatically chunking things above a certain size |
Great! Just to note that while the multiprocessing code I posted works, it doesn't give a significant speedup -- I guess because of some locking issue. I've seen this sort of thing before. Will see if I can find a workround. But for the moment, the specific problem I was dealing with is unblocked again. |
Just came across this issue when trying to put together a silly large widefield mosaic which even memory mapped arrays failed at halfway through the pix2pix roundtrip. I basically just shimmed a slightly modified I think I would move all the memory mapped arrat and footprint generation code out of (@astrofrog I hope you don't mind this ping, I've noticed this issue is a little old and don't want it to slip by) quick edit: would it be better to create a generic chunking function can wrap all the current reprojection methods? |
Thanks for the ping!
Yes I think that might be the best approach - would you have any time to investigate this? |
Definitely going to give it a shot, I'm just trying to think about how to handle memory mapped arrays since this way memory mapped and multiprocessing support could be added to all the functions.
where kwargs are just passed to the reproject_func given, footprint_out and array_out are used for passing memory mapped arrays in, default optimal block_size would be determined by running a few benchmarks probably and parallel for number of processes in ProcessPoolExecutor or similar. Another edit: its pretty easy to wrap the non-healpix functions in that, but the healpix may need a bit more work to get it done cleanly |
I've done a version for the non-healpix functions in the utils package, first off I'm not sure if this the correct place it makes sense to me It takes the selected function as an argument then unpacks the other arguments as needed, critiques are very welcome as most of python code has been 'research quality' not 'library quality'. Additionally I haven't made an formal tests (partly because the utils package doesn't seem to have an area for them), only quick and dirty stuff with the file I accidentally committed to my branch and the larger mosaicking project I linked earlier. Multiprocessing is also something I've only implemented and tested with regards to correctness, I can confirm it works with memmap numpy arrays and produces the right output but aren't sure the impacts of pickling and unpickling the or any contention issues (I only have a quad core to test with too). |
@astrofrog I'll give you another ping because this is pretty much done functionality wise, I didn't run out of memory when dealing with 100k * 200k output projections and the multiprocessing code scales roughly in line with number of processors. Another idea I've had is rasterising to detect empty blocks, so we don't waste time on reprojecting to those areas, this is especially apparent for me since I'm trying to reproject many small images onto one large output plane, so a lot of time is wasted on empty blocks and pixels in the output.
Do you see any blatant issues with that? I'm not super versed on astropy Dealing with the slightly args used by each of the different reproject functions is the tough part imo, I've got them being passed in as kwargs but then the functions themself don't accept **kwargs, so a a big manual unpacking statement might be unavoidable. Projecting from healpix worked, but I can't think of a good way to project to healpix without the special case unpack depending on which function was was passed in, kind of ruining some of the genericness in the wrapper. Then in terms of tests/docs, should it be in utils still or be move to its own file? (possibly under a reproject_blocked folder) benchmark script: https://github.com/AlistairSymonds/astro-project-patchwork/blob/master/reproject_benchmarking.py Once again, any criticism is more than welcome :) edit: I also just noticed the current code has some dodgy print debug statements still in it, so if you're looking at something in confusion that's probably why. |
@AlistairSymonds - thanks for working on this! I'll try and take a look very shortly. |
@AlistairSymonds - could you open a pull request from your branch to this repository as it will make it easier to review the code and leave comments? Thanks! |
Quick and dirty implementation of blocked reproject_interp - issue astropy#37 Revert to original reproject source Blocked wrapper that works with non-Healpix reprojection functions broken out reprojrection from block generation in prep for multiprocessing Completed functionality of block reproject wrapper for non-healpix methods Formatting changes to match original source Added memory usage instrumentation Fix memory leak from storing futures in multiprocessing option Fixed process pool args parsing and switched to dicts Removed test code from blocked helper func Scaffold seamless wrapper insertion in reproj_interpolate as per pull request astropy#214 Remove errorenously added testing script Integrated blocked reprojection in interpolate function Removed profiling imports from utils Formatting fixes Formatting fixes PEP8 tox edition Fixes for the blocked to match non-blocked behaviour Fixes for wcsapi input testcases Fix WCS slicing axis incorrectly swapped Add naive tests for blocked and parallel reproj codestyle fixes for blocked test Fix issues blocked reprojection with footprint Style fixes for return fp blocked fixes Update blocked corner case test to be more useful Revert "Squashed commit of the following:" This reverts commit f554ce9. Revert "Revert "Squashed commit of the following:"" This reverts commit fa384e4. Manually re-add blocked code to reproj interp Manually fix up blocked tests Fix blocked tests to use get_pkg_data function Fix codestyle issues Address core comments made by @keflavich
Quick and dirty implementation of blocked reproject_interp - issue astropy#37 Revert to original reproject source Blocked wrapper that works with non-Healpix reprojection functions broken out reprojrection from block generation in prep for multiprocessing Completed functionality of block reproject wrapper for non-healpix methods Formatting changes to match original source Added memory usage instrumentation Fix memory leak from storing futures in multiprocessing option Fixed process pool args parsing and switched to dicts Removed test code from blocked helper func Scaffold seamless wrapper insertion in reproj_interpolate as per pull request astropy#214 Remove errorenously added testing script Integrated blocked reprojection in interpolate function Removed profiling imports from utils Formatting fixes Formatting fixes PEP8 tox edition Fixes for the blocked to match non-blocked behaviour Fixes for wcsapi input testcases Fix WCS slicing axis incorrectly swapped Add naive tests for blocked and parallel reproj codestyle fixes for blocked test Fix issues blocked reprojection with footprint Style fixes for return fp blocked fixes Update blocked corner case test to be more useful Revert "Squashed commit of the following:" This reverts commit f554ce9. Revert "Revert "Squashed commit of the following:"" This reverts commit fa384e4. Manually re-add blocked code to reproj interp Manually fix up blocked tests Fix blocked tests to use get_pkg_data function Fix codestyle issues Address core comments made by @keflavich
When reprojecting large images, we may run into memory issues for storing the corners or centers of all pixels, so we should have a way to project chunk by chunk (for all reprojection methods)
The text was updated successfully, but these errors were encountered: