
Use windowed read/write in median_smoothing #1674

Conversation

Adrien-LUDWIG
Contributor

See the issue description in this forum comment:
https://community.opendronemap.org/t/post-processing-after-odm/16314/16?u=adrien-anton-ludwig

TL;DR:
Median smoothing already iterated over the array with windows, but it read the whole raster into RAM first. Now windowing is used to its full potential to read and write in chunks.
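
For context, a minimal sketch of that idea, assuming rasterio plus a scipy median filter (the window size, function name, and single-band handling here are illustrative, not the PR's actual code):

```python
import rasterio
from rasterio.windows import Window
from scipy import ndimage

def smooth_by_windows(input_path, output_path, radius=2, win_size=512):
    # Hypothetical sketch: read, filter and write one window at a time
    # instead of loading the whole raster into RAM.
    with rasterio.open(input_path) as src:
        profile = src.profile
        with rasterio.open(output_path, "w", **profile) as dst:
            for row in range(0, src.height, win_size):
                for col in range(0, src.width, win_size):
                    win = Window(col, row,
                                 min(win_size, src.width - col),
                                 min(win_size, src.height - row))
                    block = src.read(1, window=win)
                    smoothed = ndimage.median_filter(block, size=2 * radius + 1)
                    dst.write(smoothed, 1, window=win)
```

A real implementation also needs to pad each window by half the kernel size so border pixels see their full neighborhood, which is where the read overlap discussed further down comes from.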

Tests

I tested it on a very small dataset of 18 images on an external HDD.
Here are the results:

>>> median_smoothing(input, output, smoothing_iterations=1000, num_workers=16)
# Before
 [INFO] Completed smoothing to create /datasets/windowed_io/odm_meshing/tmp/mesh_dsm.tif in 0:00:30.055436
# After
 [INFO] Completed smoothing to create /datasets/windowed_io/odm_meshing/tmp/mesh_dsm.tif in 0:00:28.892439

I also used cksum to compare the resulting files, which were exactly the same.


Do not hesitate to share if you have ideas for further testing. 😉

Uh oh. Sorry for this.
@Adrien-LUDWIG
Contributor Author

Sorry for this HUGE mistake. 😅
Tip to my future self: re-reading with a fresh head works way better.

@Adrien-LUDWIG Adrien-LUDWIG marked this pull request as draft July 13, 2023 10:49
@Adrien-LUDWIG
Contributor Author

Okay, after further testing this morning I came upon errors when reading a big file:

ERROR 1: ZIPDecode:Decoding error at scanline 256, unknown compression method
ERROR 1: TIFFReadEncodedTile() failed.
ERROR 1: /datasets/whole_hdp_features/odm_meshing/tmp/tiles.tif, band 1: IReadBlock failed at X offset 1, Y offset 9: TIFFReadEncodedTile() failed.
ERROR 1: ZIPDecode:Decoding error at scanline 256, incorrect data check
ERROR 1: TIFFReadEncodedTile() failed.
ERROR 1: /datasets/whole_hdp_features/odm_meshing/tmp/tiles.tif, band 1: IReadBlock failed at X offset 0, Y offset 11: TIFFReadEncodedTile() failed.

The small dataset I used for testing yesterday calls window_filter_2d only once, so I suppose it reads the whole file at once.

I need to explore this further. If you have any idea, I'm all ears.
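
For illustration of why a small dataset would never trigger the problem: if the raster fits in a single window, the loop degenerates to one read and one write, so concurrency issues never surface. A hypothetical helper (the 512-pixel window size is an assumption, not the real value):

```python
import math

def window_count(width, height, win_size=512):
    # How many read/write windows a raster of this size would be split into.
    return math.ceil(width / win_size) * math.ceil(height / win_size)

print(window_count(500, 400))      # 1   -> a tiny raster, processed in one go
print(window_count(10000, 12000))  # 480 -> many windows hit by parallel workers
```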

@pierotofy
Member

pierotofy commented Jul 13, 2023

Since you're doing writes from parallel threads, perhaps there needs to be a mutex. I'm not sure if that would affect performance.

This would be a cool improvement.
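
A lock-guarded pass could look roughly like this; it's only a sketch that assumes a shared rasterio dataset handle per file, and the helper names are made up:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from scipy import ndimage

io_lock = threading.Lock()

def process_window(src, dst, win, radius):
    # Serialize only the dataset accesses; the filtering itself still
    # runs in parallel across workers.
    with io_lock:
        block = src.read(1, window=win)
    smoothed = ndimage.median_filter(block, size=2 * radius + 1)
    with io_lock:
        dst.write(smoothed, 1, window=win)

def run_parallel(src, dst, windows, radius, num_workers=16):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for win in windows:
            pool.submit(process_window, src, dst, win, radius)
```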

@Adrien-LUDWIG
Contributor Author

Indeed, it works using a single worker. I will look into this.

Thanks!
I was able to read a few hundred windows from this file yesterday and was sure it was with many workers... I thought I was starting to lose my mind. Now that you point it out, I can see where I got mixed up.

@Adrien-LUDWIG
Contributor Author

Adrien-LUDWIG commented Jul 13, 2023

The last commit fixes race conditions during reads/writes by using locks.

I could not test performance for now, as I don't have any big project with the temporary meshing files. I guess I could test it on something else, since the function is not too specific.
Edit: I tried to use an orthophoto, but it has no nodatavals, which causes an error.

My main concern about performance is that each read overlaps with the previous and next one by a few pixels (half the kernel size).
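
For reference, that overlap comes from padding each read window by half the kernel size so the filter has valid neighbors at tile borders; a hypothetical helper, not the PR's code:

```python
from rasterio.windows import Window

def padded_window(col, row, win_size, pad, width, height):
    # Expand a tile by `pad` pixels (half the kernel size) on every side,
    # clamped to the raster bounds, so border pixels see their full
    # neighborhood. Adjacent reads therefore overlap by 2 * pad pixels.
    col0, row0 = max(col - pad, 0), max(row - pad, 0)
    col1 = min(col + win_size + pad, width)
    row1 = min(row + win_size + pad, height)
    return Window(col0, row0, col1 - col0, row1 - row0)
```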

@originlake
Contributor

originlake commented Jul 14, 2023

My main concern about performance is that each read overlaps with the previous and next one by a few pixels (half the kernel size).

The previous way did not alter the original pixel values until the whole process was done. Now that each thread reads/writes directly, you shouldn't use the input file to save the results during processing (later processing might read altered values and hence give incorrect results); instead, write to a new file and rename it after all processes are done. Parallel reading in read-only mode shouldn't be an issue, but writing should be sequential as the windows overlap.
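
The suggested pattern would look roughly like this (a sketch only; the paths, the use of block_windows, and the skipped filtering step are assumptions):

```python
import os
import rasterio

def smooth_pass(input_path, output_path):
    # Read from the untouched input and write to a separate output, so no
    # window ever reads pixels already modified during this pass.
    tmp_path = output_path + ".tmp"
    with rasterio.open(input_path) as src, \
         rasterio.open(tmp_path, "w", **src.profile) as dst:
        for _, win in src.block_windows(1):
            block = src.read(1, window=win)
            dst.write(block, 1, window=win)  # median filtering would go here
    # Only replace the destination once the whole pass has finished.
    os.replace(tmp_path, output_path)
```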

@Adrien-LUDWIG
Contributor Author

Oh, you're totally right! I remember thinking about the best way to manage the temporary files from one iteration to another... and the idea of doing it in place came to my mind. I felt really smart at the time, but now that you point it out, it's obviously not a good idea. 😅

Should I keep the input file intact until the output is created? I feel like this is the better way to go, but it should be consistent with the rest of the pipeline, so I would like your opinion.

This implies having 2 temporary files when doing more than one iteration, thus taking up more disk space. But I suppose it's fine, since the extra space is equivalent to the input size, which is far less than the individual tiles or the tiles.tmp.tif file. In my case, for example:

  • individual tiles total: 1TB (500 tiles of 2 GB)
  • tiles.tmp.tif: 352 GB
  • tiles.tif (the input here): 7.2 GB

If we go this way, should the 2 temporary files be renamed between the iterations to always have a clear name indicating which corresponds to the latest pass?

Let me know what you think and have a nice weekend! 😉

@originlake
Contributor

I think keeping the input file intact is better; it also helps when the process fails in the middle, guaranteeing the source data is only changed when the process finishes successfully. How to name the temporary files doesn't matter to me; they are just temporary files to store the data, similar to how we store the data in memory. Just make sure to clean them up at the end.
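
A ping-pong scheme along those lines could look like this (illustrative only; the file names are made up, and smooth_pass refers to the earlier sketch):

```python
import os
import shutil

def run_iterations(input_path, output_path, iterations):
    # Two scratch files alternate as source/destination so the original
    # input is never modified; both are cleaned up once the result is in place.
    tmp_a, tmp_b = output_path + ".a.tif", output_path + ".b.tif"
    shutil.copy(input_path, tmp_a)
    src, dst = tmp_a, tmp_b
    for _ in range(iterations):
        smooth_pass(src, dst)      # one windowed pass, as sketched earlier
        src, dst = dst, src
    shutil.move(src, output_path)  # after the swap, `src` holds the latest pass
    if os.path.exists(dst):
        os.remove(dst)
```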

When using rasterio "r+" open mode, the file is well updated while opened, but completely wrong once saved.
@Adrien-LUDWIG
Contributor Author

The last 2 commits avoid reading altered data.

The first commit was my first attempt, but it doesn't work. I pushed it to get feedback, if possible. I don't understand why, but when using rasterio's "r+" mode to open the file, its data is completely wrong once saved. It is even bigger than the original file.
I can confirm that the file is correctly updated while open, because the second temporary file is valid after many iterations of reading from and writing to it.
What am I missing? Is this not what "r+" is for?

The second commit fixes it, but I find the workaround quite ugly. I am open to suggestions.
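
For reference, this is roughly what an in-place update with rasterio's "r+" mode looks like (a minimal hypothetical example, not the actual commits; the file name is made up):

```python
import numpy as np
import rasterio
from rasterio.windows import Window

# Open an existing GeoTIFF for update and overwrite a single window in place.
with rasterio.open("mesh_dsm.tif", "r+") as ds:
    win = Window(0, 0, 256, 256)
    block = ds.read(1, window=win)
    ds.write(np.full_like(block, block.mean()), 1, window=win)
```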

@Adrien-LUDWIG
Contributor Author

Hello again! 👋

I finally had the time to test the performance impact of this PR.

So, I used a dataset of 120 images with a resolution of 5472x3648.
I kept the default parameters.

I stopped the process at the meshing step and timed the execution of median_smoothing.
Here are the results for different numbers of iterations:

Iterations | Without windowing | With windowing
1          | 10.107            | 6.558
10         | 58.68             | 56.48
100        | 537.55            | 538.51

Results are averaged over 10 runs for 1 iteration and 5 runs for 10 iterations.

For information, here are the sizes of the input and outputs:

File             | Size (MB)
tiles.tif        | 74.25
mesh_dsm_1.tif   | 61.76
mesh_dsm_10.tif  | 45.65
mesh_dsm_100.tif | 30.75

and the resolution of tiles.tif: 6031x6088.

Once again, I used cksum to ensure the outputs were the same for both implementations.


Apart from the inelegance of the workaround described in my last comment, I think the PR is ready to be merged.

Do not hesitate to tell me if you have concerns. 😉

@Adrien-LUDWIG Adrien-LUDWIG marked this pull request as ready for review August 11, 2023 16:56
@pierotofy
Member

pierotofy commented Aug 11, 2023

Thanks @Adrien-LUDWIG 👍 I've run some tests on a smaller dataset; runtime is similar, as you found, except that in my tests the original code runs about 10-15% faster (which is what I would expect). This can be merged, but I'm trying to understand the use case and tradeoff: was the goal to reduce memory usage? Did ODM crash (run out of memory) at the median smoothing step during processing?

@Adrien-LUDWIG
Contributor Author

Yes, exactly: ODM crashed at the median smoothing step because it tried to allocate too much RAM (352 GB for my dataset), whereas the rest of the processing didn't require more than 120 GB of RAM (well, 16 GB of RAM and ~100 GB of swap).

The previous computations on tiles.tif, using GDAL, had no problem handling it. It crashed only when trying to open it with rasterio.

@pierotofy pierotofy merged commit e9e1805 into OpenDroneMap:master Aug 12, 2023
@Adrien-LUDWIG Adrien-LUDWIG deleted the median_smoothing_memory_optimization branch August 13, 2023 21:08