
Provide support for memory mapped operation #52

Merged: 7 commits merged into ornlneutronimaging:next on Feb 4, 2024

Conversation

bvacaliuc

Description of Pull Request

This PR implements the code necessary to operate over memory regions and to facilitate streaming operations.

Purpose/Summary of work

As per #51, adjustments to the findTPX3H() API are made to implement the 3 requirements. Test cases are added to verify that use of mmap() and memory spaces are equivalent to the std::vector<char *> of the current implementation. A new CLI program benchmark_mmap.cpp is provided to process both single-thread and tbb use cases in a looping mode that is conducive for streaming processing.
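As background for the equivalence the test cases verify: a memory-mapped file can be handed to a parser as the same (pointer, length) shape of input as a buffer read into a vector. The sketch below is illustrative only, assuming POSIX mmap(); the struct and helper names are hypothetical, not the PR's actual API.

```cpp
// Hypothetical sketch: map a file read-only and expose it as a
// (pointer, length) pair, the same shape of input a parser such as
// findTPX3H() could consume alongside an in-memory buffer.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>

struct MappedFile {
  char *data = nullptr;
  std::size_t size = 0;
};

MappedFile map_file(const char *path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) throw std::runtime_error("open failed");
  struct stat st {};
  if (fstat(fd, &st) < 0) {
    close(fd);
    throw std::runtime_error("fstat failed");
  }
  void *p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                 MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
  return {static_cast<char *>(p), static_cast<std::size_t>(st.st_size)};
}
```

Because the kernel pages data in on demand, this avoids copying the whole .tpx3 file into process memory up front, which is what makes the streaming-style iteration practical for large files.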

Additional detail of work

Per discussions with @KedoKudo, I'm providing this PR with benchmark_mmap.cpp as the reference implementation. The canonical CLI program sophiread.cpp could be extended with memory-mapped support, but I would appreciate this PR being considered using the reference above. I apologize that benchmark_mmap.cpp duplicates some functions from the common library; I am willing to offer additional PRs to resolve this, if needed.

The environment variable MAX_BATCH_LEN controls how much intermediate data structure is allocated during processing. The test programs are sensitive to this value because the number of events produced can vary: partial events are not yet handled gracefully as batches are iterated, which is a known limitation at this time.

The test programs have been calibrated with MAX_BATCH_LEN=100000, since this value was known to prevent batching when processing the data set data/suann_socket_background_serval32.tpx3. However, the data set data/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3, which is used in the performance benchmark, does roll over at this setting (it appears that a length of 1000000 may be needed to prevent roll-over there).

This implementation parses MAX_BATCH_LEN from the environment in sophiread/FastSophiread/src/tpx3_fast.cpp. It is not critical that this restriction be imposed, but without it, extremely large .tpx3 files cannot be processed successfully.
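The shape of this environment lookup can be sketched as below. This is a minimal illustration, not the PR's exact code: the helper name and the fallback value of 100000 are assumptions chosen to match the calibration described above.

```cpp
// Hypothetical sketch: read a batch-size limit from the MAX_BATCH_LEN
// environment variable, falling back to a default when the variable is
// unset or malformed.
#include <cstdlib>
#include <cstddef>
#include <string>

std::size_t max_batch_len(std::size_t fallback = 100000) {
  const char *env = std::getenv("MAX_BATCH_LEN");
  if (env == nullptr) return fallback;
  try {
    return static_cast<std::size_t>(std::stoull(env));
  } catch (...) {
    return fallback;  // ignore values that do not parse as an integer
  }
}
```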

Testing instructions

Test using the standard build instructions and operations. To observe the use of mmap(), run FastSophireadBenchmarksCLI.app and include mmap in the options string. For example:

./FastSophireadBenchmarksCLI.app data/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3 /tmp/foo.h5 tgdc+mmap

A suite of test cases and benchmarks comparing single/tbb/streaming/mmap operation can be executed via FastSophiread_benchmark.sh, as demonstrated by:

conda activate sophiread
mkdir sophiread/build && cd sophiread/build
cmake ..
make -j4
make test
./FastSophiread_benchmark.sh

mcpevent2hist-pr-mmap-test-output.txt

@bvacaliuc bvacaliuc marked this pull request as ready for review January 30, 2024 22:29
@KedoKudo KedoKudo self-requested a review January 31, 2024 15:28
@KedoKudo KedoKudo self-assigned this Jan 31, 2024
@KedoKudo KedoKudo added the enhancement New feature or request label Jan 31, 2024

@KedoKudo KedoKudo left a comment


Tested locally on m2 mac and everything passes (see terminal output below)

@bvacaliuc could you check if it is necessary to switch from std::vector to tbb::concurrent_vector when collecting output from tbb::thread?
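For context on this question: concurrent push_back into a plain std::vector from multiple threads is a data race, so the output collection must use either a lock or a concurrent container such as tbb::concurrent_vector (which supports lock-free growth). A standard-library-only sketch of the mutex-protected variant is below; the function and its parameters are hypothetical, purely to illustrate the trade-off, and are not the PR's code.

```cpp
// Hypothetical sketch: several worker threads append results to a shared
// std::vector under a mutex. tbb::concurrent_vector would permit the same
// pattern without the explicit lock.
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> collect(int n_threads, int per_thread) {
  std::vector<int> out;
  std::mutex m;
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      for (int i = 0; i < per_thread; ++i) {
        std::lock_guard<std::mutex> lock(m);  // serialize push_back
        out.push_back(t * per_thread + i);
      }
    });
  }
  for (auto &w : workers) w.join();
  return out;
}
```

Whether the lock or the concurrent container is preferable depends on contention: with coarse per-batch appends a mutex is usually cheap, while fine-grained per-hit appends favor tbb::concurrent_vector.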

❯ ./FastSophiread_benchmark.sh 
*** baseline output for HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3
[2024-01-31 12:40:54.145] [info] File size: 43399744 bytes

raw_data: 42551688/43399744 (98%)
[2024-01-31 12:40:54.351] [info] Number of hits: 5303320
[2024-01-31 12:40:54.351] [info] Number of events: 376746
[2024-01-31 12:40:54.351] [info] bad/total hit ratio: 0.00%
[2024-01-31 12:40:54.351] [info] Reading speed: 2.502038e+08 hits/s
[2024-01-31 12:40:54.351] [info] Batching speed: 3.433459e+08 hits/s
[2024-01-31 12:40:54.351] [info] Single-Thread speed: 3.436284e+07 hits/s
[2024-01-31 12:40:54.351] [info] Aggregate speed: 2.575252e+07 hits/s
[2024-01-31 12:40:54.358] [info] Gathering Events speed: 1.371482e+08 events/s
[2024-01-31 12:40:54.358] [info] Writing Output speed: 9.355500e+07 events/s

*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+verify+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+verify+tgdc ***

*** inspecting results of .h5 files produced ***
ALL 13 result files matched for every iteration

*** report statistics on recorded benchmarks ***
result/single-thread.csv: (hits/sec)
min: 34,474,080 median: 36,375,605 mean: 36,100,608 max: 36,918,600

result/tbb.csv: (hits/sec)
min: 88,892,390 median: 151,492,300 mean: 142,492,915 max: 158,682,300

# in each iteration, and compares the result of each iteration to check that the same answers are obtained.

EXT=${1:-h5}
# TODO: data/suann_socket_background_serval32.tpx3 produces poor performance statistics

This sounds more like a NOTE than a TODO.
@suannchong is probably the best person to find out why this particular raw data leads to falsely low performance output.

Comment on lines +38 to +47
# TODO: the Sophiread program produces an .h5 file that slightly differs from app, so cannot be used at the moment
#
# Not comparable: </neutrons/nHits> is of class H5T_FLOAT and </neutrons/nHits> is of class H5T_INTEGER
# Not comparable: </neutrons/tof> has rank 1, dimensions [190208], max dimensions [190208]
# and </neutrons/tof> has rank 1, dimensions [190207], max dimensions [190207]
# ...
# seems like some minor adjustments are needed

##if [ "$EXT" == "h5" ] ; then
## ./Sophiread -i $f -E ${result}/${y}.verify.h5 ;

The H5 output functions use a templated call to process the different datasets, and the mapping for data types is broken; the temporary solution is to cast everything to float to avoid data loss.
#44 has more details regarding this issue.


@KedoKudo KedoKudo left a comment


Everything looks good, thank you very much for adding this critical feature 😄

@KedoKudo KedoKudo merged commit ccfaf28 into ornlneutronimaging:next Feb 4, 2024
1 check passed
@bvacaliuc bvacaliuc deleted the pr/mmap branch February 5, 2024 21:36