-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide support for memory mapped operation #52
Conversation
…a produces poor performance statistics at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally on m2 mac and everything passes (see terminal output below)
@bvacaliuc could you check if it is necessary to switch from std::vector
to tbb::concurrent_vector
when collecting output from tbb::thread
?
❯ ./FastSophiread_benchmark.sh
*** baseline output for HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3
[2024-01-31 12:40:54.145] [info] File size: 43399744 bytes
raw_data: 42551688/43399744 (98%)
[2024-01-31 12:40:54.351] [info] Number of hits: 5303320
[2024-01-31 12:40:54.351] [info] Number of events: 376746
[2024-01-31 12:40:54.351] [info] bad/total hit ratio: 0.00%
[2024-01-31 12:40:54.351] [info] Reading speed: 2.502038e+08 hits/s
[2024-01-31 12:40:54.351] [info] Batching speed: 3.433459e+08 hits/s
[2024-01-31 12:40:54.351] [info] Single-Thread speed: 3.436284e+07 hits/s
[2024-01-31 12:40:54.351] [info] Aggregate speed: 2.575252e+07 hits/s
[2024-01-31 12:40:54.358] [info] Gathering Events speed: 1.371482e+08 events/s
[2024-01-31 12:40:54.358] [info] Writing Output speed: 9.355500e+07 events/s
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+verify+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+verify+tgdc ***
*** inspecting results of .h5 files produced ***
ALL 13 result files matched for every iteration
*** report statistics on recorded benchmarks ***
result/single-thread.csv: (hits/sec)
min: 34,474,080 median: 36,375,605 mean: 36,100,608 max: 36,918,600
result/tbb.csv: (hits/sec)
min: 88,892,390 median: 151,492,300 mean: 142,492,915 max: 158,682,300
# in each iteration, and compares the result of each iteration to check that the same answers are obtained. | ||
|
||
EXT=${1:-h5} | ||
# TODO: data/suann_socket_background_serval32.tpx3 produces poor performance statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds more like a NOTE
rather than TODO
.
@suannchong is probably the best one who can find out why this particular raw data is leading to false low performance output.
# TODO: the Sophiread program produces an .h5 file that slightly differs from app, so cannot be used at the moment | ||
# | ||
# Not comparable: </neutrons/nHits> is of class H5T_FLOAT and </neutrons/nHits> is of class H5T_INTEGER | ||
# Not comparable: </neutrons/tof> has rank 1, dimensions [190208], max dimensions [190208] | ||
# and </neutrons/tof> has rank 1, dimensions [190207], max dimensions [190207] | ||
# ... | ||
# seems like some minor adjustments are needed | ||
|
||
##if [ "$EXT" == "h5" ] ; then | ||
## ./Sophiread -i $f -E ${result}/${y}.verify.h5 ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The H5 output functions are using a templated call to process different dataset and the mapping for data types are broken, and the temp solution is to cast everything to float to avoid data loss.
#44 has more details regarding this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good, thank you very much for adding this critical feature 😄
Description of Pull Request
This PR implements the code necessary to operate over memory regions and to facilitate streaming operations.
Purpose/Summary of work
As per #51, adjustments to the
findTPX3H()
API are made to implement the 3 requirements. Test cases are added to verify that use ofmmap()
and memory spaces are equivalent to thestd::vector<char *>
of the current implementation. A new CLI programbenchmark_mmap.cpp
is provided to process both single-thread and tbb use cases in a looping mode that is conducive for streaming processing.Additional detail of work
Per discussions with@KedoKudo, I'm providing this PR with the
benchmark_mmap.cpp
as the reference implementation. The canonical CLI programsophiread.cpp
could be extended with memory mapped support, but I gratefully appreciate this PR to be considered using the above reference. I do apologize thatbenchmark_mmap.cpp
duplicates some of the functions in the common library and I am willing to offer additional PRs to resolve these, if needed.The environment variable
MAX_BATCH_LEN
is used to control the amount of intermediate data structures that are allocated during processing. The test programs are sensitive to this value as the number of events produced can vary. This is because partial events are not yet handled gracefully as batches are iterated upon. This is a known limitation at this time. The test programs have been calibrated with aMAX_BATCH_LEN=100000
, as this was known to prevent batching when processing the data setdata/suann_socket_background_serval32.tpx3
. However, the data setdata/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3
which is used in the performance benchmark, does roll-over at this setting ( it appears as if a length of 1000000 may be needed to prevent roll-over there ). This implementation has parsed theMAX_BATCH_LEN
from the environment insophiread/FastSophiread/src/tpx3_fast.cpp
. It is not critical that this restriction be imposed, but lack of it will prevent extremely large.tpx3
files from being successfully processed.Testing instructions
Test using standard build instructions, and operations. To observe the use of mmap(), use
FastSophireadBenchmarksCLI.app
and provide themmap
string as options. For example:A suite of test cases and benchmarks comparing the operation of single/tbb/streaming/mmap can be executed via
FastSophiread_benchmark.sh
. As demonstrated by:mcpevent2hist-pr-mmap-test-output.txt