Provide support for memory mapped operation #52

bvacaliuc · 2024-01-30T22:26:29Z

Description of Pull Request

This PR implements the code necessary to operate over memory regions and to facilitate streaming operations.

Purpose/Summary of work

As per #51, adjustments to the findTPX3H() API are made to implement the 3 requirements. Test cases are added to verify that use of mmap() and memory spaces are equivalent to the std::vector<char *> of the current implementation. A new CLI program benchmark_mmap.cpp is provided to process both single-thread and tbb use cases in a looping mode that is conducive for streaming processing.

Additional detail of work

Per discussions with@KedoKudo, I'm providing this PR with the benchmark_mmap.cpp as the reference implementation. The canonical CLI program sophiread.cpp could be extended with memory mapped support, but I gratefully appreciate this PR to be considered using the above reference. I do apologize that benchmark_mmap.cpp duplicates some of the functions in the common library and I am willing to offer additional PRs to resolve these, if needed.

The environment variable MAX_BATCH_LEN is used to control the amount of intermediate data structures that are allocated during processing. The test programs are sensitive to this value as the number of events produced can vary. This is because partial events are not yet handled gracefully as batches are iterated upon. This is a known limitation at this time. The test programs have been calibrated with a MAX_BATCH_LEN=100000, as this was known to prevent batching when processing the data set data/suann_socket_background_serval32.tpx3. However, the data set data/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3 which is used in the performance benchmark, does roll-over at this setting ( it appears as if a length of 1000000 may be needed to prevent roll-over there ). This implementation has parsed the MAX_BATCH_LEN from the environment in sophiread/FastSophiread/src/tpx3_fast.cpp. It is not critical that this restriction be imposed, but lack of it will prevent extremely large .tpx3 files from being successfully processed.

Testing instructions

Test using standard build instructions, and operations. To observe the use of mmap(), use FastSophireadBenchmarksCLI.app and provide the mmap string as options. For example:

./FastSophireadBenchmarksCLI.app data/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3 /tmp/foo.h5 tgdc+mmap

A suite of test cases and benchmarks comparing the operation of single/tbb/streaming/mmap can be executed via FastSophiread_benchmark.sh. As demonstrated by:

conda activate sophiread
mkdir sophiread/build && cd sophiread/build
cmake ..
make -j4
make test
./FastSophiread_benchmark.sh

mcpevent2hist-pr-mmap-test-output.txt

…ornlneutronimaging#51

…ng with tbb; Refs ornlneutronimaging#51

…a produces poor performance statistics at the moment

KedoKudo

Tested locally on m2 mac and everything passes (see terminal output below)

@bvacaliuc could you check if it is necessary to switch from std::vector to tbb::concurrent_vector when collecting output from tbb::thread?

❯ ./FastSophiread_benchmark.sh 
*** baseline output for HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000.tpx3
[2024-01-31 12:40:54.145] [info] File size: 43399744 bytes

raw_data: 42551688/43399744 (98%)
[2024-01-31 12:40:54.351] [info] Number of hits: 5303320
[2024-01-31 12:40:54.351] [info] Number of events: 376746
[2024-01-31 12:40:54.351] [info] bad/total hit ratio: 0.00%
[2024-01-31 12:40:54.351] [info] Reading speed: 2.502038e+08 hits/s
[2024-01-31 12:40:54.351] [info] Batching speed: 3.433459e+08 hits/s
[2024-01-31 12:40:54.351] [info] Single-Thread speed: 3.436284e+07 hits/s
[2024-01-31 12:40:54.351] [info] Aggregate speed: 2.575252e+07 hits/s
[2024-01-31 12:40:54.358] [info] Gathering Events speed: 1.371482e+08 events/s
[2024-01-31 12:40:54.358] [info] Writing Output speed: 9.355500e+07 events/s

*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/stream+verify+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+tbb+tgdc ***
*** result/HV2700_1500_500_THLrel_274_sophy_chopper_60Hz_4.1mm_aperture_siemen_star_120s_000000/mmap+verify+tgdc ***

*** inspecting results of .h5 files produced ***
ALL 13 result files matched for every iteration

*** report statistics on recorded benchmarks ***
result/single-thread.csv: (hits/sec)
min: 34,474,080 median: 36,375,605 mean: 36,100,608 max: 36,918,600

result/tbb.csv: (hits/sec)
min: 88,892,390 median: 151,492,300 mean: 142,492,915 max: 158,682,300

KedoKudo · 2024-01-31T15:41:35Z

sophiread/FastSophiread/benchmarks/benchmark.sh

+# in each iteration, and compares the result of each iteration to check that the same answers are obtained.
+
+EXT=${1:-h5}
+# TODO: data/suann_socket_background_serval32.tpx3 produces poor performance statistics


This sounds more like a NOTE rather than TODO.
@suannchong is probably the best one who can find out why this particular raw data is leading to false low performance output.

KedoKudo · 2024-01-31T15:44:59Z

sophiread/FastSophiread/benchmarks/benchmark.sh

+    # TODO: the Sophiread program produces an .h5 file that slightly differs from app, so cannot be used at the moment
+    #
+    # Not comparable: </neutrons/nHits> is of class H5T_FLOAT and </neutrons/nHits> is of class H5T_INTEGER
+    # Not comparable: </neutrons/tof> has rank 1, dimensions [190208], max dimensions [190208]
+    #   and </neutrons/tof> has rank 1, dimensions [190207], max dimensions [190207]
+    # ...
+    # seems like some minor adjustments are needed
+
+    ##if [ "$EXT" == "h5" ] ; then
+    ##    ./Sophiread -i $f -E ${result}/${y}.verify.h5 ;


The H5 output functions are using a templated call to process different dataset and the mapping for data types are broken, and the temp solution is to cast everything to float to avoid data loss.
#44 has more details regarding this issue.

sophiread/FastSophiread/benchmarks/benchmark.sh

sophiread/FastSophiread/benchmarks/benchmark_mmap.cpp

sophiread/FastSophiread/src/disk_io.cpp

sophiread/FastSophiread/src/tpx3_fast.cpp

KedoKudo

Everything looks good, thank you very much for adding this critical feature 😄

Bogdan Vacaliuc added 4 commits January 30, 2024 12:24

part 1: implement mmap(), variants to findTPX3H() and test cases; Refs …

945c9eb

…ornlneutronimaging#51

part 1a: fix syntax errors due to edits to make batch size configurable

6ae533f

part 2: add benchmark example showing use of mmap(), looping processi…

1412d37

…ng with tbb; Refs ornlneutronimaging#51

benchmark only the siemen star data as the other dataset with GDC dat…

eff0935

…a produces poor performance statistics at the moment

bvacaliuc marked this pull request as ready for review January 30, 2024 22:29

KedoKudo self-requested a review January 31, 2024 15:28

KedoKudo self-assigned this Jan 31, 2024

KedoKudo added the enhancement New feature or request label Jan 31, 2024

KedoKudo requested changes Jan 31, 2024

View reviewed changes

Bogdan Vacaliuc and others added 3 commits February 3, 2024 14:50

use tbb::concurrent_vector

bce98b8

apply requested modifications (cosmetic); Refs ornlneutronimaging#52

1216a48

Su-Ann says TOF does not need to be in ascending order

871f778

KedoKudo approved these changes Feb 4, 2024

View reviewed changes

KedoKudo merged commit ccfaf28 into ornlneutronimaging:next Feb 4, 2024
1 check passed

bvacaliuc deleted the pr/mmap branch February 5, 2024 21:36

This was referenced Feb 7, 2024

Add support for memory mapped operation #51

Open

Fix improper loop exit with MAX_BATCH_LEN and fix the test case for it #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide support for memory mapped operation #52

Provide support for memory mapped operation #52

bvacaliuc commented Jan 30, 2024

KedoKudo left a comment

KedoKudo Jan 31, 2024

KedoKudo Jan 31, 2024

KedoKudo left a comment

Provide support for memory mapped operation #52

Provide support for memory mapped operation #52

Conversation

bvacaliuc commented Jan 30, 2024

Description of Pull Request

Purpose/Summary of work

Additional detail of work

Testing instructions

KedoKudo left a comment

Choose a reason for hiding this comment

KedoKudo Jan 31, 2024

Choose a reason for hiding this comment

KedoKudo Jan 31, 2024

Choose a reason for hiding this comment

KedoKudo left a comment

Choose a reason for hiding this comment