Gpu kernel break #13
gpu/plscore.cu
Outdated
// cudaMemsetAsync(dev_mem->d_total_n_long, 0, sizeof(size_t),
// *stream);
Commented this out. It still works fine.
add comment
@@ -490,14 +490,14 @@ void plscore_async_short_mid_forward_dp(deviceMemPtr* dev_mem, cudaStream_t* str
dim3 shortDimGrid(score_kernel_config.short_griddim, 1, 1);
dim3 midDimGrid(score_kernel_config.mid_griddim, 1, 1);

// Run kernel
// printf("Grid Dim, %d\n", DimGrid.x);
cudaMemsetAsync(dev_mem->d_long_seg_count, 0, sizeof(unsigned int),
If I comment out these lines, it aborts with:
[Debug] plchain_cal_score_async (gpu/plchain.cu:572) batch_n 31853545, read_start 26, read_end 50
[Info] plrange_async_range_selection (gpu/plrange.cu:258): Batch total_n 31853545, Range Kernel Launched, grid 984 cut 62204
[Info] plscore_async_short_mid_forward_dp (gpu/plscore.cu:537) short mid score generation success
[Debug] plchain_cal_score_async (gpu/plchain.cu:572) batch_n 32180703, read_start 50, read_end 82
[Info] plrange_async_range_selection (gpu/plrange.cu:258): Batch total_n 32180703, Range Kernel Launched, grid 999 cut 62838
[Info] plscore_async_short_mid_forward_dp (gpu/plscore.cu:537) short mid score generation success
[Debug] plchain_cal_score_async (gpu/plchain.cu:572) batch_n 30397856, read_start 82, read_end 111
[Info] plrange_async_range_selection (gpu/plrange.cu:258): Batch total_n 30397856, Range Kernel Launched, grid 943 cut 59356
[Info] plscore_async_short_mid_forward_dp (gpu/plscore.cu:537) short mid score generation success
[Info] plscore_async_long_forward_dp (gpu/plscore.cu:563) long score generation success
Memory access fault by GPU node-2 (Agent handle: 0x825d1dc0) on address 0x7fbe88e09000. Reason: Unknown.
/var/spool/slurm-llnl/job59127/slurm_script: line 30: 4091078 Aborted (core dumped) ./minimap2 -t 1 --max-chain-skip=2147483647 --gpu-chain /shareddata/umich_folder/data/ONT/hg38.mmi /shareddata/umich_folder/data/ONT/random_500MBases_200kto300k.fa
gpu/plchain.cu
Outdated
#endif

// DEBUG: debug analysis that involves sychronizating the whole device
#if defined(DEBUG_CHECK)
plchain_debug_analysis(stream_setup.streams[stream_id]);
#endif // DEBUG_VERBOSE
// reset values
cudaMemsetAsync(stream_setup.streams[stream_id].dev_mem.d_long_seg_count, 0, sizeof(unsigned int),
reset counters after each large batch
gpu/plchain.cu
Outdated
// NOTE: this is the number of long segs till this microbatch
size_t long_segs_num = stream_setup.streams[stream_id].host_mems[uid].long_segs_num;
for (; long_seg_idx < long_segs_num; long_seg_idx++) {
// TODO: write long_segs + long_seg_idx to f/p
TODO: copy p/f_long back
gpu/plchain.cu
Outdated
stream_setup.streams[t].reads, misc, km);
// TODO: backtrack multiple pending batches
// reset values
cudaMemsetAsync(stream_setup.streams[t].dev_mem.d_long_seg_count, 0, sizeof(unsigned int),
Reset counters once a large stream finishes
max_range_grid, long_seg_buffer_size);
}
// one stream has one long mem and one device mem
plmem_malloc_long_mem(&stream_setup.streams[i].long_mem, long_seg_buffer_size);
malloc long mem
gpu/plmem.cu
Outdated
cudaMemsetAsync(&stream_setup.streams[i].dev_mem.d_mid_seg_count, 0, sizeof(unsigned int),
stream_setup.streams[i].cudastream);
cudaMemsetAsync(&stream_setup.streams[i].dev_mem.d_total_n_long, 0, sizeof(size_t),
stream_setup.streams[i].cudastream);
Reset counters at the first initialization
@@ -23,10 +24,8 @@ typedef struct {
uint16_t *p; // predecessor

// array size: number of cuts in the batch / long_seg_cut
seg_t *long_segs;
// total long segs number till this batch
I still keep the cumulative value, since we are running asynchronously.
fprintf(stderr, "max_range_grid %d griddim %d, total_n %lu n_read %d\n",
stream_setup.max_range_grid, griddim, total_n, n_read);
}
stream_setup.streams[stream_id].reads = reads;
Move cudaMemsetAsync here. Reset long_seg_count and total_n_long before launching async kernels into the stream.
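The effect of resetting the counter before each launch can be illustrated with a host-only C++ sketch. This is not the repository's code: `StreamState`, `append_long_segs`, and `run_batches` are hypothetical names, the buffer is a plain vector, and a bounds check stands in for the GPU memory access fault reported when the counter is left to accumulate across batches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one stream's device-side state.
struct StreamState {
    std::vector<int> long_seg_buffer;  // fixed-capacity buffer for long segments
    unsigned int long_seg_count = 0;   // plays the role of d_long_seg_count
};

// Appends n segments at the current counter position.
// Returns false instead of writing past the end of the buffer.
bool append_long_segs(StreamState& s, unsigned int n) {
    const std::size_t end = static_cast<std::size_t>(s.long_seg_count) + n;
    if (end > s.long_seg_buffer.size()) return false;  // would be an out-of-bounds write
    for (std::size_t i = s.long_seg_count; i < end; ++i) s.long_seg_buffer[i] = 1;
    s.long_seg_count += n;
    return true;
}

// Processes batches in order and returns how many completed without overflow.
// With reset_before_launch the counter is zeroed before each batch,
// mirroring a reset issued ahead of the kernels for that batch.
unsigned int run_batches(StreamState& s,
                         const std::vector<unsigned int>& batch_sizes,
                         bool reset_before_launch) {
    unsigned int completed = 0;
    for (unsigned int n : batch_sizes) {
        if (reset_before_launch) s.long_seg_count = 0;
        if (!append_long_segs(s, n)) break;
        ++completed;
    }
    return completed;
}
```

With a buffer of 100 entries and three batches of 60 segments each, all three batches complete when the counter is reset first, while only the first completes when the counter keeps accumulating.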
gpu/plchain.cu
Outdated
stream_setup.streams[t].cudastream);
cudaMemsetAsync(stream_setup.streams[t].dev_mem.d_total_n_long, 0, sizeof(size_t),
stream_setup.streams[t].cudastream);
seg_t* long_segs = stream_setup.streams[t].long_mem.long_segs;
No need to reset the long seg count here: it will be reset every time before launching any kernels.
aac_config.json: manual config for fitting one batch + long buffer in aac GPU memory
Batch memory usage: 51.32 GB
Long buffer memory usage: 8.98 GB
This line of code will limit max_n by int32max |
When building the program, use |
scripts/aac_omnitrace.slurm
Outdated
# rocprof --stats -o rocprof_output/long_seg.${SLURM_JOB_ID}.csv ./minimap2 -t 1 --max-chain-skip=2147483647 --gpu-chain /shareddata/umich_folder/data/ONT/hg38.mmi /shareddata/umich_folder/data/ONT/short_seg_reads_from_1kto10k_distri.fa
# rocprof --stats -o rocprof_output/long_seg.${SLURM_JOB_ID}.csv ./minimap2 -t 1 --max-chain-skip=2147483647 --gpu-chain /shareddata/umich_folder/data/ONT/hg38.mmi /shareddata/umich_folder/data/ONT/reads_4f452f4a-d82a-4580-981b-32d14b997217.fa
# omnitrace-sample -PTDH -E all -I rocm-smi -I roctracer -I rocprofiler -I roctx -o omni_output -- ./minimap2 -K 2000000000 -t 1 --max-chain-skip=2147483647 --gpu-chain /shareddata/umich_folder/data/ONT/hg38.mmi /shareddata/umich_folder/data/ONT/random_500MBases_200kto300k.fa
omnitrace-sample -PTDH -E all -I rocm-smi -I roctracer -I rocprofiler -I roctx -o omni_output -- ./minimap2 -K 2000000000 -t 1 --max-chain-skip=2147483647 --gpu-chain /shareddata/umich_folder/data/ONT/hg38.mmi /shareddata/umich_folder/data/ONT/random_2GBases_10kto300k.fa
This line runs the program under omnitrace.
Please follow scripts/aac_omnitrace.slurm to profile the program with omnitrace.
…into gpu_kernel-break
gpu/plscore.cu
Outdated
// init the first batch as the size of the grid
curr_long_segid = 0;
}
auto start = clock64();
Use clock64() with a runtime print to see the time taken by each block
gpu/plscore.cu
Outdated
curr_long_segid = 0;
}
auto start = clock64();
unsigned int segid = atomicAdd(&curr_long_segid, 1);
I tried starting from bid and taking the atomic value only in the following iterations:
unsigned segid = bid;
while () {
    ...
    segid = atomicAdd(&curr_long_segid, 1);
}
But this gives much worse performance.
I suppose the blocks are not launched in order (e.g. block #0 might be launched after block #100).
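A small host-side C++ sketch of the two claiming schemes compared above, assuming blocks may start in an arbitrary order. `first_claims` and `launch_order` are hypothetical names used only for illustration; the loop models the order in which blocks begin executing, and `std::atomic::fetch_add` stands in for `atomicAdd`. Whether launch order is the actual cause of the slowdown is the commenter's hypothesis, which this sketch does not verify.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// For each block, in the order the blocks start running, returns the id of
// the first segment that block works on.
//   start_from_bid == true : the first segment is the block's own id (segid = bid)
//   start_from_bid == false: the first segment is claimed from a shared counter
std::vector<unsigned int> first_claims(const std::vector<unsigned int>& launch_order,
                                       bool start_from_bid) {
    std::atomic<unsigned int> curr_long_segid{0};  // shared counter, as in the kernel
    std::vector<unsigned int> claims;
    claims.reserve(launch_order.size());
    for (unsigned int bid : launch_order) {
        // fetch_add returns the value held before the increment, like atomicAdd.
        const unsigned int segid = start_from_bid ? bid : curr_long_segid.fetch_add(1);
        claims.push_back(segid);
    }
    return claims;
}
```

For a launch order of blocks 2, 0, 1, the atomic scheme hands out segments 0, 1, 2 in the order the blocks actually start, whereas the bid scheme hands out 2, 0, 1, so the segment order depends on which block happens to start first.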
#endif
unsigned int segid = atomicAdd(&curr_long_segid, 1);
while (segid < *long_seg_count) {
seg_t seg = long_seg[map[segid]]; // sorted
Segfault here if compiled with -ggdb flag when using hipcc.
// FIXME: temporary solution for microbatching
if (read_start < n_read) {
fprintf(stderr, "[WARNING] Unable to fit reads %d - %d into a microbatch. Fall back to cpu chaining\n", read_start, n_read-1);
}
Temporary fix: sometimes the total number of anchors is less than MICRO_BATCH * batch_size, but the reads cannot be divided into microbatches without cutting a read in two. If 1 or 2 reads are left behind because of this, we use the CPU kernel in map.c. In the future, different threads will create different microbatches and will be responsible for making sure this does not happen.
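The leftover-read situation described above can be reproduced with a short host-side C++ sketch. It assumes reads are packed greedily in order into at most `max_microbatches` microbatches of `batch_size` anchors each, without splitting a read; the greedy policy and the names `MicrobatchPlan` and `plan_microbatches` are assumptions for illustration, not the repository's implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Result of packing reads into microbatches.
struct MicrobatchPlan {
    std::vector<std::pair<int, int>> ranges;  // [first, last) read index per microbatch
    int read_start;                           // first read that was not placed
};

// Greedily packs whole reads, in order, into microbatches of limited capacity.
MicrobatchPlan plan_microbatches(const std::vector<std::size_t>& anchors_per_read,
                                 std::size_t batch_size, int max_microbatches) {
    MicrobatchPlan plan{{}, 0};
    const int n_read = static_cast<int>(anchors_per_read.size());
    int read = 0;
    for (int mb = 0; mb < max_microbatches && read < n_read; ++mb) {
        std::size_t used = 0;
        const int first = read;
        // Add reads while the next one still fits entirely in this microbatch.
        while (read < n_read && used + anchors_per_read[read] <= batch_size) {
            used += anchors_per_read[read];
            ++read;
        }
        if (read == first) break;  // a single read exceeds the capacity: stop packing
        plan.ranges.emplace_back(first, read);
    }
    plan.read_start = read;  // reads [read_start, n_read) need the CPU fallback
    return plan;
}
```

For example, four reads of 6 anchors each with `batch_size = 10` and three microbatches total 24 anchors, below the capacity of 30, yet only one read fits per microbatch, so the fourth read is left for the CPU path (`read_start` is 3 while `n_read` is 4).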
// FIXME: temporary solution for reads fail to fit in microbatch
// cpu kernel
for (kernel_batch.count; kernel_batch.count<tr->launched_batch.count; kernel_batch.count++) {
fprintf(stderr, "[WARNING] Run CPU kernel for read %d\n", iread);
mm_map_chain(s->p->mi, s->p->opt, &kernel_batch.reads[kernel_batch.count], b, kernel_batch.km);
}
Temporary fix for batches that cannot be divided evenly into microbatches.
// FIXME: temporary solution for reads fail to fit in microbatch
// cpu kernel
for (kernel_batch.count; kernel_batch.count<tr->launched_batch.count; kernel_batch.count++) {
fprintf(stderr, "[WARNING] Run CPU kernel for read %d\n", kernel_batch.count);
mm_map_chain(s->p->mi, s->p->opt, &kernel_batch.reads[kernel_batch.count], b, tr->launched_batch.km);
}
Temporary fix for batches that cannot be evenly divided into microbatches.
Make sure to enable backtracking after performance testing! (in plchain.cu, search for FIXME)
…into gpu_kernel-break
…into gpu_kernel-break
…into gpu_kernel-break
Squashed commit of the following:
commit 938a45f Author: joydddd <joydong@umich.edu> Date: Mon Mar 18 13:26:25 2024 -0400 Add planalyze.cu(h)
commit fee935f Merge: ec8a340 41062d0 Author: joydddd <joydong@umich.edu> Date: Mon Mar 18 13:18:42 2024 -0400 Merge branch 'gpu_kernel-break' into gpu_kernel
commit 41062d0 Author: joydddd <joydong@umich.edu> Date: Mon Mar 18 13:17:15 2024 -0400 Clean up compile options. Move config to cmd option / gpu_config.json
commit ec8a340 Merge: 31dff82 3218873 Author: Joy Juechu Dong <joydong@umich.edu> Date: Wed Mar 13 23:37:55 2024 -0400 Merge pull request #13 from Minimap2onGPU:gpu_kernel-break Add long segment aggregation. Cleaned up code for sysbio submission. TODO: Add README
commit 3218873 Author: joydddd <joydong@umich.edu> Date: Wed Mar 13 23:31:16 2024 -0400 Update print compile time config
commit ad70b6a Author: joydddd <joydong@umich.edu> Date: Wed Mar 13 23:09:51 2024 -0400 cleanup gpu code for open source. TODO: Add README
commit 4d2459c Merge: 13b2eab 5c742d9 Author: Joy Dong <joydong@umich.edu> Date: Wed Mar 13 19:34:56 2024 -0500 Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 into gpu_kernel-break
commit 13b2eab Author: Joy Dong <joydong@umich.edu> Date: Wed Mar 13 19:34:05 2024 -0500 Add range distribution analysis
commit f7ecde1 Author: Joy Dong <joydong@umich.edu> Date: Wed Mar 6 12:14:26 2024 -0600 Add data analysis script
commit 5c742d9 Author: Joy Dong <joydong@umich.edu> Date: Wed Mar 6 12:14:26 2024 -0600 Add data analysis script
commit 12c3fc0 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Mar 1 22:50:35 2024 -0600 config aac that maximize memory usage
commit eed2640 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Mar 1 00:24:31 2024 -0600 comment in kernel print to maximize tp
commit c982bb5 Merge: 94f333a d8ba447 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Feb 28 11:14:20 2024 -0600 Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 into gpu_kernel-break
commit 94f333a Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Feb 28 11:14:16 2024 -0600 update plscore
commit d8ba447 Author: joydddd <joydong@umich.edu> Date: Mon Feb 26 12:40:55 2024 -0500 Remove skip backtracking in GPU implementation. Outputs are correct
commit 6836cb7 Merge: 907748f 3239112 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Feb 23 19:13:51 2024 -0600 Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 into gpu_kernel-break
commit 907748f Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Feb 23 19:13:46 2024 -0600 add seg count
commit 3239112 Author: joydddd <joydong@umich.edu> Date: Thu Feb 22 23:26:47 2024 -0500 Add put long segs back to original reads, but output seems to be wrong??
commit 474e746 Author: joydddd <joydong@umich.edu> Date: Thu Feb 22 22:37:27 2024 -0500 Temporal Fix microbacthing error (Use CPU kernel)
commit 66e2e54 Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Feb 22 11:15:52 2024 -0600 fix atomic add in long seg, only first thread in block add the atomic
commit c40bf9f Author: Xueshen Liu <liuxs@umich.edu> Date: Sat Feb 17 00:38:19 2024 -0600 fix atomicadd -> atomicsub, TODO: add more cudaCheck
commit 9678399 Author: Joy Dong <joydong@umich.edu> Date: Thu Feb 15 23:45:36 2024 -0600 Fix throughput analysis
commit 39f758b Author: Joy Dong <joydong@umich.edu> Date: Thu Feb 15 22:50:52 2024 -0500 Edit throughput calculation. JIT Compilat error on cuda, push to try on HIP
commit b21c5f9 Author: Joy Dong <joydong@umich.edu> Date: Tue Feb 13 15:46:20 2024 -0500 Update debug analysis
commit cff2e27 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Feb 9 10:12:52 2024 -0600 debug info control
commit accea33 Merge: d9e7396 e1248fb Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Feb 9 09:44:56 2024 -0600 Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 into gpu_kernel-break
commit d9e7396 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Feb 9 09:44:52 2024 -0600 add atomic runtime balancing
commit e1248fb Author: Joy Dong <joydong@umich.edu> Date: Thu Feb 8 20:01:02 2024 -0500 Update throughput calculation, a6000 config
commit 4aeacfd Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Feb 8 12:48:15 2024 -0600 add sorting technique
commit 8ee89c1 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Feb 6 15:09:15 2024 -0600 update scripts
commit 216f2b2 Author: Joy Dong <joydong@umich.edu> Date: Fri Jan 26 12:01:24 2024 -0600 Add kernel throughput calculatation
commit 15d0003 Author: Joy Dong <joydong@umich.edu> Date: Wed Jan 10 21:11:34 2024 -0600 change script path
commit aa62643 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jan 10 14:37:39 2024 -0600 use hostmalloc to avoid step1 delay
commit 85f1cbb Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Nov 3 19:26:00 2023 -0500 finish microbatch design, TODO: add batch number to config, and use hostmalloc
commit 1a585ab Author: Joy Dong <joydong@umich.edu> Date: Mon Oct 30 16:23:50 2023 -0500 Add acc_config. FIX seg fault for long_seg_count reset
commit cb1a30e Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Oct 27 13:56:12 2023 -0500 no reset long seg on each micro batch cause fault
commit 5e1abe8 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Oct 27 11:34:49 2023 -0500 finish minibatch, parameter is still hardcoded, debug function need fix, f_long is not copied back
commit 5598718 Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Oct 26 09:46:48 2023 -0500 add omnitrace scripts
commit 31dff82 Author: Joy Juechu Dong <joydong@umich.edu> Date: Thu Oct 19 23:01:03 2023 -0400 Add aggregate long segs (#12)
commit cd225fa Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Oct 12 09:49:48 2023 -0500 add new api translation, enable analysis print
commit 247a17f Merge: 1d5c636 1e8781a Author: joydddd <joydong@umich.edu> Date: Tue Oct 10 22:00:31 2023 -0400 Merge commit '1e8781a' into gpu_kernel
commit 1e8781a Author: joydddd <joydong@umich.edu> Date: Tue Oct 10 21:56:18 2023 -0400 Fix debug functions
commit 1d5c636 Merge: d334c9e ccddf5c Author: joydddd <joydong@umich.edu> Date: Wed Sep 27 11:22:11 2023 -0400 Delete gpu/.depend
commit d334c9e Author: Joy Juechu Dong <joydong@umich.edu> Date: Tue Sep 26 15:30:12 2023 -0400 merge with aac code including CUDA timer (#9)
commit ccddf5c Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:55:27 2023 -0500 add sample slurm
commit 7b65403 Merge: cabe321 6897137 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:41:38 2023 -0500 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel
commit cabe321 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:39:58 2023 -0500 add profiling scripts, please check scripts/*.slurm
commit 6897137 Author: Joy Juechu Dong <joydong@umich.edu> Date: Tue Sep 26 15:30:12 2023 -0400 merge with aac code including CUDA timer (#9)
commit e144d2d Merge: 4044b9f 1d5d7ab Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 19:50:37 2023 -0400 merge with aac code including CUDA timer
commit 4044b9f Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 19:47:31 2023 -0400 remove const keyword
commit 1d5d7ab Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 17:52:39 2023 -0500 add timer by event recorder
commit 10e7197 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:49:27 2023 -0400 update parseing profile scripts
commit ae3ecd3 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:46:53 2023 -0400 add profile scripts
commit cff2019 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:46:41 2023 -0400 add profile scripts
commit 26a4044 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:45:37 2023 -0400 add profile scripts
commit ced508a Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 22:47:19 2023 -0400 add cudacheck and uncomment syncthreads
commit a01dd39 Merge: ea9c438 10a5a8b Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 16:03:45 2023 -0400 merge with independent mid kernel
commit ea9c438 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 15:19:51 2023 -0400 add global config
commit 10a5a8b Merge: 7018ece 18bc4ac Author: joydddd <joydong@umich.edu> Date: Wed Jul 5 11:06:17 2023 -0400 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel
commit 7018ece Author: joydddd <joydong@umich.edu> Date: Wed Jul 5 10:58:35 2023 -0400 Add anchor compression
commit 18bc4ac Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jun 28 19:21:06 2023 -0400 merge makefile with nvcc version
commit 5a12d5b Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jun 28 19:19:25 2023 -0400 add template to mid kernel launch
commit 4318078 Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 15:28:53 2023 -0400 Add num segs printout
commit 5906e29 Merge: 9149e17 e6f016c Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 13:17:41 2023 -0400 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel
commit 9149e17 Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 13:11:40 2023 -0400 Add short-mid-long kernel
commit e6f016c Author: Xueshen Liu <liuxs@umich.edu> Date: Sun Jun 25 12:21:50 2023 -0400 add compile time launch bound using template
commit c601518 Author: Xueshen Liu <liuxs@umich.edu> Date: Sun Jun 25 12:15:49 2023 -0400 add compile time launch bound
commit be01974 Author: Xueshen Liu <liuxs@umich.edu> Date: Sat Jun 10 16:16:23 2023 -0400 disable new offset feature
commit 2d46499 Author: Xueshen Liu <liuxs@umich.edu> Date: Sat Jun 10 16:05:53 2023 -0400 merge with cuda
* Merge gpu-kernel. Squashed commit of the following: the commit list above, repeated from 41062d0 through 2d46499 (without 938a45f, fee935f, and ec8a340).
* Revert "Merge gpu-kernel. Squashed commit of the following:" This reverts commit 481f3a5.
* Merge gpu_kernel Squashed commit of the following: the commit list above, repeated from 938a45f.
<liuxs@umich.edu> Date: Fri Nov 3 19:26:00 2023 -0500 finish microbatch design, TODO: add batch number to config, and use hostmalloc commit 1a585ab Author: Joy Dong <joydong@umich.edu> Date: Mon Oct 30 16:23:50 2023 -0500 Add acc_config. FIX seg fault for long_seg_count reset commit cb1a30e Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Oct 27 13:56:12 2023 -0500 no reset long seg on each micro batch cause fault commit 5e1abe8 Author: Xueshen Liu <liuxs@umich.edu> Date: Fri Oct 27 11:34:49 2023 -0500 finish minibatch, parameter is still hardcoded, debug function need fix, f_long is not copied back commit 5598718 Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Oct 26 09:46:48 2023 -0500 add omnitrace scripts commit 31dff82 Author: Joy Juechu Dong <joydong@umich.edu> Date: Thu Oct 19 23:01:03 2023 -0400 Add aggregate long segs (#12) commit cd225fa Author: Xueshen Liu <liuxs@umich.edu> Date: Thu Oct 12 09:49:48 2023 -0500 add new api translation, enable analysis print commit 247a17f Merge: 1d5c636 1e8781a Author: joydddd <joydong@umich.edu> Date: Tue Oct 10 22:00:31 2023 -0400 Merge commit '1e8781a' into gpu_kernel commit 1e8781a Author: joydddd <joydong@umich.edu> Date: Tue Oct 10 21:56:18 2023 -0400 Fix debug functions commit 1d5c636 Merge: d334c9e ccddf5c Author: joydddd <joydong@umich.edu> Date: Wed Sep 27 11:22:11 2023 -0400 Delete gpu/.depend commit d334c9e Author: Joy Juechu Dong <joydong@umich.edu> Date: Tue Sep 26 15:30:12 2023 -0400 merge with aac code including CUDA timer (#9) commit ccddf5c Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:55:27 2023 -0500 add sample slurm commit 7b65403 Merge: cabe321 6897137 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:41:38 2023 -0500 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel commit cabe321 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 26 16:39:58 2023 -0500 add profiling scripts, please check scripts/*.slurm commit 6897137 Author: Joy Juechu Dong 
<joydong@umich.edu> Date: Tue Sep 26 15:30:12 2023 -0400 merge with aac code including CUDA timer (#9) commit e144d2d Merge: 4044b9f 1d5d7ab Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 19:50:37 2023 -0400 merge with aac code including CUDA timer commit 4044b9f Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 19:47:31 2023 -0400 remove const keyword commit 1d5d7ab Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Sep 5 17:52:39 2023 -0500 add timer by event recorder commit 10e7197 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:49:27 2023 -0400 update parseing profile scripts commit ae3ecd3 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:46:53 2023 -0400 add profile scripts commit cff2019 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:46:41 2023 -0400 add profile scripts commit 26a4044 Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jul 12 10:45:37 2023 -0400 add profile scripts commit ced508a Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 22:47:19 2023 -0400 add cudacheck and uncomment syncthreads commit a01dd39 Merge: ea9c438 10a5a8b Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 16:03:45 2023 -0400 merge with independent mid kernel commit ea9c438 Author: Xueshen Liu <liuxs@umich.edu> Date: Tue Jul 11 15:19:51 2023 -0400 add global config commit 10a5a8b Merge: 7018ece 18bc4ac Author: joydddd <joydong@umich.edu> Date: Wed Jul 5 11:06:17 2023 -0400 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel commit 7018ece Author: joydddd <joydong@umich.edu> Date: Wed Jul 5 10:58:35 2023 -0400 Add anchor compression commit 18bc4ac Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jun 28 19:21:06 2023 -0400 merge makefile with nvcc version commit 5a12d5b Author: Xueshen Liu <liuxs@umich.edu> Date: Wed Jun 28 19:19:25 2023 -0400 add template to mid kernel launch commit 4318078 Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 15:28:53 2023 -0400 Add num segs printout commit 5906e29 Merge: 
9149e17 e6f016c Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 13:17:41 2023 -0400 Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into gpu_kernel commit 9149e17 Author: joydddd <joydong@umich.edu> Date: Wed Jun 28 13:11:40 2023 -0400 Add short-mid-long kernel commit e6f016c Author: Xueshen Liu <liuxs@umich.edu> Date: Sun Jun 25 12:21:50 2023 -0400 add compile time launch bound using template commit c601518 Author: Xueshen Liu <liuxs@umich.edu> Date: Sun Jun 25 12:15:49 2023 -0400 add compile time launch bound commit be01974 Author: Xueshen Liu <liuxs@umich.edu> Date: Sat Jun 10 16:16:23 2023 -0400 disable new offset feature commit 2d46499 Author: Xueshen Liu <liuxs@umich.edu> Date: Sat Jun 10 16:05:53 2023 -0400 merge with cuda * FIXME: Warp Illegal Address at __shfl_sync() * FIXME: seg start_idx end_idx out of range * Fix cuda score generation kernel * Fix merge error
Still causes a page fault when the long seg count is not reset every batch.
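One plausible reading of this comment, sketched below under stated assumptions: if the long seg counter is used as an atomic append index into a fixed-capacity device buffer, then skipping the per-batch `cudaMemsetAsync` lets the index keep growing across batches until it runs past the buffer. The kernel `append_segs`, the helper `run_batches`, and the buffer layout are hypothetical and are not taken from `gpu/plscore.cu`; only `cudaMemsetAsync` on a counter in the launch stream mirrors the diff.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: each thread claims one slot in a fixed-capacity
// buffer by atomically incrementing a shared counter.
__global__ void append_segs(unsigned int* seg_count, int* segs,
                            unsigned int capacity, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    unsigned int slot = atomicAdd(seg_count, 1u);
    // The bounds guard keeps this sketch safe. Without it, a counter left
    // over from a previous batch would index past `capacity`.
    if (slot < capacity) segs[slot] = tid;
}

// Launches `batches` batches of `n` appends on one stream and returns the
// counter value after the last batch.
unsigned int run_batches(int batches, int n, bool reset_each_batch) {
    cudaStream_t stream;
    unsigned int* d_count = nullptr;
    int* d_segs = nullptr;
    unsigned int h_count = 0;

    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaMalloc(&d_count, sizeof(unsigned int)));
    CUDA_CHECK(cudaMalloc(&d_segs, n * sizeof(int)));
    CUDA_CHECK(cudaMemsetAsync(d_count, 0, sizeof(unsigned int), stream));

    for (int b = 0; b < batches; ++b) {
        if (reset_each_batch) {
            // Same stream as the kernel, so the reset is ordered before it.
            CUDA_CHECK(cudaMemsetAsync(d_count, 0, sizeof(unsigned int), stream));
        }
        append_segs<<<(n + 255) / 256, 256, 0, stream>>>(
            d_count, d_segs, static_cast<unsigned int>(n), n);
        CUDA_CHECK(cudaGetLastError());
    }

    CUDA_CHECK(cudaMemcpyAsync(&h_count, d_count, sizeof(unsigned int),
                               cudaMemcpyDeviceToHost, stream));
    CUDA_CHECK(cudaStreamSynchronize(stream));
    CUDA_CHECK(cudaFree(d_segs));
    CUDA_CHECK(cudaFree(d_count));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return h_count;
}
```

In this sketch, `run_batches(3, 1000, true)` leaves the counter at 1000, while `run_batches(3, 1000, false)` leaves it at 3000, three times the buffer capacity. Whether the real kernel fails for this reason would need to be confirmed against the actual indexing code.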