Use this branch to profile #21

Open: wants to merge 97 commits into base: master
737646d
Restructure minimap2 code to split seed, chain, and align steps to en…
gsitaram Nov 30, 2022
cbb507f
add gpu code
joydddd Mar 2, 2023
55f52e8
include cJson
xenshinu Mar 3, 2023
0d1e986
Add gpu kernel and memory pool for each batch
joydddd Mar 16, 2023
435c371
Add k memory pool leakage check, TODO: init_stream_gpu() called more …
joydddd Mar 16, 2023
c910386
Fix gpu initialization
joydddd Mar 16, 2023
b613c6c
gpu kernel running, memory pool no leakage. Error output?
joydddd Mar 16, 2023
51f5c12
Clean debug prints
joydddd Mar 16, 2023
21d794e
Fix Pending batch issue
joydddd Mar 17, 2023
4b05966
Merge branch 'gpu_kernel' of https://github.com/Minimap2onGPU/minimap…
xenshinu Mar 22, 2023
18f7a15
Merge pull request #6 from Minimap2onGPU/gpu_kernel
joydddd Mar 22, 2023
57c6217
Merge pull request #5 from Minimap2onGPU/split_kernels
joydddd Mar 22, 2023
d33eb44
add gpu config path as an option
xenshinu Mar 29, 2023
8b958f2
Fix long-short kernel; add chaining timer
joydddd Mar 29, 2023
7a0556d
Seperate gpu initialization and free
joydddd Mar 29, 2023
465158d
Merge branch 'develop' into gpu_kernel
xenshinu Apr 12, 2023
c00894c
Merge branch 'gpu_kernel' of https://github.com/Minimap2onGPU/minimap…
xenshinu Apr 12, 2023
bc31c6e
cuda flags
xenshinu May 12, 2023
5393081
Change sc grid calculation
joydddd May 31, 2023
c191ce4
add long seg
xenshinu May 31, 2023
229430d
merge grid size
xenshinu Jun 1, 2023
bf2b735
change plmem and plscore to use new config
xenshinu Jun 10, 2023
2d46499
merge with cuda
xenshinu Jun 10, 2023
be01974
disable new offset feature
xenshinu Jun 10, 2023
c601518
add compile time launch bound
xenshinu Jun 25, 2023
e6f016c
add compile time launch bound using template
xenshinu Jun 25, 2023
9149e17
Add short-mid-long kernel
joydddd Jun 28, 2023
5906e29
Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into g…
joydddd Jun 28, 2023
4318078
Add num segs printout
joydddd Jun 28, 2023
5a12d5b
add template to mid kernel launch
xenshinu Jun 28, 2023
18bc4ac
merge makefile with nvcc version
xenshinu Jun 28, 2023
7018ece
Add anchor compression
joydddd Jul 5, 2023
10a5a8b
Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into g…
joydddd Jul 5, 2023
ea9c438
add global config
xenshinu Jul 11, 2023
a01dd39
merge with independent mid kernel
xenshinu Jul 11, 2023
ced508a
add cudacheck and uncomment syncthreads
xenshinu Jul 12, 2023
26a4044
add profile scripts
xenshinu Jul 12, 2023
cff2019
add profile scripts
xenshinu Jul 12, 2023
ae3ecd3
add profile scripts
xenshinu Jul 12, 2023
10e7197
update parseing profile scripts
xenshinu Jul 12, 2023
1d5d7ab
add timer by event recorder
xenshinu Sep 5, 2023
4044b9f
remove const keyword
xenshinu Sep 5, 2023
e144d2d
merge with aac code including CUDA timer
xenshinu Sep 5, 2023
6897137
merge with aac code including CUDA timer (#9)
joydddd Sep 26, 2023
cabe321
add profiling scripts, please check scripts/*.slurm
xenshinu Sep 26, 2023
7b65403
Merge branch 'gpu_kernel' of github.com:Minimap2onGPU/minimap2 into g…
xenshinu Sep 26, 2023
ccddf5c
add sample slurm
xenshinu Sep 26, 2023
d334c9e
merge with aac code including CUDA timer (#9)
joydddd Sep 26, 2023
1d5c636
Delete gpu/.depend
joydddd Sep 27, 2023
1e8781a
Fix debug functions
joydddd Oct 11, 2023
247a17f
Merge commit '1e8781a' into gpu_kernel
joydddd Oct 11, 2023
cd225fa
add new api translation, enable analysis print
xenshinu Oct 12, 2023
31dff82
Add aggregate long segs (#12)
joydddd Oct 20, 2023
5598718
add omnitrace scripts
xenshinu Oct 26, 2023
5e1abe8
finish minibatch, parameter is still hardcoded, debug function need f…
xenshinu Oct 27, 2023
cb1a30e
no reset long seg on each micro batch cause fault
xenshinu Oct 27, 2023
1a585ab
Add acc_config. FIX seg fault for long_seg_count reset
joydddd Oct 30, 2023
85f1cbb
finish microbatch design, TODO: add batch number to config, and use h…
xenshinu Nov 4, 2023
aa62643
use hostmalloc to avoid step1 delay
xenshinu Jan 10, 2024
15d0003
change script path
joydddd Jan 11, 2024
216f2b2
Add kernel throughput calculatation
joydddd Jan 26, 2024
8ee89c1
update scripts
xenshinu Feb 6, 2024
4aeacfd
add sorting technique
xenshinu Feb 8, 2024
e1248fb
Update throughput calculation, a6000 config
joydddd Feb 9, 2024
d9e7396
add atomic runtime balancing
xenshinu Feb 9, 2024
accea33
Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 …
xenshinu Feb 9, 2024
cff2e27
debug info control
xenshinu Feb 9, 2024
b21c5f9
Update debug analysis
joydddd Feb 13, 2024
39f758b
Edit throughput calculation. JIT Compilat error on cuda, push to try …
joydddd Feb 16, 2024
9678399
Fix throughput analysis
joydddd Feb 16, 2024
c40bf9f
fix atomicadd -> atomicsub, TODO: add more cudaCheck
xenshinu Feb 17, 2024
66e2e54
fix atomic add in long seg, only first thread in block add the atomic
xenshinu Feb 22, 2024
474e746
Temporal Fix microbacthing error (Use CPU kernel)
joydddd Feb 23, 2024
3239112
Add put long segs back to original reads, but output seems to be wrong??
joydddd Feb 23, 2024
907748f
add seg count
xenshinu Feb 24, 2024
6836cb7
Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 …
xenshinu Feb 24, 2024
d8ba447
Remove skip backtracking in GPU implementation. Outputs are correct
joydddd Feb 26, 2024
94f333a
update plscore
xenshinu Feb 28, 2024
c982bb5
Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 …
xenshinu Feb 28, 2024
eed2640
comment in kernel print to maximize tp
xenshinu Mar 1, 2024
12c3fc0
config aac that maximize memory usage
xenshinu Mar 2, 2024
5c742d9
Add data analysis script
joydddd Mar 6, 2024
f7ecde1
Add data analysis script
joydddd Mar 6, 2024
424221d
use less branch compute sc
xenshinu Mar 7, 2024
13b2eab
Add range distribution analysis
joydddd Mar 14, 2024
4d2459c
Merge branch 'gpu_kernel-break' of github.com:Minimap2onGPU/minimap2 …
joydddd Mar 14, 2024
ad70b6a
cleanup gpu code for open source. TODO: Add README
joydddd Mar 14, 2024
3218873
Update print compile time config
joydddd Mar 14, 2024
ec8a340
Merge pull request #13 from Minimap2onGPU:gpu_kernel-break
joydddd Mar 14, 2024
41062d0
Clean up compile options. Move config to cmd option / gpu_config.json
joydddd Mar 18, 2024
fee935f
Merge branch 'gpu_kernel-break' into gpu_kernel
joydddd Mar 18, 2024
938a45f
Add planalyze.cu(h)
joydddd Mar 18, 2024
d9953f5
latest gpu config
xenshinu Mar 18, 2024
367dfdb
merge with mi210 code
xenshinu Mar 18, 2024
66c93c5
add configs
xenshinu Mar 19, 2024
29fd414
updatre slurm
xenshinu Mar 26, 2024
aa7dc42
update profile script please use scripts/acc_integrated.slurm to prof…
xenshinu Oct 6, 2024
14 changes: 14 additions & 0 deletions .gitignore
@@ -6,3 +6,17 @@
 *.dSYM
 minimap2
 mappy.c
+data
+.vscode/**
+test.sam
+*.sam
+Log/**
+debug/**
+verf
+trace
+ncu
+nsys
+*_output*
+workloads
+.cmake/**
+.depend
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
 [submodule "lib/simde"]
 	path = lib/simde
 	url = https://github.com/nemequ/simde.git
+[submodule "cJSON"]
+	path = cJSON
+	url = https://github.com/DaveGamble/cJSON.git
1 change: 1 addition & 0 deletions LICENSE.txt
@@ -2,6 +2,7 @@ The MIT License
 
 Copyright (c) 2018- Dana-Farber Cancer Institute
 2017-2018 Broad Institute, Inc.
+2022 Advanced Micro Devices, Inc.
 
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
55 changes: 42 additions & 13 deletions Makefile
@@ -1,12 +1,16 @@
-CFLAGS= -g -Wall -O2 -Wc++-compat #-Wextra
-CPPFLAGS= -DHAVE_KALLOC
-INCLUDES=
+CFLAGS= -O2 -g -DNDEBUG
+CDEBUG_FLAGS= -g -O2 #-Wall -Wextra -Wno-unused-parameter -Wno-unused-variable -Wno-sign-compare -Wno-unused-function -Wno-c++17-extensions -Wno-\#warnings #-O0 -DNDEBUG
+CPPFLAGS= -DHAVE_KALLOC -D__AMD_SPLIT_KERNELS__ # -Wno-unused-but-set-variable -Wno-unused-variable
+CPPFLAGS+= $(if $(MAX_MICRO_BATCH),-DMAX_MICRO_BATCH=\($(MAX_MICRO_BATCH)\))
+INCLUDES= -I .
 OBJS= kthread.o kalloc.o misc.o bseq.o sketch.o sdust.o options.o index.o \
 lchain.o align.o hit.o seed.o map.o format.o pe.o esterr.o splitidx.o \
 ksw2_ll_sse.o
-PROG= minimap2
+# PROG= minimap2-zerobranch-debug
+# PROG= minimap2-nobalance-debug
+PROG= minimap2$(SUFFIX)
 PROG_EXTRA= sdust minimap2-lite
-LIBS= -lm -lz -lpthread
+LIBS= -lm -lz -lpthread
 
 ifeq ($(arm_neon),) # if arm_neon is not defined
 ifeq ($(sse2only),) # if sse2only is not defined
@@ -34,7 +38,21 @@ ifneq ($(tsan),)
 LIBS+=-fsanitize=thread
 endif
 
-.PHONY:all extra clean depend
+
+# turn on debug flags
+ifeq ($(DEBUG),info)
+CFLAGS += -DDEBUG_PRINT
+endif
+ifeq ($(DEBUG), analyze)
+CFLAGS += $(CDEBUG_FLAGS)
+CFLAGS += -DDEBUG_CHECK -DDEBUG_PRINT
+endif
+ifeq ($(DEBUG), verbose)
+CFLAGS += $(CDEBUG_FLAGS)
+CFLAGS += -DDEBUG_CHECK -DDEBUG_PRINT -DDEBUG_VERBOSE
+endif
+
+.PHONY:all extra clean depend # profile
 .SUFFIXES:.c .o
 
 .c.o:
@@ -44,14 +62,25 @@ all:$(PROG)
 
 extra:all $(PROG_EXTRA)
 
-minimap2:main.o libminimap2.a
-	$(CC) $(CFLAGS) main.o -o $@ -L. -lminimap2 $(LIBS)
+# build cJSON
+CJSON_OBJ= cJSON/cJSON.o
+INCLUDES += -I cJSON
+$(CJSON_OBJ):
+	make -C cJSON
+
+# build kernel objs
+include gpu/gpu.mk
+
+
+# compile with nvcc/hipcc
+$(PROG):main.o libminimap2.a
+	$(GPU_CC) $(CFLAGS) $(GPU_FLAGS) main.o -o $@ -L. -lminimap2 $(LIBS)
 
 minimap2-lite:example.o libminimap2.a
-	$(CC) $(CFLAGS) $< -o $@ -L. -lminimap2 $(LIBS)
+	$(GPU_CC) $(CFLAGS) $(GPU_FLAGS) $< -o $@ -L. -lminimap2 $(LIBS)
 
-libminimap2.a:$(OBJS)
-	$(AR) -csru $@ $(OBJS)
+libminimap2.a:$(OBJS) $(CU_OBJS) $(CJSON_OBJ)
+	$(AR) -csru $@ $^
 
 sdust:sdust.c kalloc.o kalloc.h kdq.h kvec.h kseq.h ketopt.h sdust.h
 	$(CC) -D_SDUST_MAIN $(CFLAGS) $< kalloc.o -o $@ -lz
@@ -97,7 +126,7 @@ ksw2_exts2_neon.o:ksw2_exts2_sse.c ksw2.h kalloc.h
 
 # other non-file targets
 
-clean:
+clean: cleangpu
 	rm -fr gmon.out *.o a.out $(PROG) $(PROG_EXTRA) *~ *.a *.dSYM build dist mappy*.so mappy.c python/mappy.c mappy.egg*
 
 depend:
@@ -129,4 +158,4 @@ pe.o: mmpriv.h minimap.h bseq.h kseq.h kvec.h kalloc.h ksort.h
 sdust.o: kalloc.h kdq.h kvec.h sdust.h
 seed.o: mmpriv.h minimap.h bseq.h kseq.h kalloc.h ksort.h
 sketch.o: kvec.h kalloc.h mmpriv.h minimap.h bseq.h kseq.h
-splitidx.o: mmpriv.h minimap.h bseq.h kseq.h
+splitidx.o: mmpriv.h minimap.h bseq.h kseq.h
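Note on the debug switches in this Makefile diff: the three `DEBUG` levels are cumulative, and `MAX_MICRO_BATCH` is only passed to the preprocessor when it is set on the `make` command line (for example `make DEBUG=analyze MAX_MICRO_BATCH=4`). The Python sketch below only models the conditionals visible in these hunks. The function name `debug_defines` is hypothetical, and nothing from `gpu/gpu.mk` is covered because that file is not shown here.

```python
def debug_defines(debug=None, max_micro_batch=None):
    """Model of the -D flags added by the Makefile conditionals in this diff.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    # DEBUG=info adds printing only; analyze and verbose also pull in
    # $(CDEBUG_FLAGS) (-g -O2), which is not modelled here.
    levels = {
        "info": ["-DDEBUG_PRINT"],
        "analyze": ["-DDEBUG_CHECK", "-DDEBUG_PRINT"],
        "verbose": ["-DDEBUG_CHECK", "-DDEBUG_PRINT", "-DDEBUG_VERBOSE"],
    }
    flags = list(levels.get(debug, []))

    # $(if $(MAX_MICRO_BATCH),...) expands only for a non-empty value; the
    # escaped parentheses mean the macro body is wrapped as "(N)".
    if max_micro_batch not in (None, ""):
        flags.append(f"-DMAX_MICRO_BATCH=({max_micro_batch})")
    return flags


print(debug_defines("analyze", 4))
```

Any other `DEBUG` value, or leaving it unset, adds no debug defines in this model, which matches the absence of an `else` branch in the Makefile.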
27 changes: 27 additions & 0 deletions a6000_config.json
@@ -0,0 +1,27 @@
+{
+    "//config is for": "a6000. Fits one batch + 5% x 4 long buffer avg_read_n 10k",
+    "num_streams": 1,
+    "min_n": 512,
+    "//min_n": "queries with fewer anchors will be handled on cpu",
+    "long_seg_buffer_size": 258880000,
+    "max_total_n": 893440000,
+    "max_read": 893440,
+    "avg_read_n": 20000,
+    "//avg_read_n": "expected average number of anchors per read, not used if max_total_n and max_read are specified",
+    "range_kernel": {
+        "blockdim": 512,
+        "cut_check_anchors": 10,
+        "//cut_check_anchors": "Number of anchors to check to attempt a cut",
+        "anchor_per_block": 32768,
+        "//anchor_per_block": "Number of anchors each block handles. Must be int * blockdim"
+    },
+    "score_kernel": {
+        "short_blockdim": 64,
+        "long_blockdim": 64,
+        "mid_blockdim": 64,
+        "//blockdim config": "options are not used: static config specified at compile time (make ... LONG_BLOCK_SIZE=1024)",
+        "short_griddim": 2688,
+        "long_griddim": 1024,
+        "mid_griddim": 2688
+    }
+}
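Note on the config format: keys beginning with `//` act as inline comments, and the `anchor_per_block` comment states a constraint (an integer multiple of `blockdim`). The following is a minimal Python sketch, assuming only those two conventions, of how a config could be sanity-checked before a run. `strip_comments` and `check_range_kernel` are hypothetical helper names, and the embedded JSON is a trimmed excerpt of the file above rather than the full config.

```python
import json

# Trimmed excerpt of a6000_config.json; only the fields used below.
CONFIG_TEXT = """
{
    "min_n": 512,
    "range_kernel": {
        "blockdim": 512,
        "anchor_per_block": 32768,
        "//anchor_per_block": "Must be int * blockdim"
    }
}
"""


def strip_comments(node):
    """Recursively drop keys that start with '//' (the comment convention)."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items()
                if not k.startswith("//")}
    return node


def check_range_kernel(cfg):
    """Return a list of problems; an empty list means the constraint holds."""
    rk = cfg["range_kernel"]
    problems = []
    if rk["anchor_per_block"] % rk["blockdim"] != 0:
        problems.append("anchor_per_block must be an integer multiple of blockdim")
    return problems


cfg = strip_comments(json.loads(CONFIG_TEXT))
print(check_range_kernel(cfg))  # prints [] because 32768 == 64 * 512
```

The check is deliberately limited to the one constraint the config documents; other relationships between fields are not stated in this diff and are not assumed.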
27 changes: 27 additions & 0 deletions aac_config.json
@@ -0,0 +1,27 @@
+{
+    "//config is for": "aac cloud. Fits one batch + 5% x 4 long buffer avg_read_n 10k",
+    "num_streams": 1,
+    "min_n": 512,
+    "//min_n": "queries with fewer anchors will be handled on cpu",
+    "long_seg_buffer_size": 1117376000,
+    "max_total_n": 2036880000,
+    "max_read": 2036880,
+    "avg_read_n": 20000,
+    "//avg_read_n": "expected average number of anchors per read, not used if max_total_n and max_read are specified",
+    "range_kernel": {
+        "blockdim": 512,
+        "cut_check_anchors": 10,
+        "//cut_check_anchors": "Number of anchors to check to attempt a cut",
+        "anchor_per_block": 32768,
+        "//anchor_per_block": "Number of anchors each block handles. Must be int * blockdim"
+    },
+    "score_kernel": {
+        "short_blockdim": 64,
+        "long_blockdim": 64,
+        "mid_blockdim": 64,
+        "//blockdim config": "options are not used: static config specified at compile time (make ... LONG_BLOCK_SIZE=1024)",
+        "short_griddim": 16128,
+        "long_griddim": 208,
+        "mid_griddim": 16128
+    }
+}
1 change: 1 addition & 0 deletions cJSON
Submodule cJSON added at b45f48
28 changes: 28 additions & 0 deletions gfx1030.json
@@ -0,0 +1,28 @@
+{
+    "//config is for": "AMD Radeon RX 6800 XT on amdxfx. Fits one batch + 5% x 4 long buffer avg_read_n 10k",
+    "num_streams": 1,
+    "min_n": 512,
+    "//min_n": "queries with fewer anchors will be handled on cpu",
+    "long_seg_buffer_size": 100000000,
+    "max_total_n": 493440000,
+    "max_read": 493440,
+    "avg_read_n": 20000,
+    "//avg_read_n": "expected average number of anchors per read, not used if max_total_n and max_read are specified",
+    "range_kernel": {
+        "blockdim": 512,
+        "cut_check_anchors": 10,
+        "//cut_check_anchors": "Number of anchors to check to attempt a cut",
+        "anchor_per_block": 32768,
+        "//anchor_per_block": "Number of anchors each block handles. Must be int * blockdim"
+    },
+    "score_kernel": {
+        "micro_batch": 4,
+        "mid_blockdim": 512,
+        "//blockdim config": "options are not used: static config specified at compile time (make ... LONG_BLOCK_SIZE=1024)",
+        "short_griddim": 2688,
+        "long_griddim": 144,
+        "mid_griddim": 2688,
+        "long_seg_cutoff": 20,
+        "mid_seg_cutoff": 3
+    }
+}
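Note on schema drift between the configs: the `score_kernel` section of `gfx1030.json` does not use the same key set as `a6000_config.json`. The Python sketch below makes the difference explicit using the values copied from the two files in this diff. `compare_sections` is a hypothetical helper, and whether the loader treats the missing keys as optional is not visible here, so this only reports the differences.

```python
# score_kernel values copied from the two config files, comment keys omitted.
A6000_SCORE = {
    "short_blockdim": 64, "long_blockdim": 64, "mid_blockdim": 64,
    "short_griddim": 2688, "long_griddim": 1024, "mid_griddim": 2688,
}
GFX1030_SCORE = {
    "micro_batch": 4, "mid_blockdim": 512,
    "short_griddim": 2688, "long_griddim": 144, "mid_griddim": 2688,
    "long_seg_cutoff": 20, "mid_seg_cutoff": 3,
}


def compare_sections(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys())
               if a[k] != b[k]}
    return only_a, only_b, changed


only_a, only_b, changed = compare_sections(A6000_SCORE, GFX1030_SCORE)
print("only in a6000:", only_a)
print("only in gfx1030:", only_b)
print("different values:", changed)
```

A report like this can help decide whether `micro_batch`, `long_seg_cutoff`, and `mid_seg_cutoff` should also appear in the other device configs, or whether `short_blockdim` and `long_blockdim` can be dropped from them.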