- Board: any of the following
  - Ultra96
  - Ultra96-V2
- OS: any of the following
Distributed under the BSD 2-Clause License.
Fig.1 ArgSort-Ultra96 Design Block
Table.2 Utilization
Name | MRG WAYS | MRG WORDS | STM FB | CLB LUTs | CLB Registers | Block RAM | DSPs | Freq [MHz] |
argsort_16_1_0 | 16 | 1 | 0 | 42142 | 27011 | 38 | 0 | 250 |
argsort_16_1_1 | 16 | 1 | 1 | 41865 | 27249 | 38 | 0 | 250 |
argsort_16_1_2 | 16 | 1 | 2 | 41799 | 26261 | 54 | 0 | 250 |
argsort_16_2_0 | 16 | 2 | 0 | 59246 | 55456 | 38 | 0 | 250 |
argsort_16_2_1 | 16 | 2 | 1 | 60828 | 57063 | 38 | 0 | 250 |
argsort_16_2_2 | 16 | 2 | 2 | 58819 | 55210 | 70 | 0 | 250 |
argsort_32_1_0 | 32 | 1 | 0 | 64126 | 45025 | 70 | 15 | 250 |
argsort_32_1_1 | 32 | 1 | 1 | 66621 | 46356 | 70 | 15 | 250 |
argsort_32_1_2 | 32 | 1 | 2 | 64988 | 44866 | 198 | 15 | 250 |
argsort_32_2_0 | 32 | 2 | 0 | exceeds available resources |
zcu3egsbva48-1 resources available | | | | 70560 | 141120 | 216 | 360 | |
Fig.2 Utilization (LUTs %)
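The percentages plotted in Fig.2 are presumably each design's CLB LUT count divided by the 70560 LUTs listed as available in Table.2. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative and not part of the repository.

```python
# CLB LUT counts copied from Table.2.
# AVAILABLE_LUTS is the value in the "resources available" row.
AVAILABLE_LUTS = 70560

LUTS = {
    "argsort_16_1_0": 42142,
    "argsort_16_1_1": 41865,
    "argsort_16_1_2": 41799,
    "argsort_16_2_0": 59246,
    "argsort_16_2_1": 60828,
    "argsort_16_2_2": 58819,
    "argsort_32_1_0": 64126,
    "argsort_32_1_1": 66621,
    "argsort_32_1_2": 64988,
}


def lut_utilization(luts, available=AVAILABLE_LUTS):
    """Return LUT utilization as a percentage of the device total."""
    return 100.0 * luts / available


utilization = {name: lut_utilization(count) for name, count in LUTS.items()}

for name, percent in utilization.items():
    print(f"{name}: {percent:.1f} %")
```

By this measure the 32-way designs already use more than 90 % of the LUTs, which is consistent with argsort_32_2_0 not fitting.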
Table.3 Performance
Name | MRG WAYS | MRG WORDS | STM FB | Sort time, 10K words [msec] | Sort time, 100K words [msec] | Sort time, 1M words [msec] | Average throughput [Mwords/sec] |
argsort_16_1_0 | 16 | 1 | 0 | 0.569 | 4.766 | 54.456 | 18.75 |
argsort_16_1_1 | 16 | 1 | 1 | 0.400 | 3.000 | 34.355 | 29.51 |
argsort_16_1_2 | 16 | 1 | 2 | 0.384 | 2.674 | 27.644 | 36.04 |
argsort_16_2_0 | 16 | 2 | 0 | 0.436 | 3.219 | 42.970 | 24.13 |
argsort_16_2_1 | 16 | 2 | 1 | 0.325 | 2.047 | 31.311 | 33.89 |
argsort_16_2_2 | 16 | 2 | 2 | 0.328 | 1.802 | 26.314 | 40.01 |
argsort_32_1_0 | 32 | 1 | 0 | 0.422 | 3.384 | 39.381 | 25.69 |
argsort_32_1_1 | 32 | 1 | 1 | 0.341 | 2.494 | 27.748 | 36.31 |
argsort_32_1_2 | 32 | 1 | 2 | 0.595 | 2.711 | 27.433 | 36.79 |
ZynqMP(arm64) numpy.argsort() | | | | 1.790 | 32.036 | 1320.921 | 1.49 |
Fig.3 Throughput Average [Mwords/sec]
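To put the 1M-word column of Table.3 in perspective, the sketch below derives a per-design throughput and the speedup over the numpy.argsort() baseline. It uses only the 1M-word sort times, so its throughput values differ from the table's "Average" column, whose averaging method is not stated here. All names in the code are illustrative.

```python
# Sort times for 1M words [msec], copied from Table.3.
NUMPY_1M_MSEC = 1320.921

FPGA_1M_MSEC = {
    "argsort_16_1_0": 54.456,
    "argsort_16_1_1": 34.355,
    "argsort_16_1_2": 27.644,
    "argsort_16_2_0": 42.970,
    "argsort_16_2_1": 31.311,
    "argsort_16_2_2": 26.314,
    "argsort_32_1_0": 39.381,
    "argsort_32_1_1": 27.748,
    "argsort_32_1_2": 27.433,
}

WORDS = 1_000_000


def throughput_mwords_per_sec(words, msec):
    """Words sorted per second, expressed in units of 10^6 words."""
    return words / (msec * 1e-3) / 1e6


# Speedup of each design relative to numpy.argsort() on the same board.
speedups = {name: NUMPY_1M_MSEC / msec for name, msec in FPGA_1M_MSEC.items()}

for name, msec in FPGA_1M_MSEC.items():
    rate = throughput_mwords_per_sec(WORDS, msec)
    print(f"{name}: {rate:.2f} Mwords/sec, {speedups[name]:.1f}x numpy.argsort()")
```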
See https://github.com/ikwzm/ZynqMP-FPGA-Linux or https://github.com/ikwzm/ZynqMP-FPGA-Ubuntu18.04-Ultra96
fpga@debian-fpga:~/$ git clone --branch 1.2.0 https://github.com/ikwzm/ArgSort-Ultra96.git
fpga@debian-fpga:~/$ cd ArgSort-Ultra96
fpga@debian-fpga:~/ArgSort-Ultra96$ sudo TARGET=argsort_16_2_2 rake install
gzip -d -f -c argsort_16_2_2.bin.gz > /lib/firmware/argsort_16_2_2.bin
./dtbocfg.rb --install argsort --dts argsort_16_2_2_5.4.dts
/tmp/dtovly20201118-1281-1tf8e0q: Warning (unit_address_vs_reg): /fragment@2/__overlay__/uio_argsort: node has a reg or ranges property, but no unit name
/tmp/dtovly20201118-1281-1tf8e0q: Warning (avoid_unnecessary_addr_size): /fragment@2: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
[10952.701089] fpga_manager fpga0: writing argsort_16_2_2.bin to Xilinx ZynqMP FPGA Manager
[10952.861395] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /fpga-full/firmware-name
[10952.874409] fclkcfg amba_pl@0:fclk0: driver version : 1.7.1
[10952.879998] fclkcfg amba_pl@0:fclk0: device name : amba_pl@0:fclk0
[10952.886447] fclkcfg amba_pl@0:fclk0: clock name : pl0_ref
[10952.892194] fclkcfg amba_pl@0:fclk0: clock rate : 249999998
[10952.898138] fclkcfg amba_pl@0:fclk0: clock enabled : 1
[10952.903363] fclkcfg amba_pl@0:fclk0: remove rate : 1000000
[10952.909107] fclkcfg amba_pl@0:fclk0: remove enable : 0
[10952.914327] fclkcfg amba_pl@0:fclk0: driver installed.
[10952.935858] u-dma-buf udmabuf-argsort-in: driver version = 3.2.0
[10952.941868] u-dma-buf udmabuf-argsort-in: major number = 241
[10952.947704] u-dma-buf udmabuf-argsort-in: minor number = 0
[10952.953360] u-dma-buf udmabuf-argsort-in: phys address = 0x0000000070400000
[10952.960498] u-dma-buf udmabuf-argsort-in: buffer size = 33554432
[10952.966762] u-dma-buf amba_pl@0:udmabuf_argsort_in: driver installed.
[10952.988678] u-dma-buf udmabuf-argsort-out: driver version = 3.2.0
[10952.994773] u-dma-buf udmabuf-argsort-out: major number = 241
[10953.000697] u-dma-buf udmabuf-argsort-out: minor number = 1
[10953.006438] u-dma-buf udmabuf-argsort-out: phys address = 0x0000000072400000
[10953.013662] u-dma-buf udmabuf-argsort-out: buffer size = 33554432
[10953.020014] u-dma-buf amba_pl@0:udmabuf_argsort_out: driver installed.
[10953.085033] u-dma-buf udmabuf-argsort-tmp: driver version = 3.2.0
[10953.091130] u-dma-buf udmabuf-argsort-tmp: major number = 241
[10953.097060] u-dma-buf udmabuf-argsort-tmp: minor number = 2
[10953.102804] u-dma-buf udmabuf-argsort-tmp: phys address = 0x0000000074400000
[10953.110028] u-dma-buf udmabuf-argsort-tmp: buffer size = 134217728
[10953.116466] u-dma-buf amba_pl@0:udmabuf_argsort_tmp: driver installed.
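The buffer sizes in the kernel log above are in bytes. As a reading aid, the sketch below parses the relevant log lines, converts the sizes to MiB, and checks how the three buffers are laid out in physical memory. The regular expression and helper name are illustrative; they rely only on the line format shown in this log.

```python
import re

# Lines copied from the kernel log above (phys address and buffer size only).
LOG = """\
[10952.953360] u-dma-buf udmabuf-argsort-in: phys address = 0x0000000070400000
[10952.960498] u-dma-buf udmabuf-argsort-in: buffer size = 33554432
[10953.006438] u-dma-buf udmabuf-argsort-out: phys address = 0x0000000072400000
[10953.013662] u-dma-buf udmabuf-argsort-out: buffer size = 33554432
[10953.102804] u-dma-buf udmabuf-argsort-tmp: phys address = 0x0000000074400000
[10953.110028] u-dma-buf udmabuf-argsort-tmp: buffer size = 134217728
"""

PATTERN = re.compile(r"u-dma-buf (\S+): (phys address|buffer size)\s*=\s*(\S+)")


def parse_buffers(log):
    """Return {device name: {"phys address": int, "buffer size": int}}."""
    buffers = {}
    for name, key, value in PATTERN.findall(log):
        # int(value, 0) accepts both "0x..." hexadecimal and plain decimal text.
        buffers.setdefault(name, {})[key] = int(value, 0)
    return buffers


buffers = parse_buffers(LOG)
for name, info in buffers.items():
    size_mib = info["buffer size"] // 2**20
    end = info["phys address"] + info["buffer size"]
    print(f"{name}: {size_mib} MiB at {info['phys address']:#x}..{end:#x}")
```

In this log the input and output buffers are 32 MiB each, the temporary buffer is 128 MiB, and each buffer starts where the previous one ends.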
fpga@debian-fpga:~/ArgSort-Ultra96$ rake sample_0001000000.npy
python3 generate_sample.py --size 1000000 --sample sample_0001000000.npy
generate_sample: sample_file : sample_0001000000.npy
generate_sample: size : 1000000
generate_sample: time : 44.262 [msec]
fpga@debian-fpga:~/ArgSort-Ultra96$ rake expect_0001000000.npy
python3 generate_expect.py --sample sample_0001000000.npy --expect expect_0001000000.npy --log expect.log
generate_expect: sample_file : sample_0001000000.npy
generate_expect: expect_file : expect_0001000000.npy
generate_expect: size : 1000000
generate_expect: average_time : 1325.425 # [msec]
generate_expect: throughput : 0.754 # [mwords/sec]
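The expected file presumably holds the index order that numpy.argsort() produces for the sample, since Table.3 uses numpy.argsort() as the software baseline. The dependency-free sketch below shows what an argsort computes and how a result can be validated without depending on the order of equal values. It assumes ascending order and is not the repository's generate_expect.py or check_result.py; both function names are hypothetical.

```python
def argsort(values):
    """Return the indices that would sort `values` in ascending order."""
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(range(len(values)), key=values.__getitem__)


def is_valid_argsort(values, indices):
    """Check that `indices` is a permutation that orders `values` ascending."""
    # Every index must appear exactly once.
    if sorted(indices) != list(range(len(values))):
        return False
    ordered = [values[i] for i in indices]
    # Adjacent elements must be non-decreasing; ties may appear in any order.
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


sample = [30, 10, 20, 10]
expect = argsort(sample)
print(expect)
print(is_valid_argsort(sample, expect))
```

A validity check of this kind accepts any correct ordering of duplicate values, whereas comparing two index arrays element by element only succeeds when both sorters break ties identically.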
fpga@debian-fpga:~/ArgSort-Ultra96$ rake test_1000000
python3 argsort_test.py --sample sample_0001000000.npy --result result_0001000000.npy -n 10 -d 2 --log argsort_16_2_2.log
argsort_test : Version : 1.2
argsort_test : Ways : 16
argsort_test : Words : 2
argsort_test : Feedback : 2
argsort_test : WordBits : 32
argsort_test : IndexBits : 32
argsort_test : Sort Order : 0
argsort_test : Sign Compare : 0
argsort_test : Max Size : 268435455
argsort_test : Debug Enable : 1
argsort_test : sample_file : sample_0001000000.npy
argsort_test : size : 1000000
argsort_test : debug_mode : 2
argsort_test : loops : 10
argsort_test : time : 26.149 # [msec]
argsort_test : time : 26.515 # [msec]
argsort_test : time : 26.186 # [msec]
argsort_test : time : 26.037 # [msec]
argsort_test : time : 26.611 # [msec]
argsort_test : time : 26.458 # [msec]
argsort_test : time : 25.907 # [msec]
argsort_test : time : 26.645 # [msec]
argsort_test : time : 26.656 # [msec]
argsort_test : time : 25.674 # [msec]
argsort_test : result_file : result_0001000000.npy
argsort_test : average_time : 26.284 # [msec]
argsort_test : throughput : 38.046 # [mwords/sec]
argsort_test : Debug_Time(0): 25.199 # [msec]
argsort_test : Debug_Time(1): 16.501 # [msec]
argsort_test : Debug_Time(2): 4.860 # [msec]
argsort_test : Debug_Time(3): 3.838 # [msec]
python3 check_result.py --sample sample_0001000000.npy --result result_0001000000.npy --expect expect_0001000000.npy
check_result: sample file : sample_0001000000.npy
check_result: expect file : expect_0001000000.npy
check_result: result file : result_0001000000.npy
check_result: OK
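The average_time and throughput lines in the argsort_test output above can be reproduced from the ten per-loop times. The sketch below assumes throughput is simply the sample size divided by the average time, which matches the logged figures; the names are illustrative.

```python
# Per-loop sort times [msec], copied from the argsort_test log above.
TIMES_MSEC = [26.149, 26.515, 26.186, 26.037, 26.611,
              26.458, 25.907, 26.645, 26.656, 25.674]
SIZE_WORDS = 1_000_000


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def throughput_mwords_per_sec(words, msec):
    """Words sorted per second, expressed in units of 10^6 words."""
    return words / (msec * 1e-3) / 1e6


average_time = average(TIMES_MSEC)
throughput = throughput_mwords_per_sec(SIZE_WORDS, average_time)

print(f"average_time : {average_time:.3f} # [msec]")
print(f"throughput   : {throughput:.3f} # [mwords/sec]")
```

The same formula applied to the generate_expect log (1000000 words in 1325.425 msec) gives the 0.754 Mwords/sec reported there.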
fpga@debian-fpga:~/ArgSort-Ultra96$ sudo rake uninstall
./dtbocfg.rb --remove argsort
[11218.745653] u-dma-buf amba_pl@0:udmabuf_argsort_tmp: driver removed.
[11218.757907] u-dma-buf amba_pl@0:udmabuf_argsort_out: driver removed.
[11218.770021] u-dma-buf amba_pl@0:udmabuf_argsort_in: driver removed.
[11218.777459] fclkcfg amba_pl@0:fclk0: driver removed.
- Xilinx Vivado 2020.1
- Xilinx Vivado 2020.2
shell$ git clone --branch 1.2.0 https://github.com/ikwzm/ArgSort-Ultra96.git
shell$ cd ArgSort-Ultra96
shell$ git submodule update --init --recursive
Vivado > Tools > Run Tcl Script... > argsort_16_2_2/create_project.tcl
Vivado > Tools > Run Tcl Script... > argsort_16_2_2/implementation.tcl
vivado% cd argsort_16_2_2
vivado% bootgen -image design_1.bif -arch zynqmp -w -o ../argsort_16_2_2.bin
vivado% cd ..
vivado% gzip argsort_16_2_2.bin
- GHDL 0.35 or later
shell$ git clone https://github.com/ikwzm/ArgSort-Ultra96.git
shell$ cd ArgSort-Ultra96
shell$ git submodule update --init --recursive
shell$ cd Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug/
shell$ make
shell$ cd sim/ghdl
shell$ make dut
../../Merge_Sorter/PipeWork/tools/vhdl-archiver.rb \
--library MERGE_SORTER \
--archive merge_sorter.vhd \
../../ip/argsort_axi_0.8//src/MERGE_SORTER/
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:149: warning: constant ::FALSE is deprecated
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:159: warning: constant ::TRUE is deprecated
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:155: warning: constant ::TRUE is deprecated
../../Merge_Sorter/PipeWork/tools/vhdl-archiver.rb \
--library PIPEWORK \
--use_entity 'SDPRAM(MODEL)' \
--archive pipework.vhd \
../../ip/argsort_axi_0.8//src/PIPEWORK/
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:149: warning: constant ::FALSE is deprecated
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:159: warning: constant ::TRUE is deprecated
/mnt/d/ichiro/work/ArgSort-Ultra96/Merge_Sorter/PipeWork/tools/lib/pipework/vhdl-reader.rb:155: warning: constant ::TRUE is deprecated
ghdl -a --mb-comments -P../../Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug -P./ --work=PIPEWORK pipework.vhd
pipework.vhd:11179:23:warning: declaration of "data_width" hides constant "data_width" [-Whide]
pipework.vhd:13247:18:warning: declaration of "req_queue_empty" hides signal "req_queue_empty" [-Whide]
pipework.vhd:17776:18:warning: declaration of "size" hides process labeled "size" [-Whide]
pipework.vhd:17820:16:warning: declaration of "xfer_last" hides port "xfer_last" [-Whide]
pipework.vhd:27457:15:warning: declaration of "queue_tree_arbiter" hides entity "queue_tree_arbiter" [-Whide]
pipework.vhd:27540:13:warning: declaration of "arb" hides component instance "arb" [-Whide]
pipework.vhd:28081:18:warning: declaration of "i_val" hides port "i_val" [-Whide]
pipework.vhd:28163:18:warning: declaration of "i_val" hides port "i_val" [-Whide]
ghdl -a --mb-comments -P../../Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug -P./ --work=MERGE_SORTER merge_sorter.vhd
merge_sorter.vhd:6977:23:warning: declaration of "outlet_last" hides port "outlet_last" [-Whide]
merge_sorter.vhd:7532:19:warning: declaration of "o_word" hides port "o_word" [-Whide]
merge_sorter.vhd:10846:9:warning: declaration of "req" hides block statement labeled "req" [-Whide]
merge_sorter.vhd:10851:23:warning: declaration of "req_last" hides port "req_last" [-Whide]
merge_sorter.vhd:12229:23:warning: declaration of "state_type" hides type "state_type" [-Whide]
merge_sorter.vhd:12230:23:warning: declaration of "curr_state" hides signal "curr_state" [-Whide]
merge_sorter.vhd:12480:16:warning: declaration of "merge_sorter_tree" hides entity "merge_sorter_tree" [-Whide]
merge_sorter.vhd:13452:32:warning: declaration of "a_word" hides port "a_word" [-Whide]
merge_sorter.vhd:13452:40:warning: declaration of "b_word" hides port "b_word" [-Whide]
merge_sorter.vhd:13467:30:warning: declaration of "a_word" hides port "a_word" [-Whide]
merge_sorter.vhd:13467:38:warning: declaration of "b_word" hides port "b_word" [-Whide]
shell$ cd sim/ghdl
shell$ make
ghdl -a --mb-comments -P../../Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug -P./ --work=MERGE_SORTER ../../Merge_Sorter/src/test/vhdl/argsort_axi_test_bench.vhd
ghdl -e --mb-comments -P../../Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug -P./ --work=MERGE_SORTER ArgSort_AXI_Test_Bench_X16_W1_F2
ghdl -r --mb-comments -P../../Merge_Sorter/Dummy_Plug/sim/ghdl-0.35/dummy_plug -P./ --work=MERGE_SORTER ArgSort_AXI_Test_Bench_X16_W1_F2
35 ns| MARCHAL < ArgSort_AXI_Test TEST 1 Start.
55 ns| MARCHAL < ArgSort_AXI_Test TEST 1.1 Start.
1705 ns| MARCHAL < ArgSort_AXI_Test TEST 1.1 Done.
:
:
:
3243505 ns| MARCHAL < ArgSort_AXI_Test TEST 3.20 SIZE=479 Start.
3370415 ns| MARCHAL < ArgSort_AXI_Test TEST 3.20 SIZE=479 Done.
3370435 ns| MARCHAL < ArgSort_AXI_Test TEST 3 Done.
***
*** ERROR REPORT TEST_X16_W1_F2
***
*** [ CSR ]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
*** [ STM AXI]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
*** [ MRG AXI]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
../../Merge_Sorter/src/test/vhdl/argsort_axi_test_bench.vhd:875:13:@3370456ns:(assertion note): Simulation complete(success).
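When the simulation runs unattended, the error report above can be checked mechanically. The sketch below parses the report sections and confirms that every counter is zero. It depends only on the line format shown in this output, and the function name is hypothetical.

```python
import re

# Error report copied from the simulation output above.
REPORT = """\
*** [ CSR ]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
*** [ STM AXI]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
*** [ MRG AXI]
*** Error : 0
*** Mismatch : 0
*** Warning : 0
***
"""

HEADER = re.compile(r"\*\*\*\s*\[\s*(.+?)\s*\]")
ENTRY = re.compile(r"\*\*\*\s*(Error|Mismatch|Warning)\s*:\s*(\d+)")


def parse_report(text):
    """Return {section: {"Error": n, "Mismatch": n, "Warning": n}}."""
    counts = {}
    section = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            section = header.group(1)
            counts[section] = {}
            continue
        entry = ENTRY.match(line)
        if entry and section is not None:
            counts[section][entry.group(1)] = int(entry.group(2))
    return counts


counts = parse_report(REPORT)
passed = all(n == 0 for section in counts.values() for n in section.values())
print(counts)
print("PASS" if passed else "FAIL")
```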
- Xilinx Vivado 2019.2
shell$ git clone https://github.com/ikwzm/ArgSort-Ultra96.git
shell$ cd ArgSort-Ultra96
shell$ git submodule update --init --recursive
Vivado > Tools > Run Tcl Script... > sim/vivado/create_project.tcl
Vivado > File > Project > Open... > sim/vivado/argsort_axi.xpr
Vivado > Flow Navigator > Run Simulation > Run Behavioral Simulation