@@ -287,3 +287,181 @@ Prompt: 'The president of the United States is', Generated text: ' a very import
287287Prompt: ' The capital of France is' , Generated text: ' Paris. The oldest part of the city is Saint-Germain-des-Pr'
288288Prompt: ' The future of AI is' , Generated text: ' not bright\n\nThere is no doubt that the evolution of AI will have a huge'
289289```
290+
291+ ## Multi-node Deployment
292+ ### Verify Multi-Node Communication Environment
293+
294+ #### Physical Layer Requirements:
295+
296+ - The physical machines must be on the same local network (LAN) and able to reach each other.
297+ - All NPUs must be connected through optical modules, and every connection must report a normal status.
298+
299+ #### Verification Process:
300+
301+ Execute the following commands on each node in sequence. Every result must be `success` and every link status must be `UP`:
302+
303+ :::::{tab-set}
304+ ::::{tab-item} A2 series
305+
306+ ``` bash
307+ # Check the remote switch ports
308+ for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
309+ # Get the link status of the Ethernet ports (UP or DOWN)
310+ for i in {0..7}; do hccn_tool -i $i -link -g ; done
311+ # Check the network health status
312+ for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
313+ # View the network detection IP configuration
314+ for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
315+ # View gateway configuration
316+ for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
317+ # View NPU network configuration
318+ cat /etc/hccn.conf
319+ ```
320+
321+ ::::
322+ ::::{tab-item} A3 series
323+
324+ ``` bash
325+ # Check the remote switch ports
326+ for i in {0..15}; do hccn_tool -i $i -lldp -g | grep Ifname; done
327+ # Get the link status of the Ethernet ports (UP or DOWN)
328+ for i in {0..15}; do hccn_tool -i $i -link -g ; done
329+ # Check the network health status
330+ for i in {0..15}; do hccn_tool -i $i -net_health -g ; done
331+ # View the network detection IP configuration
332+ for i in {0..15}; do hccn_tool -i $i -netdetect -g ; done
333+ # View gateway configuration
334+ for i in {0..15}; do hccn_tool -i $i -gateway -g ; done
335+ # View NPU network configuration
336+ cat /etc/hccn.conf
337+ ```
338+
339+ ::::
340+ :::::
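
The link check above can be wrapped in a small helper that reports each NPU on one line. This is only a sketch under stated assumptions: `NPU_COUNT`, `link_is_up`, and `check_links` are hypothetical names, and the script assumes the output of `hccn_tool -i N -link -g` contains the word `UP` when the link is up.

```bash
#!/usr/bin/env bash
# Sketch: summarize the link status of every NPU on this node.
# NPU_COUNT is a hypothetical variable: 8 for A2 series, 16 for A3 series.
NPU_COUNT="${NPU_COUNT:-8}"

# Read link-status output on stdin; succeed only if it contains the word UP.
# Assumption: the tool prints "UP" for a healthy link and "DOWN" otherwise.
link_is_up() {
  grep -qw "UP"
}

# Query each NPU in turn and print one summary line per device.
# Returns non-zero if any link is not UP.
check_links() {
  links_failed=0
  npu=0
  while [ "$npu" -lt "$NPU_COUNT" ]; do
    if hccn_tool -i "$npu" -link -g | link_is_up; then
      echo "NPU $npu: link UP"
    else
      echo "NPU $npu: link NOT UP"
      links_failed=1
    fi
    npu=$((npu + 1))
  done
  return "$links_failed"
}

# Run the check only when hccn_tool is available on this machine.
if command -v hccn_tool >/dev/null 2>&1; then
  check_links
fi
```

On an A3 series node, set `NPU_COUNT=16` before running the script. The raw commands above remain the reference; this helper only condenses their output.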
341+
342+ #### NPU Interconnect Verification:
343+ ##### 1. Get NPU IP Addresses
344+ :::::{tab-set}
345+ ::::{tab-item} A2 series
346+
347+ ``` bash
348+ for i in {0..7}; do hccn_tool -i $i -ip -g | grep ipaddr; done
349+ ```
350+
351+ ::::
352+ ::::{tab-item} A3 series
353+
354+ ``` bash
355+ for i in {0..15}; do hccn_tool -i $i -ip -g | grep ipaddr; done
356+ ```
357+
358+ ::::
359+ :::::
360+
361+ ##### 2. Cross-Node PING Test
362+
363+ ``` bash
364+ # Execute on each node, replacing x.x.x.x with the IP address of an NPU on the other node
365+ hccn_tool -i 0 -ping -g address x.x.x.x
366+ ```
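
To test several remote NPUs in one pass, the ping command can be looped over a list of addresses. The following is a minimal sketch: `ping_remote_npus` is a hypothetical helper, the addresses are placeholders to be replaced with the values collected in step 1 on the other node, and the script does not interpret the tool's output, so each result still needs to be read manually.

```bash
#!/usr/bin/env bash
# Sketch: ping a list of remote NPU IP addresses from one local NPU.
# Usage: ping_remote_npus <local NPU index> <remote IP> [<remote IP> ...]
ping_remote_npus() {
  local_npu="$1"
  shift
  for remote_ip in "$@"; do
    # Print a header so each result can be matched to its address pair.
    echo "== NPU $local_npu -> $remote_ip"
    hccn_tool -i "$local_npu" -ping -g address "$remote_ip"
  done
}

# Example call (placeholder addresses, replace with real NPU IPs):
# ping_remote_npus 0 x.x.x.x y.y.y.y
```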
367+
368+ ### Run the Container on Each Node
369+
370+ Using the official vllm-ascend container is the most efficient way to set up a multi-node environment.
371+
372+ Run the following command to start the container on each node (download the model weights to `/root/.cache` in advance):
373+
374+ :::::{tab-set}
375+ ::::{tab-item} A2 series
376+
377+ ``` {code-block} bash
378+ :substitutions:
379+ # Update the vllm-ascend image
380+ # openEuler:
381+ # export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|-openeuler
382+ # Ubuntu:
383+ # export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
384+ export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
385+
386+ # Run the container using the defined variables
387+ # Note: if you use Docker bridge networking instead of --net=host,
388+ # expose the ports needed for multi-node communication in advance
389+ docker run --rm \
390+ --name vllm-ascend \
391+ --net=host \
392+ --shm-size=1g \
393+ --device /dev/davinci0 \
394+ --device /dev/davinci1 \
395+ --device /dev/davinci2 \
396+ --device /dev/davinci3 \
397+ --device /dev/davinci4 \
398+ --device /dev/davinci5 \
399+ --device /dev/davinci6 \
400+ --device /dev/davinci7 \
401+ --device /dev/davinci_manager \
402+ --device /dev/devmm_svm \
403+ --device /dev/hisi_hdc \
404+ -v /usr/local/dcmi:/usr/local/dcmi \
405+ -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
406+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
407+ -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
408+ -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
409+ -v /etc/ascend_install.info:/etc/ascend_install.info \
410+ -v /root/.cache:/root/.cache \
411+ -it $IMAGE bash
412+ ```
413+
414+ ::::
415+ ::::{tab-item} A3 series
416+
417+ ``` {code-block} bash
418+ :substitutions:
419+ # Update the vllm-ascend image
420+ # openEuler:
421+ # export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|-a3-openeuler
422+ # Ubuntu:
423+ # export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|-a3
424+ export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|-a3
425+
426+ # Run the container using the defined variables
427+ # Note: if you use Docker bridge networking instead of --net=host,
428+ # expose the ports needed for multi-node communication in advance
429+ docker run --rm \
430+ --name vllm-ascend \
431+ --net=host \
432+ --shm-size=1g \
433+ --device /dev/davinci0 \
434+ --device /dev/davinci1 \
435+ --device /dev/davinci2 \
436+ --device /dev/davinci3 \
437+ --device /dev/davinci4 \
438+ --device /dev/davinci5 \
439+ --device /dev/davinci6 \
440+ --device /dev/davinci7 \
441+ --device /dev/davinci8 \
442+ --device /dev/davinci9 \
443+ --device /dev/davinci10 \
444+ --device /dev/davinci11 \
445+ --device /dev/davinci12 \
446+ --device /dev/davinci13 \
447+ --device /dev/davinci14 \
448+ --device /dev/davinci15 \
449+ --device /dev/davinci_manager \
450+ --device /dev/devmm_svm \
451+ --device /dev/hisi_hdc \
452+ -v /usr/local/dcmi:/usr/local/dcmi \
453+ -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
454+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
455+ -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
456+ -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
457+ -v /etc/ascend_install.info:/etc/ascend_install.info \
458+ -v /root/.cache:/root/.cache \
459+ -it $IMAGE bash
460+ ```
461+
462+ ::::
463+ :::::
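
The two `docker run` commands differ mainly in how many `/dev/davinciN` devices they pass through (8 for A2 series, 16 for A3 series). As an optional convenience, those flags can be generated instead of listed by hand. This is a sketch, not part of the official instructions; `davinci_device_flags` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print one "--device /dev/davinciN" flag per NPU, space-separated.
# Usage: davinci_device_flags <number of NPUs>
davinci_device_flags() {
  flag_count="$1"
  flag_idx=0
  while [ "$flag_idx" -lt "$flag_count" ]; do
    printf '%s ' "--device /dev/davinci$flag_idx"
    flag_idx=$((flag_idx + 1))
  done
}

# Example (A3 series): the substitution is left unquoted on purpose so the
# shell splits it into separate arguments.
# docker run --rm --name vllm-ascend --net=host --shm-size=1g \
#   $(davinci_device_flags 16) \
#   --device /dev/davinci_manager \
#   ... remaining flags exactly as in the command above ...
```

Listing the devices explicitly, as the commands above do, is easier to audit; the helper is mainly useful when the same script has to serve both hardware series.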
464+
465+ ### Verify Installation
466+
467+ TODO