# S7. Run InternLM offline on Jetson

Create a directory to hold the models:

mkdir -p ~/models

Upload the internlm-chat-7b-turbomind.tgz obtained in S1 (Quantize on server with W4A16) to the models directory.

Extract the archive (run this from inside ~/models):

tar zxvf internlm-chat-7b-turbomind.tgz -C .
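As an optional sanity check, list the extracted directory; the exact layout depends on the lmdeploy/TurboMind version used in S1, but the model directory should now exist:

```bash
# Optional sanity check: confirm the extracted model directory is present.
ls ~/models/internlm-chat-7b-turbomind
```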

## 0. Bug fix: modify the MMEngine module

The PyTorch version on Jetson does not support distributed reduce operations, which may cause errors in the distributed parts of the MMEngine module.

The error looks like this:

AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
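You can check whether your PyTorch build is affected with a one-line test; on affected builds it prints `False`:

```bash
# Check whether this PyTorch build exposes torch.distributed.ReduceOp.
python -c "import torch.distributed as dist; print(hasattr(dist, 'ReduceOp'))"
```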

Activate the conda environment:

conda activate lmdeploy

Start the Python interpreter:

python

Enter the following content:

import mmengine
print(mmengine.__file__)

This prints the installation location of the MMEngine module. On the author's machine it is /home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py, so the module directory is /home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/. That directory is referred to as <path/to/mmengine> below.
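If you prefer not to open an interactive session, the same directory can be printed with an equivalent one-liner:

```bash
# Print the mmengine package directory in one step.
python -c "import os, mmengine; print(os.path.dirname(mmengine.__file__))"
```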

Modify line 208 of <path/to/mmengine>/logging/logger.py.

- global_rank = _get_rank()
+ global_rank = 0
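If you would rather not open an editor, a search-and-replace such as the one below should apply the same change. This assumes line 208 still reads exactly `global_rank = _get_rank()`; the `.bak` backup lets you revert it later:

```bash
# Replace the _get_rank() call with a hard-coded rank of 0, keeping a backup copy.
sed -i.bak 's/global_rank = _get_rank()/global_rank = 0/' <path/to/mmengine>/logging/logger.py
```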

After this change, the error no longer appears at runtime.

**Attention**: This workaround is crude and is only suitable for inference deployment on the Jetson platform; it will break distributed functionality on a server!

## 1. CLI Mode

Activate the conda environment:

conda activate lmdeploy

Run the model:

lmdeploy chat turbomind ./internlm-chat-7b-turbomind
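The available command-line options (session length, sampling parameters, and so on) vary between lmdeploy releases; check the built-in help for the version you installed:

```bash
# List the options supported by this lmdeploy version.
lmdeploy chat turbomind --help
```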

## 2. Python Mode

Create a script run_model.py with the following content:

from lmdeploy import turbomind as tm


if __name__ == "__main__":
    model_path = "./internlm-chat-7b-turbomind"  # change this to your model path

    # Load the TurboMind model and create an inference instance.
    tm_model = tm.TurboMind.from_pretrained(model_path)
    generator = tm_model.create_instance()

    while True:
        inp = input("[User] >>> ")
        if inp == "exit":
            break
        # Wrap the raw input with the model's chat template, then tokenize it.
        prompt = tm_model.model.get_prompt(inp)
        input_ids = tm_model.tokenizer.encode(prompt)
        # stream_infer yields intermediate results; keep the last one as the final output.
        for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]):
            res = outputs[1]
        response = tm_model.tokenizer.decode(res)
        print("[Bot] <<< {}".format(response))

Activate the conda environment:

conda activate lmdeploy

Run the script:

python run_model.py