Python binding for web-rwkv.
- Basic V5 inference support
- Support V4, V5 and V6
- Batched inference
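
Install Python and Rust.

Install maturin:

    $ pip install maturin

Build and install. maturin develop installs the module into the currently active Python environment (a virtualenv or conda environment), so activate one first:

    $ maturin develop --release

Try using web-rwkv in Python:

    import web_rwkv_py as wrp

    model = wrp.Model(
        "/path/to/model.st",  # model path
        quant=0,              # number of layers quantized to int8
        quant_nf4=0,          # number of layers quantized to NF4
    )
    model.clear_state()               # start from an empty state
    logits = model.run([114, 514])    # token IDs in, logits for the last token out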
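A common next step is turning logits into new tokens. The sketch below does greedy decoding (always picking the highest logit) on top of clear_state and run. It assumes, without confirmation from this README, that model.run returns a flat sequence of logits for the last token and that the model carries its state between run calls, so each step only feeds the newly chosen token. generate_greedy, argmax and FakeModel are hypothetical helpers; FakeModel is a toy stand-in so the loop can be tried without the compiled extension.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; ties go to the lowest index."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def generate_greedy(model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Greedy decoding on top of Model.run (hypothetical helper).

    Assumption: model.run(tokens) returns a flat sequence of logits for the
    last token, and the model keeps its recurrent state between calls, so
    each step only needs to feed the token it just chose.
    """
    model.clear_state()              # start from an empty state, as in the README
    logits = model.run(prompt)       # process the whole prompt once
    generated: List[int] = []
    for step in range(max_new_tokens):
        token = argmax(logits)       # greedy choice: highest logit
        generated.append(token)
        if step + 1 < max_new_tokens:
            logits = model.run([token])  # advance the state by one token
    return generated


class FakeModel:
    """Hypothetical toy stand-in for wrp.Model: always predicts last token + 1."""

    def __init__(self, vocab_size: int = 5) -> None:
        self.vocab_size = vocab_size
        self.last = 0

    def clear_state(self) -> None:
        self.last = 0

    def run(self, tokens: List[int]) -> List[float]:
        self.last = tokens[-1]
        logits = [0.0] * self.vocab_size
        logits[(self.last + 1) % self.vocab_size] = 1.0
        return logits


print(generate_greedy(FakeModel(), [1, 2], 4))  # [3, 4, 0, 1]
```

With the real binding, pass the wrp.Model instance instead, e.g. generate_greedy(model, [114, 514], 16). Greedy decoding is the simplest choice; sampling with temperature or top-p would replace the argmax step.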
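
Get, clone and load the current state:

    logits = model.run([114, 514])
    state = model.back_state(wrp.StateDevice.Gpu)     # read back the state, kept on the GPU
    # state = model.back_state(wrp.StateDevice.Cpu)   # or read it back to host memory
    state_cloned = state.deep_clone()                 # independent copy of the state
    model.load_state(state_cloned)                    # restore it into the model
    logits = model.run([1919, 810])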
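Saving and restoring state makes it cheap to compare several continuations of one prompt: the shared prefix is processed once, and every branch starts from a copy of the resulting state. This is a minimal sketch built only from the calls shown above; score_continuations, FakeState and FakeStatefulModel are hypothetical names, and the fakes exist only so the snippet runs without the extension. Following the README, the state is deep-cloned before loading, here once per branch. Whether load_state would otherwise share the object with the model is not documented here, so cloning is the conservative choice.

```python
from typing import Dict, List


def score_continuations(model, prefix: List[int], continuations: Dict[str, List[int]], device) -> Dict[str, list]:
    """Run `prefix` once, then run each continuation from a copy of that state.

    Hypothetical helper; with the real binding `device` would be
    wrp.StateDevice.Gpu or wrp.StateDevice.Cpu.
    """
    model.clear_state()
    model.run(prefix)                             # shared prefix, processed once
    snapshot = model.back_state(device)           # state after the prefix
    results = {}
    for name, tokens in continuations.items():
        model.load_state(snapshot.deep_clone())   # fresh copy per branch
        results[name] = model.run(tokens)         # logits for this branch's last token
    return results


class FakeState:
    """Hypothetical toy state: just the list of tokens seen so far."""

    def __init__(self, history: List[int]) -> None:
        self.history = history

    def deep_clone(self) -> "FakeState":
        return FakeState(list(self.history))


class FakeStatefulModel:
    """Hypothetical toy model whose 'logits' are the number of tokens seen."""

    def __init__(self) -> None:
        self.history: List[int] = []

    def clear_state(self) -> None:
        self.history = []

    def run(self, tokens: List[int]) -> List[float]:
        self.history.extend(tokens)
        return [float(len(self.history))]

    def back_state(self, device) -> FakeState:
        return FakeState(list(self.history))

    def load_state(self, state: FakeState) -> None:
        self.history = state.history  # shares the object, so cloning matters here


print(score_continuations(FakeStatefulModel(), [1, 2, 3], {"a": [4], "b": [5, 6]}, device=None))
# {'a': [4.0], 'b': [5.0]}
```

Both branches see exactly the three prefix tokens plus their own, which is the point of restoring from the snapshot instead of continuing from whatever the previous branch left behind.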

Return predictions for all tokens (not only the last one):

    logits, state = model.run_full([114, 514, 1919, 810], state=None)
    assert len(logits) == 4  # one logits vector per input token
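Because run_full returns one logits vector per input token, a single call can score a whole sequence, for example to compute perplexity. The sketch assumes that logits[i] is the model's prediction for the token that follows tokens[i] and that each vector is a plain sequence of floats over the vocabulary; neither is stated explicitly in this README. log_softmax, sequence_log_prob and perplexity are hypothetical helpers.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def sequence_log_prob(all_logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Sum of log p(tokens[i + 1] | tokens[: i + 1]).

    Assumption: all_logits[i] predicts the token after tokens[i], as if
    returned by run_full for the same token list. The first token has no
    prediction and the last vector predicts beyond the input, so both are skipped.
    """
    if len(all_logits) != len(tokens):
        raise ValueError("expected one logits vector per token")
    total = 0.0
    for i in range(len(tokens) - 1):
        total += log_softmax(all_logits[i])[tokens[i + 1]]
    return total


def perplexity(all_logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """exp of the mean negative log-likelihood over the scored tokens."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score")
    return math.exp(-sequence_log_prob(all_logits, tokens) / (len(tokens) - 1))


# Uniform logits over 4 classes give a perplexity of 4.
print(perplexity([[0.0] * 4] * 3, [0, 1, 2]))
```

With the binding this would look like tokens = [114, 514, 1919, 810]; logits, _ = model.run_full(tokens, state=None); perplexity(logits, tokens). If run_full with state=None continues from the model's current state rather than an empty one (the README does not say), call model.clear_state() first.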