Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't place matrices into python files. Generate bundles. #182

Open
KOLANICH opened this issue Sep 16, 2022 · 2 comments
Open

Don't place matrices into python files. Generate bundles. #182

KOLANICH opened this issue Sep 16, 2022 · 2 comments

Comments

@KOLANICH
Copy link

A bundle is a directory with pickle-less .npy files and a python script __init__.py binding them into a model.

Feel free to reuse the approaches used in https://github.com/UniGrammar/UniGrammarRuntime.py/blob/master/UniGrammarRuntime/ParserBundle.py . It uses urm, which is a bit of overkill for your simple use case, so in your runtime you can just scan a dir for npy files and load them all into a dict.

So the following tiny prolog is needed:

from pathlib import Path
thisDir = Path(__file__).absolute().resolve().parent
weights = {el.stem: np.load(el, mmap_mode="r") for el in thisDir.glob("*.npy")}

Also we can place there ONNX file, and dispatch in runtime whether to use the plain numpy impl or an accelerated one.

Also the same npy files can be accessed, for example, from JS. Their format is damn simple and there are parsers.

JS code can be generated using https://github.com/ksons/jscodegen.py from ESPrima AST (just a structure of dicts).

But to be honest, there exists a project called TVM and IMHO the usual use case should be calling this project through TVM.

So it is proposed to do the following:

  1. implement a backend for TVM optimizing the models specifically for inference using plain numpy
  2. add a message that users should usually use TVM
@ishiy1993
Copy link
Contributor

Don't place matrices into python files.

Do you know the --export-tensor-size option in onnion?
If you want to separate matrices into npy files, use the option.

the usual use case should be calling this project through TVM.

I think that you should use TVM directly instead of onnion for the usual use case.

@KOLANICH
Copy link
Author

Thanks. Looked into it. I think that

  1. the tensor loading code inlined into the file should be replaced with the something like the generic proposed solution, which is more compact.
    with open(f"{graph_name}_tensors.npy", "wb") as f:
        for k in initializer:
            arr = initializer[k]
            if arr.size < export_tensor_size:
                init_code.append(f'    self.initializer["{k}"] = {embed_ndarray(arr)}')
            else:
                np.save(f, arr)  # type: ignore
init_code.append(f'    self.initializer["{k}"] = np.load(f)')

a. it unnecessarily opens .npy file in the case tensor gets inlined.
b. it generates code as strings, not as AST
c. is it possible to make an injection by supplying an ONNX file with a rogue name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants