[IR] Improve external data handling #2020

Open
wants to merge 12 commits into
base: main

justinchuby
Collaborator

@justinchuby justinchuby commented Jan 17, 2025

  1. Add an external_data option to ir.save. When set, all initializers are saved as external tensors, and saving is robust against data loss when overwriting an existing data file.
  2. Add a modify_model option to ir.save so users can control whether the model is left unchanged when saving to external data.
  3. Simplify the torch_apis logic by leveraging the updated ir.save method.
  4. Update the to_external_data function to always load data into memory if and only if the tensor references an external data file that is being written to. This simplifies the logic and avoids creating and managing temporary files.
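
The overwrite safety described in item 4 can be sketched in plain Python. This is only an illustration under stated assumptions: `ExternalRef` and `write_external` are hypothetical names invented here, not part of onnxscript, and the real to_external_data logic may differ in its details.

```python
from __future__ import annotations

import dataclasses
import os
import tempfile


@dataclasses.dataclass
class ExternalRef:
    """Hypothetical stand-in for a tensor whose bytes live in a file."""

    path: str
    offset: int
    length: int

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            return f.read(self.length)


def write_external(tensors: dict, dest: str) -> dict:
    """Write all tensors to dest and return references into the new file."""
    dest_real = os.path.realpath(dest)
    staged = {}
    for name, value in tensors.items():
        if isinstance(value, ExternalRef) and os.path.realpath(value.path) == dest_real:
            # Opening dest for writing truncates it, so bytes that live in
            # dest must be pulled into memory first.
            staged[name] = value.read()
        else:
            # In-memory bytes, or references to other files, are left alone.
            staged[name] = value
    result = {}
    with open(dest, "wb") as f:
        for name, value in staged.items():
            data = value.read() if isinstance(value, ExternalRef) else value
            offset = f.tell()
            f.write(data)
            result[name] = ExternalRef(dest, offset, len(data))
    return result
```

Only tensors that point at the destination file are loaded eagerly; everything else can still be streamed, which is why no temporary file is needed in this sketch.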

Note

We do not need to add external data options to ir.load. External data is always loaded lazily in the IR. If we want to move the data into memory at load time, we can use _external_tensor_to_memory_tensor internally.
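
As a rough illustration of what "loaded lazily" means here, the sketch below defers reading the file until the bytes are first requested. `LazyExternalTensor` is a hypothetical class written for this example; it is not the IR's actual external tensor type, which may use a different mechanism.

```python
import os
import tempfile


class LazyExternalTensor:
    """Hypothetical stand-in: remembers where the bytes live, reads on demand."""

    def __init__(self, path, offset, length):
        self.path = path
        self.offset = offset
        self.length = length
        self._cache = None  # nothing is read at construction time

    @property
    def loaded(self):
        return self._cache is not None

    def tobytes(self):
        # The file is touched only on first access; later calls reuse the cache.
        if self._cache is None:
            with open(self.path, "rb") as f:
                f.seek(self.offset)
                self._cache = f.read(self.length)
        return self._cache


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.data")
        with open(path, "wb") as f:
            f.write(b"\x00\x00\x80\x3f" * 2)  # two float32 values of 1.0
        tensor = LazyExternalTensor(path, offset=4, length=4)
        before = tensor.loaded
        data = tensor.tobytes()
        return before, tensor.loaded, data
```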

Example usage

ir.save(model, "model.onnx", external_data="model.onnx.data")
# Can save many times
ir.save(model, "model_copy.onnx", external_data="model_copy.onnx.data")
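
The effect of the modify_model option can be sketched with a toy model. Everything below is an assumption made for illustration: the model is a plain dict, `save_with_external_data` is a hypothetical helper rather than ir.save, and only the data file is written.

```python
import os
import tempfile


def save_with_external_data(model, data_path, modify_model=False):
    """Write every initializer to data_path and return (path, offset, length) refs.

    model is a hypothetical dict of the form {"initializers": {name: bytes}}.
    """
    refs = {}
    with open(data_path, "wb") as f:
        for name, raw in model["initializers"].items():
            refs[name] = (data_path, f.tell(), len(raw))
            f.write(raw)
    if modify_model:
        # The caller opted in: the model now points at the external file.
        model["initializers"] = refs
    # Otherwise the in-memory model is untouched and can be saved again.
    return refs
```

With modify_model left at False, the same model object can be saved to several destinations in a row, which mirrors the "can save many times" usage above.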

TODO

  • More tests


@Copilot Copilot AI left a comment


Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

onnxscript/ir/_external_data_test.py:352

  • The tobytes method should raise a TypeError instead of returning it.
return TypeError
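
The difference this suppressed comment points at can be shown with a small standalone example. The class and helper names are hypothetical and are not taken from the test file.

```python
class NoBytesTensor:
    """Hypothetical tensor-like object that cannot be serialized."""

    def tobytes_buggy(self):
        # Hands the exception class back as an ordinary return value;
        # the caller sees no error at all.
        return TypeError

    def tobytes(self):
        # Signals the failure, so callers and assertRaises can detect it.
        raise TypeError("this tensor cannot be converted to bytes")


def raises_type_error(func):
    """Return True if calling func raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False
```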

onnxscript/ir/_io.py (resolved)
onnxscript/ir/_external_data.py (outdated, resolved)
onnxscript/ir/_external_data.py (outdated, resolved)

codecov bot commented Jan 17, 2025

❌ 125 Tests Failed:

Tests completed   Failed   Passed   Skipped
5848              125      5723     2719
View the top 2 failed tests by shortest run time
onnxscript.ir._io_test.TestIOFunctions::test_save_with_external_data_modify_model_false
Stack Traces | 0.002s run time
onnxscript/ir/_io_test.py:111: in test_save_with_external_data_modify_model_false
    node = ir.node("Identity", inputs=[tensor], outputs=["Y"])
E   AttributeError: module 'onnxscript.ir' has no attribute 'node'. Did you mean: 'Node'?
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_1168_test_spacetodepth
Stack Traces | 0.003s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_spacetodepth'

The above exception was the direct cause of the following exception:
.nox\test_torch_nightly\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_spacetodepth' (e=No module named 'tests.onnx_backend_test_code.test_spacetodepth') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_spacetodepth.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_spacetodepth.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import FLOAT
E   from onnxscript.onnx_opset import opset13
E   
E   @script()
E   def bck_test_spacetodepth(x: FLOAT[2,2,6,6]) -> (FLOAT[2,8,3,3]):
E       y = opset13.SpaceToDepth(x, blocksize=2)
E       return y
View the full list of 1 ❄️ flaky tests
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0034_test_and_bcast4v3d

Flake rate in main: 13.33% (Passed 13 times, Failed 2 times)

Stack Traces | 0.003s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_and_bcast4v3d'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_and_bcast4v3d' (e=No module named 'tests.onnx_backend_test_code.test_and_bcast4v3d') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_and_bcast4v3d.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_and_bcast4v3d.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import BOOL
E   from onnxscript.onnx_opset import opset7
E   
E   @script()
E   def bck_test_and_bcast4v3d(x: BOOL[3,4,5,6], y: BOOL[4,5,6]) -> (BOOL[3,4,5,6]):
E       r_and = opset7.And(x, y)
E       return r_and


onnxscript/ir/_io.py (outdated, resolved)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@justinchuby added the "hold on merging" (Don't merge yet) and "topic: IR" (Intermediate representation) labels on Jan 17, 2025


Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

onnxscript/ir/_external_data.py:174

  • The variable name 'base_path' should be renamed to 'base_dir' for consistency.
base_path: str | os.PathLike,

onnxscript/ir/_external_data.py:252

  • The word 'unneccesarry' should be corrected to 'unnecessary'.
# Sort all tensors based on tensor sizes, in order to avoid unneccesarry alignment.
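
Why sorting by size can reduce alignment padding is easiest to see with small numbers. The sketch below assumes a simplified rule in which only tensors at or above a size threshold are aligned; `layout` and its parameters are hypothetical and do not reproduce the actual code in _external_data.py.

```python
def layout(sizes, align=16, threshold=16):
    """Return (offsets, total_bytes) when tensors are written in the given order.

    Assumed rule: tensors of at least `threshold` bytes start on a multiple
    of `align`; smaller tensors are packed back to back.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        if size >= threshold:
            cursor = -(-cursor // align) * align  # round up to the alignment
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


unsorted_sizes = [16, 3, 16, 5]
unsorted_offsets, unsorted_total = layout(unsorted_sizes)
sorted_offsets, sorted_total = layout(sorted(unsorted_sizes))
```

In this toy case the interleaved order pads before the second large tensor, while the sorted order packs the small tensors together and pads only once before the first large one.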
onnxscript/ir/_io_test.py (fixed)
@@ -0,0 +1,128 @@
import os

Check warning

Code scanning / lintrunner

RUFF/CPY001 Warning

Missing copyright notice at top of file.
See https://docs.astral.sh/ruff/rules/missing-copyright-notice
@@ -0,0 +1,128 @@
import os

Check warning

Code scanning / lintrunner

RUFF-FORMAT/format Warning

Run lintrunner -a to apply this patch.

def _create_simple_model():
    tensor = ir.tensor([1.0], dtype=ir.DataType.FLOAT, name="X")
    node = ir.Node("Identity", inputs=[tensor], outputs=["Y"])

Check failure

Code scanning / lintrunner

PYLINT/E1120 Error

No value for argument 'op_type' in constructor call (no-value-for-parameter)
See no-value-for-parameter. To disable, use # pylint: disable=no-value-for-parameter
node = ir.node("Identity", inputs=[tensor], outputs=["Y"])
graph = ir.graph([node], name="test_graph", outputs=[node.outputs[0]], initializers=[tensor])
model = ir.model(graph)
core_model = _core.Model(model)

Check failure

Code scanning / lintrunner

PYLINT/E0602 Error

Undefined variable '_core' (undefined-variable)
See undefined-variable. To disable, use # pylint: disable=undefined-variable
node = ir.node("Identity", inputs=[tensor], outputs=["Y"])
graph = ir.graph([node], name="test_graph", outputs=[node.outputs[0]], initializers=[tensor])
model = ir.model(graph)
core_model = _core.Model(model)

Check failure

Code scanning / lintrunner

RUFF/F821 Error


Labels
hold on merging Don't merge yet topic: IR Intermediate representation