Skip to content

Upgrade to onnx 1.9 #847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Oct 12, 2021
Merged

Upgrade to onnx 1.9 #847

merged 40 commits into from
Oct 12, 2021

Conversation

alonre24
Copy link
Collaborator

@alonre24 alonre24 commented Oct 5, 2021

This is an infrastructure PR for upgrading onnxruntime to version 1.9.0
This will include @chayim work in #785 (which will be closed as from now).

We temporarily remove the use of Redis allocator as a custom allocator for onnx backend, as this feature relied on our fork onnxruntime repo. The support in Redis custom allocator will be re-introduced in #827, that is dependent on having onnx 1.9.0.

We build onnx 1.9 with the new DISABLE_EXTERNAL_INITIALIZERS flag, and test that we actually cannot load a model that uses external initializers.

@codecov
Copy link

codecov bot commented Oct 5, 2021

Codecov Report

Merging #847 (ab20006) into master (de0f302) will increase coverage by 1.03%.
The diff coverage is 91.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #847      +/-   ##
==========================================
+ Coverage   79.97%   81.00%   +1.03%     
==========================================
  Files          53       55       +2     
  Lines        8009     8144     +135     
==========================================
+ Hits         6405     6597     +192     
+ Misses       1604     1547      -57     
Impacted Files Coverage Δ
src/serialization/AOF/rai_aof_rewrite.c 0.00% <0.00%> (ø)
.../serialization/RDB/decoder/previous/v2/decode_v2.c 46.56% <0.00%> (+9.81%) ⬆️
src/execution/parsing/deprecated.c 81.18% <66.66%> (+1.48%) ⬆️
...c/serialization/RDB/decoder/current/v4/decode_v4.c 71.42% <71.42%> (ø)
.../serialization/RDB/decoder/previous/v0/decode_v0.c 63.33% <72.72%> (+1.69%) ⬆️
tests/module/LLAPI.c 75.91% <79.06%> (+1.44%) ⬆️
src/serialization/RDB/encoder/v4/encode_v4.c 85.96% <81.81%> (ø)
src/backends/tensorflow.c 72.16% <91.00%> (+3.12%) ⬆️
src/backends/onnxruntime.c 73.50% <91.95%> (-3.88%) ⬇️
src/redis_ai_objects/tensor.c 91.93% <94.71%> (+5.24%) ⬆️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e8d7a1...ab20006. Read the comment docs.

@alonre24 alonre24 requested a review from DvirDukhan October 7, 2021 14:01
@alonre24 alonre24 marked this pull request as ready for review October 7, 2021 14:01

def test_forbidden_external_initializers(env):
if not TEST_ONNX:
env.debugPrint("skipping {} since TEST_ONNX=0".format(sys._getframe().f_code.co_name), force=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
env.debugPrint("skipping {} since TEST_ONNX=0".format(sys._getframe().f_code.co_name), force=True)
env.debugPrint(f"skipping {sys._getframe().f_code.co_name} since TEST_ONNX=0", force=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in another PR


# move the external initializer to the redis' current dir (tests/flow/logs)
external_initializer_model = load_file_content("model_with_external_initializers.onnx")
shutil.copy(ROOT+"/tests/flow/test_data/Pads.bin", ROOT+"/tests/flow/logs")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ONNX is set to look for the external initializers by default in the current working directory. When we run our tests, this location is ROOT+"/tests/flow/logs", so I copy the initializer from our test_data directory into there.

'AI.MODELSTORE', 'ext_initializers_model{1}', 'ONNX', DEVICE,
'BLOB', external_initializer_model)

os.remove(ROOT+"/tests/flow/logs/Pads.bin")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here?

Copy link
Collaborator Author

@alonre24 alonre24 Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ONNX is set to look for the external initializers by default in the current working directory. When we run our tests, this location is ROOT+"/tests/flow/logs", so I after we copy the initializer into there, we remove it at the end of the test

@alonre24 alonre24 removed the ci-test label Oct 11, 2021
@alonre24 alonre24 added ci-test and removed ci-test labels Oct 11, 2021
@@ -0,0 +1,47 @@
# RedisAI Development Backends

This document describes how a backend for RedisAI can be built, from this repository. It highlights the supported compilation devices on a per-backend basis, and highlights the tools and commands required. Unless indicated otherwise, a backend is compiled in a docker, which is responsible for the configuration and installation of all tools required for a given backend on a per-platform basis.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why "a backend"?
we build only ONNXRuntime
maybe explain the need for building a backend. First, start with explaining that we use DISABLE_EXTERNAL_INITIALIZERS=ON and that made go and building the backend from the source. Also, explain that we do not build the other backend libraries but downloading their binaries from the library websites.

When we need to add more "backend builds" we can add additional "how-to"s

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand. I that OK if I will make the rephrasing in the next PR (#806 - the one that updates Torch and TF)?

ignore_unversioned_libs
Memcheck:Overlap
...
obj:*/libonnxruntime.so*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the leak/error that made you introduce this?

Copy link
Collaborator Author

@alonre24 alonre24 Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

==13852==    at 0x483DA20: __memcpy_chk (vg_replace_strmem.c:1593)
==13852==    by 0x7EA062B: cpuinfo_linux_parse_cpulist (in /root/project/bin/linux-x64-release/install-cpu/backends/redisai_onnxruntime/lib/libonnxruntime.so.1.9.0)
==13852==    by 0x7E9C8D6: cpuinfo_linux_get_max_possible_processor (in /root/project/bin/linux-x64-release/install-cpu/backends/redisai_onnxruntime/lib/libonnxruntime.so.1.9.0)
==13852==    by 0x7E9AF61: cpuinfo_x86_linux_init (in /root/project/bin/linux-x64-release/install-cpu/backends/redisai_onnxruntime/lib/libonnxruntime.so.1.9.0)
==13852==    by 0x4D6C996: __pthread_once_slow (pthread_once.c:116)
==13852==    by 0x7E9AE26: cpuinfo_initialize (in /root/project/bin/linux-x64-release/install-cpu/backends/redisai_onnxruntime/lib/libonnxruntime.so.1.9.0)
==13852==    by 0x7CFA76A: onnxruntime::CPUIDInfo::CPUIDInfo() (in /root/project/bin/linux-x64-release/install-cpu/backends/redisai_onnxruntime/lib/libonnxruntime.so.1.9.0)
==13852==    by 0x400F379: call_init.part.0 (dl-init.c:72)
==13852==    by 0x400F475: call_init (dl-init.c:118)
==13852==    by 0x400F475: _dl_init (dl-init.c:119)
==13852==    by 0x40132D2: dl_open_worker (dl-open.c:517)
==13852==    by 0x4EB2B2E: _dl_catch_exception (dl-error-skeleton.c:196)
==13852==    by 0x4012BB9: _dl_open (dl-open.c:599)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming from ONNX, we have it on every test that uses ONNX. It basically says (as I understand) that they use memcpy instead of memmove to copy memory from one place to another in the same memory area

@alonre24 alonre24 merged commit b04ec83 into master Oct 12, 2021
@alonre24 alonre24 deleted the upgrade_to_onnx_1.9 branch October 12, 2021 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants