Add backend tests from tarball using Tar.jl #89
Conversation
Just my random opinion, but this may be a good use case for an Artifact? Putting the tar in the repo means that every user of ONNX.jl will need to download it even if they never run the tests.
Not random at all. That's much better; I'll try to implement that.
One problem that might arise is where to host the tar archive. A solution that has been used before is to create a new repo, create GitHub releases there, and upload the tar archives as release artifacts (see NodeJSBuilder for example). However, I don't know what would be most convenient for the FluxML org.
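For reference, a minimal sketch of what binding the test data as a lazy artifact could look like. The hosting URL and artifact name are hypothetical, the hosted tarball is assumed to be uncompressed, and Downloads, Tar.jl, and SHA are assumed available:

using Pkg.Artifacts
using Downloads, Tar, SHA

artifacts_toml = joinpath(@__DIR__, "Artifacts.toml")
url = "https://example.com/backend_data.tar"  # hypothetical host, e.g. a GitHub release
tarball = Downloads.download(url)

# Unpack into a content-addressed artifact directory and get its tree hash.
hash = create_artifact() do dir
    Tar.extract(tarball, dir)
end

# Record the binding; `lazy = true` means users only download the data
# when the tests actually resolve `artifact"backend_data"`.
bind_artifact!(artifacts_toml, "backend_data", hash;
               download_info = [(url, bytes2hex(open(sha256, tarball)))],
               lazy = true)

At test time, `using LazyArtifacts` plus `artifact"backend_data"` would then fetch and cache the data on first use.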
If this tarball is being derived from the ONNX repo, would it make sense to use it directly via a submodule? Not sure how good Pkg support for those is.
So currently we continue without the artifact? I think FluxML should have an artifact repo for models, datasets, etc.
There's nothing fundamentally wrong with using artifacts, but they do require versioning and ongoing maintenance. Artifacts are also generally preferred for handling binary data or data that would be inappropriate to version control with git. My perhaps incorrect understanding of the ONNX tests is that they're not explicitly versioned, and everything is a VCS-compatible text file already. Thus it may make more sense to use a submodule for them instead of going through the longer process of creating and registering artifacts.
Got it.
Do I need to make a new repo?
Sorry for the silence! I don't know much about git submodules either, but the following seems to work:

dfdx@dfdx:~/work/ONNX.jl$ git submodule add git@github.com:onnx/onnx.git test/onnx
Cloning into '/home/azbs/work/ONNX.jl/test/onnx'...
remote: Enumerating objects: 42908, done.
remote: Counting objects: 100% (834/834), done.
remote: Compressing objects: 100% (417/417), done.
remote: Total 42908 (delta 462), reused 692 (delta 403), pack-reused 42074
Receiving objects: 100% (42908/42908), 29.19 MiB | 309.00 KiB/s, done.
Resolving deltas: 100% (25262/25262), done.
dfdx@dfdx:~/work/ONNX.jl$ git status
On branch test-suites-as-submodule
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: test/onnx
Yes, sorry. I saw your questions too, but I haven't yet looked for an answer on how to handle submodules in the usual build/CI workflow. Looking through JuliaLang/Pkg.jl#708, perhaps a subtree would be a better option than a submodule?
That looks bad. I think we are overengineering the solution. It's 22MB of binary files (they could be compressed, but that would add a package dependency). For any git-based solution (submodule, subtree, or data repo), I need help opening that repo under FluxML (just an empty repo with a meaningful name).
My main question with self-hosting would be how to keep these files up to date and who is responsible for doing so. Also, you said these were binary files? Are those created from the text files in the ONNX repo, or do they come from somewhere else?
Oh I see, they're already binary files. What I meant by updating is: what happens when the ONNX maintainers decide to change the contents of e.g. https://github.com/onnx/onnx/tree/main/onnx/backend/test/data/node/test_adagrad_multiple on their end? I presume we'd want to have those updates on our end too, but because Julia artifacts are generally considered a snapshot of some data in time, I think it'd require manual effort to do so.
I'm proposing to do it manually.
BTW, what about onnx.proto3?
I'm afraid I don't know what onnx.proto3 is. Does the official repo provide both protobuf v2 and v3 versions of the artifacts we're interested in, but we're only handling and/or able to handle the v2 versions right now?
onnx.proto3 is the protobuf interface definition. I think we should focus on adding more operations; the test suite is just scaffolding.
test/backend.jl (outdated)
end

@testset "Nodes" begin
    prefix = extract("backend_data.tar") * "/data/node/"
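For context, a hypothetical sketch of what the extract helper used above might look like with Tar.jl (the actual implementation in this PR may differ):

using Tar

# Hypothetical sketch: unpack the tarball into a fresh temporary
# directory and return its path. mktempdir() guarantees an empty
# directory, which is what Tar.extract requires, and registers it
# for deletion when the Julia process exits.
function extract(tarball::AbstractString)
    dir = mktempdir()
    Tar.extract(tarball, dir)
    return dir
end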
Are these files deleted after the test? If not, we should probably add the whole directory to the gitignore.
Good point. I thought that since it's going to /tmp/ I didn't have to worry about it, but you are right. I will fix that.
I don't like the overengineering discussion we are falling into, but adding a 22Mb file to the repo makes me nervous too. The problem is that once this file gets into the git history, there's no way to delete it from there. Even worse, whenever we change the file (and I assume sometimes we will), the size is multiplied. This way we may end up with a >100Mb repo of which we only use 136Kb (current size of ...).
How about downloading the tar file directly from the GitHub release? I believe we can stick to the current release, download it at the beginning of the test suite, and add the file to gitignore. I can foresee some edge cases in this approach too, but it looks like a small and totally reversible change that will unblock the real value: adding and testing more operations.
100% agreed with you.
I mean using the release of the onnx repo. That is, instead of putting the tar file into ONNX.jl, use something like this:

const ONNX_RELEASE_URL = "https://github.com/onnx/onnx/archive/refs/tags/v1.14.0.tar.gz"
onnx_release_tar_path = joinpath(@__DIR__, "tests", "backend_data.tar.gz")
onnx_release_path = joinpath(@__DIR__, "tests", "data")
if !isdir(onnx_release_path)  # `isdir`, not `isfile`: the extracted path is a directory
    download(ONNX_RELEASE_URL, onnx_release_tar_path)
    extract(onnx_release_tar_path, onnx_release_path)
end
...
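One wrinkle worth noting: the release archive is gzip-compressed, and Tar.jl alone doesn't decompress. Here's a sketch of a two-argument extract that gunzips on the fly, assuming CodecZlib is available (the PR may handle this differently):

using Tar, CodecZlib

# Decompress the .tar.gz stream and unpack it into `dest`, which must
# not exist yet (Tar.extract creates it).
function extract(tarball::AbstractString, dest::AbstractString)
    io = GzipDecompressorStream(open(tarball))
    try
        Tar.extract(io, dest)
    finally
        close(io)
    end
    return dest
end

Note also that GitHub source archives unpack to a top-level versioned directory, so the test data would live under e.g. dest * "/onnx-1.14.0/onnx/backend/test/data/".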
Well, we can do it, but it isn't like the files are already in the repo and we just need to access them.
Oh, I didn't realize that. Still, I believe we can generate the files during the first run of the tests and keep the result in that directory.
Can I assume that everyone has ...
Ouch. Now I see your point. I agree then that putting a single tar into the repo is just fine; let's do it and come back to the question when we want to update it. @ToucheSir if you are ok with this approach, I will merge this PR.
I can see why there was an argument made for using artifacts now, thanks for your patience in walking through this @ordicker. I'm fine with the current approach as long as we put a pin in finding something better when it comes time to update the tarball.
Sounds good. Let's add the unpacked files to .gitignore.
Sure thing @ToucheSir, but in the process you convinced me that adding a tarball to a repo is a bad practice. If you agree, I'll do: ...
Bump @ToucheSir @dfdx
Oh, sorry, I didn't realize you were waiting for confirmation 🤦 Yes, the plan sounds great to me.
I have done it...
So, are we done? @ToucheSir @dfdx
Damn! I missed it again! Sorry about that; I've been working in extreme multitasking mode for the past few months, so if I don't reply quickly, don't hesitate to ping me again. The code looks good to me. I've also checked the artifact itself and verified that the tests don't expose any obvious remote code execution capabilities. Thank you for all the patience with us!
No need to apologize, the day job is the worst 🤪 Now my plan is to add one operator at a time, and hopefully the community will join.
Hi,
ONNX has a huge test suite. I wrote one test for review; I think we should aim to pass all the tests.
Same PR as before, but with the tarball uncompressed.
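For a sense of the shape of such a test, here's a rough sketch of a single backend node case. The helpers load_node_test and run_node are hypothetical stand-ins for whatever the PR actually implements:

using Test

# Each ONNX backend node test directory contains model.onnx plus
# test_data_set_0/{input_*.pb, output_*.pb} protobuf files.
# Hypothetical helpers: `load_node_test` reads them; `run_node`
# evaluates the model on the given inputs.
@testset "test_add" begin
    dir = joinpath(prefix, "test_add")
    model, inputs, expected = load_node_test(dir)
    @test run_node(model, inputs...) ≈ only(expected)
end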