Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update isnan implementation in WebGL backend to follow IEEE 754-1985 #6107

Merged
merged 3 commits into from
Feb 17, 2022

Conversation

shaoboyan
Copy link
Contributor

@shaoboyan shaoboyan commented Feb 9, 2022

Previous implementation of isNaN in WebGL relies on the platform behaviour.
And it might generate incorrect results(See #5800).
This PR implements isnan based on the rules in IEEE 754-1985 to restrict
the rules.

Issue:#5800

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@shaoboyan
Copy link
Contributor Author

shaoboyan commented Feb 9, 2022

@lina128 @pyu10055
Hi, Na and Ping,
I see there are cases failure in tfjs-layer test case and on safari. It seems that there is a log file to record the detail infos but I cannot access it.
The logs from the bots said :

Step #2 - "run-cloudbuild": Step #5 - "bazel-tests": //tfjs-backend-webgl:browserstack_bs_safari_mac_tfjs-backend-webgl1_test FAILED in 3 out of 3 in 106.7s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   Stats over 3 runs: max = 106.7s, min = 89.8s, avg = 99.7s, dev = 7.2s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-backend-webgl/browserstack_bs_safari_mac_tfjs-backend-webgl1_test/test.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-backend-webgl/browserstack_bs_safari_mac_tfjs-backend-webgl1_test/test_attempts/attempt_1.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-backend-webgl/browserstack_bs_safari_mac_tfjs-backend-webgl1_test/test_attempts/attempt_2.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests": //tfjs-layers:browserstack_bs_chrome_mac_tfjs-layers_webgl2_test         FAILED in 3 out of 3 in 137.2s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   Stats over 3 runs: max = 137.2s, min = 126.2s, avg = 130.3s, dev = 4.9s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_chrome_mac_tfjs-layers_webgl2_test/test.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_chrome_mac_tfjs-layers_webgl2_test/test_attempts/attempt_1.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_chrome_mac_tfjs-layers_webgl2_test/test_attempts/attempt_2.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests": //tfjs-layers:browserstack_bs_safari_mac_tfjs-layers_webgl1_test         FAILED in 3 out of 3 in 61.1s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   Stats over 3 runs: max = 61.1s, min = 49.3s, avg = 53.5s, dev = 5.4s
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_safari_mac_tfjs-layers_webgl1_test/test.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_safari_mac_tfjs-layers_webgl1_test/test_attempts/attempt_1.log
Step #2 - "run-cloudbuild": Step #5 - "bazel-tests":   /builder/home/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/tfjs/bazel-out/k8-fastbuild/testlogs/tfjs-layers/browserstack_bs_safari_mac_tfjs-layers_webgl1_test/test_attempts/attempt_2.log

Do you have any suggestions to know the failure log infos to let me know which case fail and the fail reason? Thanks!

@mattsoulanille
Copy link
Member

@shaoboyan, you should be able to get to the logs by following this link which can be found in the bazel-tests step.
image

From that page, you should be able to see the logs for each run:

image

This is definitely not obvious, and we're working on surfacing these logs better. Sorry for the confusion.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, can you double check logic for float16 texture?

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie and @mattsoulanille)

@shaoboyan
Copy link
Contributor Author

@mattsoulanille and @pyu10055 The log is super helpful and I'll fix them.

Copy link
Contributor Author

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pyu10055 The texture format doesn't affect what we got in the shader. For example, sampling from rgba8unorm texture will got float 32 values in the shader. The same to the 16-bit format texture.
More explain about why limit the new isnan to WebGL2:
In WebGL 1, the glsl version doesn't support bot 'uint' type and any bitcast function like 'floatBitsToUint'. So we cannot implement the bit compare version of isnan for WebGL 1.0.
In WebGL 2.0, we have all these type and built-in functions.

cc @mattsoulanille @lina128 too

PTAL, thanks!

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie and @mattsoulanille)

Copy link
Contributor Author

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattsoulanille @pyu10055 @lina128
I've seen the nan-test bots failure and go through the logs.
The error said:

Step #2 - "run-cloudbuild": BUILD FAILURE: Build step failure: build step 27 "gcr.io/learnjs-174218/release" failed: step exited with non-zero status: 1

And I think it is due to the error below:

Step #2 - "run-cloudbuild": Step #48 - "test-e2e": $ karma start --browserstack --browsers=bs_android_9 --tags '#SMOKE,#REGRESSION'
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[32m10 02 2022 18:40:00.098:INFO [compiler.karma-typescript]: �[39mCompiling project using Typescript 3.5.3
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.949:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(12,64): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.950:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(12,72): error TS1144: '{' or ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.950:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(12,72): error TS2304: Cannot find name 'value'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.950:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(229,64): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.950:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(229,72): error TS1144: '{' or ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.951:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(229,72): error TS2304: Cannot find name 'value'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.951:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,90): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.951:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,98): error TS1144: '{' or ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.952:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,98): error TS2304: Cannot find name 'actual'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.952:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,105): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.952:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,105): error TS2304: Cannot find name 'is'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.953:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,108): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.953:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(400,108): error TS2304: Cannot find name 'T'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.953:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,94): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.954:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,102): error TS1144: '{' or ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.954:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,102): error TS2304: Cannot find name 'actual'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.954:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,109): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.954:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,109): error TS2304: Cannot find name 'is'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.954:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,112): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(432,112): error TS2304: Cannot find name 'T'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,43): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,51): error TS1144: '{' or ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,51): error TS2304: Cannot find name 'value'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,57): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.955:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,57): error TS2304: Cannot find name 'is'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,60): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,60): error TS2531: Object is possibly 'null'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(734,67): error TS2532: Object is possibly 'undefined'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(892,57): error TS2304: Cannot find name 'asserts'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(892,65): error TS1005: ';' expected.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.956:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/assert.d.ts(892,65): error TS7008: Member 'value' implicitly has an 'any' type.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.957:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/stream.d.ts(842,106): error TS2304: Cannot find name 'AsyncGeneratorFunction'.
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": 
Step #2 - "run-cloudbuild": Step #48 - "test-e2e": �[91m10 02 2022 18:40:04.957:ERROR [compiler.karma-typescript]: �[39mnode_modules/@types/node/url.d.ts(878,13): error TS2403: Subsequent variable declarations must have the same type.  Variable 'URLSearchParams' must be of type '{ new (init?: string | Record<string, string> | URLSearchParams | string[][]): URLSearchParams; prototype: URLSearchParams; }', but here has type '{ new (init?: string | Record<string, string> | URLSearchParams | string[][]): URLSearchParams; prototype: URLSearchParams; toString(): string; }'.

And I don't think it is related to isnan implementation in WebGL 2.
I've rebase the branch(to see whether it will fix the nan-test bots) but it seems that non-test bots is not trigger automatically. Would you mind to help trigger it again?

Another interesting is that the bots report regression:

Step #2 - "run-cloudbuild": Step #48 - "test-e2e":   #REGRESSION convert_predict webgl {"WEBGL_VERSION":2,"WEBGL_CPU_FORWARD":false,"WEBGL_SIZE_UPLOAD_UNIFORM":0}
Step #2 - "run-cloudbuild": Step #48 - "test-e2e":     �[32m✓ �[39msaved_model_v1.

But I also find it reports regression for WebGL1 too. So does this report mean anything?

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie and @mattsoulanille)

@shaoboyan shaoboyan requested a review from pyu10055 February 11, 2022 02:51
Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Shaobo, the nightly test is retriggered with your latest change, please take a look.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @mattsoulanille, and @pyu10055)

Copy link
Contributor Author

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Na, thanks for helping. And I still find the compiling error above. Reviewers, do you have any ideas? Thanks!

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @mattsoulanille, and @pyu10055)

Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Shaobo, yes we are working on it. We let you know once it's fixed, should be tomorrow.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @mattsoulanille, and @pyu10055)

Copy link
Contributor

@ahmedsabie ahmedsabie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @mattsoulanille, @pyu10055, and @shaoboyan)

a discussion (no related file):
It passed the nightly tests which include the multiplatform nan tests, LGTM!


Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @mattsoulanille, @pyu10055, and @shaoboyan)

Copy link
Contributor Author

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for the help!

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @ahmedsabie, @mattsoulanille, and @pyu10055)

a discussion (no related file):

Previously, ahmedsabie wrote…

It passed the nightly tests which include the multiplatform nan tests, LGTM!

Thanks for the help!


Copy link
Contributor Author

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lina128 Sry but it seems the bots still fails and this time it seems the log doesn't report any error message. But it is the same failure as previous one. So would you mind to double check whether the bots is still need to be fixed?

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @ahmedsabie, @mattsoulanille, and @pyu10055)

Previous implementation of isNaN in WebGL relies on the platform behaviour.
And it might generate incorrect results(See tensorflow#5800).
This PR implements isnan based on the rules in IEEE 754-1985 to restrict
the rules.

Issue:tensorflow#5800
@lina128 lina128 merged commit a51929e into tensorflow:master Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants