
Conversation

@t-ae
Contributor

@t-ae t-ae commented Feb 20, 2020

SequentialTests.testSequential fails after #678 with slightly different values.

#678 changes some operator orders. For example, model.move(along: -learningRate * direction ./ denominator) is changed to model.move(along: (direction ./ denominator).scaled(by: -learningRate)).
There are several changes like this. They seem to be the cause of the failure.

It looks like the expected values in this test are not important, so I replaced them with a shape check.
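
As an illustration (plain Swift, values chosen arbitrarily and not taken from the PR), reordering mathematically equivalent floating-point operations can perturb results in the last bits, which is enough to break an exact equality check on hard-coded expected values:

let learningRate: Float = 1e-3
let direction: Float = 0.123456
let denominator: Float = 0.000321

let oldOrder = -learningRate * direction / denominator    // (-lr * d) / den
let newOrder = (direction / denominator) * -learningRate  // (d / den) * -lr

print(oldOrder.bitPattern == newOrder.bitPattern)  // may print false
print(oldOrder - newOrder)                          // possibly a tiny nonzero difference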

@saeta
Contributor

saeta commented Feb 20, 2020

Thanks for the PR! I was initially very confused, as the kokoro checks passed for #678. I took a quick look at the CI log for that PR, and found that SequentialTests weren't run. A bit of sleuthing showed that they were disabled in 3304db3 and haven't been re-enabled.

When I reenable the sequential tests, I see the following error on a recent toolchain:

/usr/local/google/home/saeta/src/s4tf2/tensorflow-swift-apis/Tests/TensorFlowTests/SequentialTests.swift:65: error: SequentialTests.testSequential : XCTAssertEqual failed: ("[[0.0],
 [0.0],
 [0.0],
 [0.0]]") is not equal to ("[[0.50378805],
 [0.50378805],
 [0.50378805],
 [0.50378805]]") - 

Since I am suspicious of the all-zero result (although the test appears to be a little wacky), it seems like this might be catching a worthwhile error, and so I'd prefer that we don't just convert it to a shape check. Does that make sense?

CC @asuhan / @sgugger / @dan-zheng

@t-ae
Contributor Author

t-ae commented Feb 20, 2020

@saeta
In my environment (macOS 10.14.6 / Xcode 11.3.1 / tensorflow-RELEASE-0.7), the values are not all zero, but:

/Users/araki/Desktop/t-ae/S4TF/swift-apis/Tests/TensorFlowTests/SequentialTests.swift:65: error: -[TensorFlowTests.SequentialTests testSequential] : XCTAssertEqual failed: ("[[0.5048693],
 [0.5048693],
 [0.5048693],
 [0.5048693]]") is not equal to ("[[0.50378805],
 [0.50378805],
 [0.50378805],
 [0.50378805]]")

There might be a Linux-specific bug.

@asuhan
Contributor

asuhan commented Feb 20, 2020

@saeta The zero values are a bit suspicious indeed; we have an internal bug tracking this as well. However, note that there's a relu on the output, which goes to zero if the values returned by the dense layer are negative, and that isn't that unlikely given the small number of elements involved.
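
For illustration only (not the repository's test code): if the dense layer happens to produce negative pre-activations for these four inputs, relu maps every output to zero.

import TensorFlow

// Hypothetical pre-activations; any all-negative values give the same result.
let preActivations = Tensor<Float>(shape: [4, 1], scalars: [-0.3, -0.1, -0.7, -0.2])
print(relu(preActivations))  // [[0.0], [0.0], [0.0], [0.0]]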

@saeta
Contributor

saeta commented Feb 20, 2020

Right, but given that it's trying to predict values in the set of [0, 1], I was a little suspicious that relu would do that. It also seems like a wacky test, as we're using a bunch of different optimizers simultaneously...

@sgugger
Contributor

sgugger commented Feb 20, 2020

I agree that this test is wacky: there is no way to compute what the theoretical output should be. Testing each optimizer separately on one iteration sounds more contained and useful: if this combined test fails, you have no idea which optimizer step is responsible, so it doesn't help debug faulty code.
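
A rough sketch of what such a contained test could look like (illustrative only; the single Dense layer, SGD, and hand-computed expectation are assumptions, not code from this repository):

import TensorFlow
import XCTest

final class SGDSingleStepTests: XCTestCase {
  func testSGDSingleStep() {
    // y = w * x with w initialized to 1, bias 0, target 0, MSE loss.
    var model = Dense<Float>(
      weight: Tensor<Float>(ones: [1, 1]),
      bias: Tensor<Float>(zeros: [1]),
      activation: identity)
    let optimizer = SGD(for: model, learningRate: 0.1)
    let x: Tensor<Float> = [[1.0]]
    let y: Tensor<Float> = [[0.0]]

    let grad = gradient(at: model) { model -> Tensor<Float> in
      meanSquaredError(predicted: model(x), expected: y)
    }
    optimizer.update(&model, along: grad)

    // d/dw (w - 0)^2 = 2w = 2, so w becomes 1 - 0.1 * 2 = 0.8.
    XCTAssertEqual(model.weight.scalarized(), 0.8, accuracy: 1e-6)
  }
}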

@t-ae
Contributor Author

t-ae commented Feb 21, 2020

I looked into the problem and found that the code below is not working on Linux (the results of recursivelyAllWritableKeyPaths are empty).

for kp in infinityNorm.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
  infinityNorm[keyPath: kp] = max(
    beta2 * infinityNorm[keyPath: kp], abs(direction[keyPath: kp]))
}
for kp in infinityNorm.recursivelyAllWritableKeyPaths(to: Tensor<Double>.self) {
  infinityNorm[keyPath: kp] = max(
    Double(beta2) * infinityNorm[keyPath: kp], abs(direction[keyPath: kp]))
}

infinityNorm remains all zero, so the following denominator becomes epsilon (= 1e-8). Finally, the model moves along (firstMoments ./ denominator).scaled(by: -stepSize), which is unusually large.

This doesn't happen on macOS.
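
A minimal check of the failing pattern might look like the following (illustrative only; it just counts the key paths that the update loop above relies on, which should be 2 for a Dense layer's weight and bias and is reportedly 0 when the Linux bug is triggered):

import TensorFlow

// Enumerate writable key paths to Tensor<Float> on a Dense layer's TangentVector.
var tangent = Dense<Float>.TangentVector.zero
let keyPaths = tangent.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self)
print(keyPaths.count)  // expected: 2 (weight and bias); 0 reproduces the problem

for kp in keyPaths {
  tangent[keyPath: kp] = abs(tangent[keyPath: kp])  // never executes if the list is empty
}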

@saeta
Contributor

saeta commented Feb 25, 2020

@t-ae I just wanted to say, thank you very much for tracking this down to the KeyPathIterable issue! This is fantastic. @marcrasi will be looking into the underlying issue later this week.

@t-ae t-ae mentioned this pull request Feb 26, 2020
@marcrasi
Contributor

Since everyone seems to agree that this test is a bit hard to maintain and debug, I propose that we delete it instead of fixing and re-enabling it. Since @t-ae tested all the optimizers in #698 (thanks!!), the only thing in this test that isn't covered elsewhere is sequenced. Therefore, I'll prepare a PR that adds tests for sequenced and deletes this test.
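
For context, a shape-level test of sequenced could look roughly like this (illustrative sketch only, not the PR that was actually prepared):

import TensorFlow
import XCTest

final class SequencedTests: XCTestCase {
  func testSequencedShape() {
    let dense1 = Dense<Float>(inputSize: 2, outputSize: 4, activation: relu)
    let dense2 = Dense<Float>(inputSize: 4, outputSize: 1, activation: relu)
    let input = Tensor<Float>(ones: [3, 2])
    // sequenced(through:) threads the input through each layer in order.
    let output = input.sequenced(through: dense1, dense2)
    XCTAssertEqual(output.shape, [3, 1])
  }
}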

@marcrasi marcrasi closed this Feb 28, 2020
@t-ae t-ae deleted the fix-sequential-tests-failure branch February 28, 2020 00:17