Fix invalid mode changes during tests #2511

flx42 · 2015-05-26T19:47:55Z

Some existing tests modify the Caffe mode halfway through execution, which is documented to be invalid:
https://github.com/BVLC/caffe/blob/8df472/include/caffe/common.hpp#L140-L143

If, for performance reasons, host memory is allocated through cudaMallocHost, changing the mode halfway through can cause a pointer returned by cudaMallocHost to be freed by free(3), resulting in undefined behavior. The reverse is also possible: a pointer returned by malloc could be released with cudaFreeHost. Another possible issue: if some tests incorrectly assume that the default mode is CPU, they could actually run on the GPU if a previous test clobbered the global mode. See the full analysis of this issue in #2398.
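
To make the failure mode concrete, here is a minimal simulation of the allocate/free mismatch. It is only a sketch: `Mode`, `Allocator`, `HostBlock`, `HostAlloc`, and `HostFree` are hypothetical stand-ins, not Caffe's actual code, and plain `malloc`/`free` stand in for the CUDA calls so the example runs without a GPU. The point is that both the allocation and the release consult the global mode, so a mode change in between selects the wrong deallocator.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for illustration; this is not Caffe's real API.
enum Mode { CPU, GPU };
enum Allocator { MALLOC, PINNED };

// Global mode, playing the role of the mode held by the Caffe singleton.
static Mode g_mode = CPU;

// A host buffer that remembers which allocator produced it.
struct HostBlock {
  void* ptr;
  Allocator allocated_with;
};

// The allocator is chosen from the *current* global mode.
inline Allocator CurrentAllocator() {
  return g_mode == GPU ? PINNED : MALLOC;
}

// Simulated allocation. Real code would call cudaMallocHost for PINNED;
// here malloc is used in both cases so the sketch runs anywhere.
inline HostBlock HostAlloc(std::size_t size) {
  HostBlock block;
  block.ptr = std::malloc(size);
  block.allocated_with = CurrentAllocator();
  return block;
}

// Simulated release. Returns true when the deallocator selected by the
// current mode matches the allocator that was used, false on a mismatch
// (which would be undefined behavior with the real CUDA and libc calls).
inline bool HostFree(HostBlock* block) {
  assert(block != NULL);
  const bool matched = (block->allocated_with == CurrentAllocator());
  std::free(block->ptr);
  block->ptr = NULL;
  return matched;
}
```

Allocating with `g_mode = GPU`, switching to `g_mode = CPU`, and then calling `HostFree` reports a mismatch, which is exactly the situation a mid-test mode change can create.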

The solution is, IMHO, to forbid calls to Caffe::set_mode() in individual test cases. This function should only be called by the test framework, to limit the risk of misuse. To achieve this, the following patch set reuses the existing MultiDeviceTest class and adds new classes GPUDeviceTest and CPUDeviceTest in the same way. When code needs to be shared between CPU and GPU tests, the shared test code can derive directly from class MultiDeviceTest, but derived classes need to be defined for CPU and GPU.
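
The class layout described above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: the `Caffe` class here is a tiny stand-in for the real singleton, the fixtures omit the GoogleTest base class so the snippet compiles on its own, and `SharedMathTest`, `CPUMathTest`, and `GPUMathTest` are hypothetical names used only to show the shared-code pattern.

```cpp
#include <cassert>

// Minimal stand-in for the Caffe singleton (assumption, for illustration).
class Caffe {
 public:
  enum Brew { CPU, GPU };
  static void set_mode(Brew mode) { mode_ = mode; }
  static Brew mode() { return mode_; }
 private:
  static Brew mode_;
};
Caffe::Brew Caffe::mode_ = Caffe::CPU;

// Device tags: each pairs a data type with the mode the test must run in.
template <typename T>
struct CPUDevice {
  typedef T Dtype;
  static const Caffe::Brew device = Caffe::CPU;
};
template <typename T>
struct GPUDevice {
  typedef T Dtype;
  static const Caffe::Brew device = Caffe::GPU;
};

// The only place that calls set_mode(): the fixture constructor, which
// runs before the test body starts.
template <typename TypeParam>
class MultiDeviceTest {
 public:
  typedef typename TypeParam::Dtype Dtype;
 protected:
  MultiDeviceTest() { Caffe::set_mode(TypeParam::device); }
  virtual ~MultiDeviceTest() {}
};

// Fixtures for tests that only make sense on one device.
template <typename Dtype>
class CPUDeviceTest : public MultiDeviceTest<CPUDevice<Dtype> > {};
template <typename Dtype>
class GPUDeviceTest : public MultiDeviceTest<GPUDevice<Dtype> > {};

// Hypothetical shared test code: derives directly from MultiDeviceTest...
template <typename TypeParam>
class SharedMathTest : public MultiDeviceTest<TypeParam> {
 public:
  // The test body can rely on the mode instead of setting it.
  void CheckMode() const { assert(Caffe::mode() == TypeParam::device); }
  Caffe::Brew ModeSeenByTest() const { return Caffe::mode(); }
};
// ...and one derived class is defined per device.
template <typename Dtype>
class CPUMathTest : public SharedMathTest<CPUDevice<Dtype> > {};
template <typename Dtype>
class GPUMathTest : public SharedMathTest<GPUDevice<Dtype> > {};
```

With this layout a test case never names a mode itself: constructing `GPUMathTest<float>` selects GPU mode, and constructing `CPUMathTest<double>` selects CPU mode.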

The types FloatCPU and DoubleCPU are refactored into a new type CPUDevice<T>. Similarly, FloatGPU and DoubleGPU are replaced by a new type GPUDevice<T>.

These new classes can be used to implement test cases that run only on the GPU or only on the CPU. The goal is to move all calls to Caffe::set_mode() inside the test framework, to discourage any test from changing the mode halfway through execution, which is documented to be illegal.

shelhamer · 2015-05-30T01:03:16Z

This looks good to me. Independent of deciding what's right for mode and device in general, templated tests seem like a more robust approach to making sure the mode is right.

Exactly what to do with mode and device is an ongoing conversation, but I think diffusing mode + device to Nets, Layers, and Solvers and making it immutable is reasonable. The dismantling of the singleton / diffusing of the handles is on the charts as #1500 (at least for Net).

I don't know that we ever fully converged on this, however. @longjon @jeffdonahue comment if you remember any threads.

jeffdonahue · 2015-05-30T01:19:47Z

LGTM too. Not sure what we'll end up doing with mode but this looks like it could only make any transition away from what we have now smoother (by centralizing/decreasing the number of references to mode in the code base).

flx42 added 2 commits May 26, 2015 12:17

Refactor types FloatCPU and DoubleCPU into a new type CPUDevice<T>

25538ce

Similarly, FloatGPU and DoubleGPU are replaced by a new type GPUDevice<T>.

Split class MathFunctionsTest into CPUMathFunctionsTest and GPUMathFunctionsTest

2cd27fd

flx42 force-pushed the fix_illegal_mode_changes branch from 338447f to 6cedd62 Compare May 26, 2015 19:50

flx42 added 11 commits May 26, 2015 13:55

Split class StochasticPoolingLayerTest into CPUStochasticPoolingLayerTest and GPUStochasticPoolingLayerTest

43d538f

Make class Im2colKernelTest derive from GPUDeviceTest

8a5abbf

Make class AccuracyLayerTest derive from CPUDeviceTest

8437c64

Make class CuDNNNeuronLayerTest derive from GPUDeviceTest

f48cead

Make class ArgMaxLayerTest derive from CPUDeviceTest

5ca280a

Make class CuDNNConvolutionLayerTest derive from GPUDeviceTest

4feaa0e

Make class CuDNNPoolingLayerTest derive from GPUDeviceTest

307c4b6

Make class DummyDataLayerTest derive from CPUDeviceTest

58f9ea3

Make class CuDNNSoftmaxLayerTest derive from GPUDeviceTest

89bf3c3

Make class MultinomialLogisticLossLayerTest derive from CPUDeviceTest

68133e7

flx42 force-pushed the fix_illegal_mode_changes branch from 6cedd62 to 68133e7 Compare May 26, 2015 20:56

shelhamer added the testing label May 30, 2015

shelhamer added a commit that referenced this pull request May 30, 2015

Merge pull request #2511 from flx42/fix_illegal_mode_changes

3cc9bac

Fix invalid mode changes during tests

shelhamer merged commit 3cc9bac into BVLC:master May 30, 2015

flx42 deleted the fix_illegal_mode_changes branch June 8, 2015 21:10
