Fix optimizer parameter buffer allocation size. #2855
Conversation
The buffer allocation size should be number of bytes, not number of floats.
return fmt.Errorf("Name: %s, parameter and gradient does not have same content len, parameter: %d, gradient: %d", g.Name, o.contentLen, len(g.Content))
}

r := C.paddle_update_parameter(o.opt, C.paddle_element_type(g.ElementType), unsafe.Pointer(&g.Content[0]), C.int(len(g.Content)))
In optimizer/optimizer.cc, Content is cast to a float array, and the length (num_bytes) is passed on to Tensor.h:
int paddle_update_parameter(paddle_optimizer* o,
                            const paddle_element_type data_type,
                            const void* grad_buffer,
                            int num_bytes) {
  auto grad_type = reinterpret_cast<const float*>(grad_buffer);
  Tensor* gradient = new Tensor(const_cast<float*>(grad_type), num_bytes);
In optimizer/Tensor.h:
TensorT(T* data, size_t size)
    : height_(1), width_(size), data_ptr_(nullptr), data_(data) {}
num_bytes is used as the width, which means the count of float numbers, so it is not bytes. Maybe we should rename the paddle_update_parameter parameter num_bytes to array_size.
@typhoonzero helin: move the size conversion to the tensor side; maybe it is clearer to put it next to the type casting.
num_bytes is for const void* grad_buffer, which means the length of the byte buffer. Then paddle_update_parameter needs to calculate the array_size based on the type of the element. That's why I changed new TensorT(buffer, num_bytes) to new Tensor(buffer, num_bytes/sizeof(float)) here.
I think it's paddle_update_parameter's responsibility to determine the meaning and element size of the buffer, not the caller's.
I'd like to know what you guys think. @typhoonzero @dzhwinter
Sorry, I didn't see the change below.
p := paramWithConfigs.Param
c := paramWithConfigs.Config
s := State
- paramBufferSize := C.size_t(len(p.Content) / C.sizeof_float)
+ paramBufferSize := C.size_t(len(p.Content))
Well, the malloc size indeed must be the length of p.Content; the type is unsigned char*, which means a byte buffer.
Related: #2859
if i == numGroups-1 {
	gs = grads[i*paramPerGroup:]
} else {
	gs = grads[i*paramPerGroup : (i+1)*paramPerGroup]
Just curious: if we send grads with sizes like 0, 10000, 0, 10000 (a sawtooth), compared with grouping similar sizes together, like 0, 0, 0, 10000, 10000, does it hurt performance?
Great question. This is for testing concurrent send; I think the system performs best when the traffic to each parameter server is of similar size.
The current implementation sends one entire parameter (no matter how big it is) to one parameter server. In the near future (a TODO item) we will optimize this by partitioning each big parameter into smaller chunks and sharding them across different parameter servers. That way, the traffic to each parameter server is of similar size.
p.Name = "p_" + strconv.Itoa(i)
p.ElementType = pserver.Float32
p.Content = make([]byte, (i+1)*100)
err := c.InitParam(pserver.ParameterWithConfig{Param: p, Config: config})
Do we need multiple client instances to simulate multiple clients initializing parameters at the same time?
Great idea! I will send a follow-up PR for this one.
LGTM!
LGTM++
The buffer allocation size should be the number of bytes, not the number of floats.
Thanks @reyoung for noticing the bug!
Fixes: #2854
Fixes: #2858