Non-surface function utilities only work for contiguous input data #218
Comments
Can you explain why "x == y" for … ?
Thank you for your reply. To clarify, while running the example code …
Can you set … ?
Thank you very much for your prompt reply. I set gate_noise to 0 as you suggested, but the result is still the same as before: x and y still differ. (When the number of experts equals 1, the softmax result equals 1, so modifying the gate may not affect the final result in this case?)
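The observation above can be checked in isolation: with a single expert the gate's softmax output is trivially 1, so gate_noise cannot change anything. A minimal pure-Python sketch (Tutel's actual gate uses F.softmax on a logits tensor; this standalone softmax is just for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With a single expert there is only one logit, so the softmax
# output is exactly 1.0 regardless of the logit value (noisy or not):
print(softmax([0.37]))   # -> [1.0]
print(softmax([-5.2]))   # -> [1.0]
```

This is why adding or removing gate noise cannot explain a difference between x and y when expert count is 1; the score factor is always exactly 1.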
OK, can you help provide these things? For both solutions, please add the following code after …
In the example code: ... torch.save([x, crit, y], 'test_cast_example.py')
In the model code: ... torch.save([x, crit, y], 'test_cast_model.py')
It'll help us reproduce and look into what happens in your case. BTW, I assume you use the default setting of …
Thank you again for your reply. I saved the results as you instructed; please see the attachment. As you said, self.is_postscore always equals True. In addition, I would also like to ask what the function of self.is_postscore is.
Hi, thanks for this finding. The current internal function utilities assume contiguous input data, and you are calling them directly. You can create a PR that makes the input contiguous inside this function, so the assumption is always guaranteed.
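The suggested fix is to force a contiguous layout at the top of the internal utility. In PyTorch that is simply input = input.contiguous(); the sketch below illustrates the same idea with NumPy (ensure_contiguous is a hypothetical helper name, not part of Tutel's API), where np.ascontiguousarray is the analogue and copies only when needed:

```python
import numpy as np

def ensure_contiguous(a):
    """Return a C-contiguous version of the input so that downstream
    kernels which assume flat, contiguous memory read correct values."""
    # Copies only if `a` is not already C-contiguous.
    return np.ascontiguousarray(a)

x = np.arange(12).reshape(3, 4).T   # a transpose is NOT C-contiguous
print(x.flags['C_CONTIGUOUS'])      # False
y = ensure_contiguous(x)
print(y.flags['C_CONTIGUOUS'])      # True
assert (x == y).all()               # same values, fixed memory layout
```

A custom kernel that indexes raw memory (like tutel_custom_kernel.invoke) would silently read wrong elements from the non-contiguous x above, which matches the symptom described in this issue.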
I made the changes you mentioned and the problem was solved perfectly. I'd also like to ask: in the top-k=1 case, if I want to ignore the score, i.e. y = expert1(x) + expert2(x) + ... + expertn(x) instead of y = score1*expert1(x) + score2*expert2(x) + ... + scoren*expertn(x), how should I set this up?
For now, the score tensor is applied to either x or y, as specified by is_postscore. Do you want to never apply the score tensor? If so, the gating section becomes useless. To force that, please: …
I hope to use the scores to determine which expert handles the input, i.e. y = expert_n(x) with n selected via softmax(score1, score2, ...), but I want to ignore the score values themselves, i.e. y = expert_n(x) instead of y = score_n*expert_n(x). Is this possible?
For your purpose, I think you need to delete …
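The behavior asked for above can be sketched outside of Tutel: select the top-1 expert via argmax of the gate scores, but return the expert's raw output without the score multiplication. All names here (route_top1_ignore_score, the toy experts) are illustrative and not part of Tutel's API:

```python
def route_top1_ignore_score(x, experts, scores):
    """Pick the expert with the highest gate score, but return the raw
    output y = expert_n(x) instead of y = score_n * expert_n(x)."""
    n = max(range(len(scores)), key=lambda i: scores[i])  # argmax over scores
    return experts[n](x)

# Toy experts: plain functions standing in for expert networks.
experts = [lambda x: x + 1, lambda x: x * 10]
scores = [0.2, 0.8]                                  # gate prefers expert 1
print(route_top1_ignore_score(3, experts, scores))   # -> 30, not 0.8 * 30
```

The gate still decides routing (which expert runs), but its score no longer scales the output, which is exactly the y = expert_n(x) form requested.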
Thank you very much, the problem has been solved perfectly~~ |
According to the paper, when the number of experts is set to 1, the score (scores = F.softmax(logits_w_noise, dim=1)) should always equal 1. Consequently, the output y (y = fast_encode(x.to(logits_dtype), crit, self.is_postscore).to(x.dtype), moe_layer.py, line 304) should equal the input x. However, in my experiment, x and y are sometimes different. The difference first appears in ctx.config.func_fwd(g, i, l, reshaped_input, dispatched_input, extra=[ctx.config.indices_[0].size(0), ctx.config.aligned_dim, ctx.config.capacity]) (fast_dispatch.py, line 28), and the root source is tutel_custom_kernel.invoke(inputs, extra, blocks, ctx) (jit_compiler.py, line 33). How can I fix this problem?