Consider removing steepness parameter of softplus #645
Comments
Does negative steepness work?
I think it's worth taking a step back and asking whether steepness is needed at all. As mentioned above, TF and ONNX only support a more basic variant of softplus. How important is steepness?
The expected results of division by zero for floating point values are: a positive value / 0 → +inf, a negative value / 0 → -inf, and 0 / 0 → NaN (unlike division by zero for integers, there's nothing ambiguous in the IEEE standard for floating point).
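For reference, a quick NumPy sketch (illustrative only, not part of the original discussion) of those IEEE 754 rules:

```python
import numpy as np

# IEEE 754: dividing a finite nonzero value by zero yields a signed infinity,
# and 0 / 0 yields NaN; no exception is raised for floating point.
with np.errstate(divide="ignore", invalid="ignore"):
    print(np.float32(1.0) / np.float32(0.0))   # inf
    print(np.float32(-1.0) / np.float32(0.0))  # -inf
    print(np.float32(0.0) / np.float32(0.0))   # nan
```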
It does now 😉. Coincidentally we relaxed DirectML's SOFTPLUS validation in February to permit steepness < 1 (but that version is not out yet, and the docs are still valid for DML 1.13), which includes negative values, and even 0, for the sake of PyTorch and potentially WebNN.
🤔 I don't know the use case, but like you say, the graph is smooth, and PyTorch supports it without complaint:

```python
import torch

s = torch.nn.Softplus(beta=-1.0)
x = torch.tensor([0.5930860043, 0.9014285803, -0.6331304312, 0.4639878273], dtype=torch.float32)
y = s(x)
print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: tensor([-0.4399, -0.3407, -1.0590, -0.4878])
# shape: torch.Size([4])
# dtype: torch.float32
```

Other libraries would need to support it (assuming the parameter was kept) via decomposition, in which case the same question would arise anyway; it would just occur in the decomposition instead.
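To make that decomposition concrete, here is a minimal sketch (the helper name softplus_with_steepness is hypothetical, not an existing API) of what a front end would emit for a backend that only has the basic softplus: a mul, the basic softplus, then a div, which is where the steepness-of-zero question resurfaces:

```python
import torch
import torch.nn.functional as F

def softplus_with_steepness(x: torch.Tensor, steepness: float) -> torch.Tensor:
    # Basic-variant decomposition: softplus(steepness * x) / steepness.
    # The trailing division is where steepness == 0 would produce inf/NaN.
    return F.softplus(steepness * x) / steepness

x = torch.tensor([0.5930860043, 0.9014285803, -0.6331304312, 0.4639878273])
print(softplus_with_steepness(x, -1.0))   # matches torch.nn.Softplus(beta=-1.0)(x)
print(torch.nn.Softplus(beta=-1.0)(x))
```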
Considerations I see include:
- Semantics: I would be interested to know where this parameter came from in the first place, like maybe a paper that introduced it, and why PyTorch has it.
- Front-end complexity: Currently the biggest known front end is the ORT WebNN EP graph builder, which just passes the default steepness (=1) to WebNN. Now, some small performance could be gained if the builder looked before and after by one operator for a surrounding mul/div pair to fold into steepness.
- Back-end complexity and WPT complexity: If only one front-end caller (PyTorch) supports it and only one backend (DML) supports it, then keeping it is more dubious. Removing steepness simplifies WPT and conformance testing some.
- Usage: Scanning 700 models I have locally, I see very few that even use softplus*. (*Of course my little hard drive collection doesn't represent the full world of ML 🌍, but a 🍰 of it.)
- Performance: Since GPUs are primarily memory bound for very simple math operations, having 2 extra intermediate tensors to write out and read back reduces perf by 3x for the mul & softplus & div pattern.
- Precision: For float16 tensors, computing float32 intermediate values helps accuracy (see the sketch below).

Weirdly I feel like we already discussed this before, but I can't find the issue 🤷. Separate issue for that, perhaps.
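On the precision point, a small NumPy sketch (illustrative only; the helper names are made up and no particular backend is implied) contrasting float16 intermediates with float32 intermediates for the decomposed computation:

```python
import numpy as np

def decomposed_softplus_fp16(x_fp16, steepness):
    # Every intermediate (mul, exp, log1p, div) is rounded back to float16.
    s = np.float16(steepness)
    return (np.log1p(np.exp(x_fp16 * s)) / s).astype(np.float16)

def decomposed_softplus_fp32_intermediates(x_fp16, steepness):
    # Upcast once, keep intermediates in float32, round only the final result.
    x = x_fp16.astype(np.float32)
    s = np.float32(steepness)
    return (np.log1p(np.exp(x * s)) / s).astype(np.float16)

x = np.array([0.593086, 0.9014286, -0.6331304, 0.4639878], dtype=np.float16)
print(decomposed_softplus_fp16(x, 7.5))
print(decomposed_softplus_fp32_intermediates(x, 7.5))
```

The two results can differ in the last bits because the first version rounds every intermediate to float16.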
@fdwr, thanks for your nice summary. I think we can just retitle this post to removing the steepness parameter of softplus.
:) I think removing steepness is fine. From an API design perspective, adding it later is much easier than deprecating it (if we find steepness isn't a good fit down the line).
This was raised by @a-sully in a CL review, thanks!
Softplus calculates ln(1 + exp(steepness * x)) / steepness, so when steepness is 0 it might result in division by zero. I tried PyTorch's torch.nn.Softplus(beta=0) and the results are all inf. TF and ONNX don't have this attribute, and DirectML doesn't support steepness < 1.0.
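As a quick check of the beta = 0 case (input values chosen arbitrarily for illustration): the formula degenerates to ln(1 + exp(0)) / 0 = ln(2) / 0, which is +inf under the IEEE 754 rules discussed above.

```python
import torch

x = torch.tensor([0.5, -1.25, 2.0], dtype=torch.float32)
print(torch.nn.Softplus(beta=0)(x))  # all inf, as reported above

# The same value written out by hand: ln(1 + exp(0 * x)) / 0 = ln(2) / 0 = +inf.
print(torch.log1p(torch.exp(0.0 * x)) / 0.0)
```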