The homogeneity of RHNs makes it easier to learn sparse structures within them. In our recent work on ISS (https://arxiv.org/pdf/1709.05027.pdf), we find that we can reduce the "#Units/Layer" of "Variational RHN + WT" in your Table 1 from 830 to 517 without losing perplexity. This shrinks the model from 23.5M to 11.1M parameters, which is much smaller than the model found by "Neural Architecture Search". If you are interested, the results are reported in Table 2 of our paper.
Let us know if this is interesting to you.
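As a rough sketch of the idea (assuming PyTorch; the function name, the per-unit grouping, and `lam` are illustrative, not the exact ISS implementation), a group-Lasso penalty over per-hidden-unit weight groups pushes whole units toward zero so they can be pruned away, reducing #Units/Layer without changing the RHN architecture:

```python
import torch

def group_lasso_penalty(weight: torch.Tensor, group_dim: int = 0) -> torch.Tensor:
    """Sum of L2 norms of the weight slices belonging to each hidden unit.

    Each slice along `group_dim` of a 2-D weight matrix is treated as one
    group (one hidden unit); driving a whole group to zero allows that unit
    to be removed after training.
    """
    return weight.norm(p=2, dim=1 - group_dim).sum()

# Hypothetical usage: add the penalty to the task loss during training.
# loss = cross_entropy(logits, targets) + lam * sum(
#     group_lasso_penalty(w) for w in rhn_recurrent_weights)
```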
@julian121266 That is a good point. Our current finding is that it slightly improves performance to 67.5/65.0 with a smaller size of 726 units, as shown in Table 2. Let me check whether we can improve further by starting from a larger model and compressing it. BTW, did you try model sizes beyond 830 for your RHNs with depth 10? If that didn't improve performance, was it because larger models are harder to optimize?
@wenwei202 We had similar findings. Optimization was fine; the model simply did not generalize much better. In fact, depth 8 ended up working slightly better than depth 10. Most likely the relationship is submodular, with diminishing returns for increased depth.
A new iteration on the RHN idea was actually published half a year later: https://arxiv.org/abs/1705.08639