mask #5
I think sharing notebooks is always a good idea, so please share it. Would love to check it.
What exactly was the issue with the mask? It looks like you're taking mask.log() - won't that return NaN where mask is 0?
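(For reference, a tiny standalone sketch of the behavior in question, with made-up numbers: `log` of a 0/1 mask gives `-inf` rather than `NaN` at the masked positions, and the `NaN`s only show up once a 0-probability term gets multiplied by that `-inf`.)

```python
import torch

# Toy values, not from the actual model: a feasibility mask with one
# forbidden position.
mask = torch.tensor([1.0, 1.0, 0.0])
log_mask = mask.log()
print(log_mask)                      # tensor([0., 0., -inf])

# Adding the log-mask to the logits before the softmax drives the
# masked probability to exactly 0 ...
logits = torch.tensor([0.5, 1.0, 2.0])
probs = torch.softmax(logits + log_mask, dim=0)
print(probs[2].item())               # 0.0

# ... but any 0 * -inf product (e.g. in an entropy-like term) is NaN.
print(probs * (logits + log_mask))   # NaN in the masked slot
```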
It returns -inf. I think that the main problem was with numerical instabilities when I forced the probabilities to zero.
What was the problem with the masking that you fixed?
@pemami4911 Sorry, I've updated the answer above.
Certainly, there must be a more elegant way of doing this :)
Can you tell me what hyperparameters you are using? Are you using the exponential moving average critic? Did you try it on TSP_20?
I added the
@ricgama After 1 epoch (10,000 minibatches of size 128), on my held-out validation set of 1000 graphs I got:
I saw some tours as low as 2.4! It's learning! THANK YOU! haha
After 1 epoch of TSP 20, I got:
With some tours as low as 3.6. According to the paper, I should be aiming for an average reward of 3.89. Sweet.
Great!! For now, I'm just using a simple version. I'm posting here my test sets for n=10 and n=20 for comparing best results,
where the labels are the optimal labels. For n=20:
For n=10 I obtain:
I will try to post the RL results, for n=10 and 20, by the end of the week.
Cool, I'll update my repo with some results by the end of the week too. For TSP 20 RL, the validation average reward is down to
Can't believe it was just a
After 50 epochs for TSP 20, I got
After two epochs on TSP 50, I'm seeing:
Not bad!
Do you have the training history plots?
I haven't made any nice plots yet, but these are quick screenshots from TensorBoard for TSP 50. Been running for ~21 hours; it looks like the validation avg reward has just about reached 6.05-6.10.
[Screenshots: average training reward (stochastic decoding), zoomed in on the first few hours of training; average training reward (stochastic decoding) so far; validation reward (greedy decoding).]
The validation plot shows each reward (length of tour) for the set of 1000 val graphs - after every epoch (10,000 random training graphs), I evaluate by shuffling the 1000 held-out graphs and running on each one of them. So this isn't showing an "average" - the average is just computed at the end of running over all 1000 graphs each time I do a validation pass.
After 2 epochs for TSP 20, I got
worse than your:
Is this SL or RL? And is your train reward with greedy decoding or stochastic decoding? I am using the lr schedule from the paper - starting at 1e-3 and decaying by a factor of 0.96 every 5k steps. I'm using the exponential moving average critic, not the critic network.
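As a sketch of how that schedule and baseline might be wired up in PyTorch (the stand-in network and the `beta` smoothing factor below are placeholders, not the actual hyperparameters):

```python
import torch

# Stand-in actor network; the real one is a pointer network.
actor = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
# lr schedule described above: multiply by 0.96 every 5,000 steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.96)

# Exponential moving average of the reward as the baseline, instead of
# a learned critic network.
beta = 0.8  # assumed smoothing factor
baseline = None

def update_baseline(batch_reward_mean):
    """Update and return the EMA baseline used in the REINFORCE advantage."""
    global baseline
    if baseline is None:
        baseline = batch_reward_mean
    else:
        baseline = beta * baseline + (1 - beta) * batch_reward_mean
    return baseline
```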
It's RL and stochastic decoding. With one attention glimpse it appears a bit better, so I will train 2 epochs and do greedy decoding and beam search to compare. My hardware is slower than yours so I want to check n=20 before moving to n=50...
OK - yeah, you'll want to compare with greedy decoding, not stochastic. Just FYI, the beam search in my codebase isn't functional yet - it's only coded to propagate a single decoding hypothesis forward at each time step, which is equivalent to greedy decoding. Fixing that is on my to-do list :D
I've implemented my beam search. It works very well, but for now only for batch=1, so it's a bit slower.
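For anyone following along, a batch-1 beam search of the kind described can be sketched roughly like this (the scoring callback and all names here are hypothetical, not the actual implementation):

```python
import torch

def beam_search(step_log_probs_fn, n_cities, beam_width=3):
    """Keep the `beam_width` highest log-probability partial tours at
    every decode step.

    step_log_probs_fn(tour) -> (n_cities,) log-probs for the next city,
    with already-visited cities masked to -inf.
    """
    beams = [([], 0.0)]                      # (partial tour, total log-prob)
    for _ in range(n_cities):
        candidates = []
        for tour, lp in beams:
            logp = step_log_probs_fn(tour)
            for city in range(n_cities):
                if city not in tour:
                    candidates.append((tour + [city], lp + logp[city].item()))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]                          # best complete tour
```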
Meanwhile, I think I will implement the Active Search from the paper. Have you looked into it?
Yeah, you can send it to me if you'd like! I haven't looked into implementing that yet. Not sure when I'll get to that part; got some other things I'm working on in the meantime.
@pemami4911 When I was working on my BS to handle the RL-trained Pointer Model, I found some inconsistencies that I have to look into before I share the code. After 2 epochs
Hello @pemami4911,
During my n=50 training it appeared
While you were training for
Yes - you can see in my stochastic decoding function that I check whether any cities were sampled that shouldn't have been - and if so, I resample all cities at that decode step. If that occurs (I think it's a race condition..?) I print out
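A rough standalone sketch of that kind of guard (the function name and retry limit are mine, not from the repo):

```python
import torch

def sample_with_resample_guard(probs, max_tries=10):
    """Sample one city index per row of `probs`; resample the whole
    batch at this decode step if torch.multinomial ever returns an
    index whose probability is exactly 0 (an already-visited city)."""
    for _ in range(max_tries):
        idxs = torch.multinomial(probs, 1)
        # look up the probability of each sampled index
        if (probs.gather(1, idxs) > 0).all():
            return idxs.squeeze(1)
        print("[!] sampled a zero-probability city; resampling")
    raise RuntimeError("kept sampling zero-probability cities")
```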
So there must be a bug with the
Were you able to replicate the "bad sampling" with your script? |
Yes. It's just:
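(The snippet itself didn't survive the thread export; what follows is only a guess at the shape of such a reproduction script, with made-up sizes and no claim to match the original code.)

```python
import torch

def count_bad_samples(n_batches=100, batch_size=128, n=50, device="cpu"):
    """Count how often torch.multinomial returns an index whose
    probability is exactly 0. All sizes here are placeholders."""
    bad = 0
    for _ in range(n_batches):
        probs = torch.rand(batch_size, n, device=device)
        probs[:, n // 2:] = 0.0                      # forbid half the cities
        probs = probs / probs.sum(dim=1, keepdim=True)
        idxs = torch.multinomial(probs, 1)
        bad += int((probs.gather(1, idxs) == 0).sum().item())
    return bad
```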
By the end of the day I will have an estimate. |
The relative frequency on a run of 100,000 batches, with size 128 and n=50, was
Do you mind if I post a question on the PyTorch forum, using the code above?
Yeah, go for it
Done: https://discuss.pytorch.org/t/bad-behavior-of-multinomial-function/10232
Hello @pemami4911, I had a reply: https://discuss.pytorch.org/t/bad-behavior-of-multinomial-function/10232/2
Yeah, I'll try running it with GPU and with CPU
Sorry for the late reply... been busy. I ran it on the CPU and received 0 errors. The script is still running on my GPU (it runs muuuch slower - maybe we should switch to the CPU for computing this portion of the code in our implementations..!) and I've already observed multiple resamplings. Most likely, it is a bug in the low-level data transfer occurring between the CPU and GPU; I imagine they are using multi-threading to accomplish this. We should test this with the new torch.distributions.Categorical in 0.3 as well and then report back.
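A minimal version of that follow-up check with `torch.distributions.Categorical` might look like this (toy distribution and sample count, not the actual test):

```python
import torch

# A distribution with an explicit zero: index 2 must never be sampled.
probs = torch.tensor([0.3, 0.7, 0.0])
dist = torch.distributions.Categorical(probs=probs)
samples = dist.sample((10000,))
bad = int((probs[samples] == 0).sum().item())
print(bad)  # expected 0 on a correct backend
```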
Hello, in fact on CPU it is much faster than on GPU:
I've changed the code to PyTorch 0.3 and I will run it for 100,000 batches over the next days. You can find the notebooks here: https://www.dropbox.com/s/6wxwllae643e673/sampling.zip?dl=0
Hello @pemami4911,
The problem really was with the mask. I've fixed it and the network started to learn. My Decoder now is:
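The decoder code itself wasn't captured in this export; as a rough sketch only, a pointer-network decode step with feasibility masking done on the logits (rather than via mask.log()) could look like this, with all names hypothetical:

```python
import torch

def decode_step(query, enc_outputs, visited_mask):
    """One pointer-attention decode step (simplified dot-product attention).

    query:        (batch, hidden)      current decoder hidden state
    enc_outputs:  (batch, n, hidden)   one encoder output per city
    visited_mask: (batch, n) bool      True where the city was already chosen
    """
    scores = torch.bmm(enc_outputs, query.unsqueeze(2)).squeeze(2)   # (batch, n)
    # Write -inf directly into the logits instead of adding mask.log(),
    # which sidesteps 0 * -inf = NaN products downstream.
    scores = scores.masked_fill(visited_mask, float("-inf"))
    probs = torch.softmax(scores, dim=1)
    next_city = torch.multinomial(probs, 1).squeeze(1)
    return next_city, probs
```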
For n=10, I'm obtaining the following while training the AC version:
I've checked and the returned solutions are all feasible, so it seems that it is really converging.
I will clean my code and hope that by the end of the week I will have some training history plots and test set validation.
If you like, I can share my notebook.
Best regards