
Train time? #4

Open · eturner303 opened this issue Dec 7, 2016 · 26 comments

@eturner303

Curious what sort of train times you're seeing with this implementation.

I'm using a GRID K520 GPU (Amazon g2.2xlarge), and I'm seeing each epoch take around 1200 seconds, which seems wrong.

From the original paper:

"Data requirements and speed We note that decent results
can often be obtained even on small datasets. Our facade
training set consists of just 400 images (see results in
Figure 12), and the day to night training set consists of only
91 unique webcams (see results in Figure 13). On datasets
of this size, training can be very fast: for example, the results
shown in Figure 12 took less than two hours of training
on a single Pascal Titan X GPU."

Granted, I'm not using a Pascal GPU, which has 2496 CUDA cores, while the g2.2xlarge has around 1500 CUDA cores. But at the current rate, 200 epochs would take 3 days, as opposed to the 2 hours quoted in the original paper.

Are you seeing similar train times when running this code? I'm wondering why there is such a discrepancy compared to the original paper/Torch implementation.

@yenchenlin
Owner

yenchenlin commented Dec 7, 2016

I am investigating this issue.
It took me around 10 hours to run 200 epochs on a Pascal GPU.

There are mainly three reasons, in my opinion:

  1. In this implementation (inherited from DCGAN-tensorflow), the generator is updated twice in each iteration (see the sketch below), which slows down training a lot.
  2. Since the project is inherited from DCGAN-tensorflow, it uses a fully connected layer in the discriminator.
  3. The data preprocessing step is currently performed on the fly during training, which could be improved.
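For context, a minimal sketch of the update pattern in point 1 (a simplified illustration in the DCGAN-tensorflow style; `d_optim`, `g_optim`, `inputs`, and `batch` are illustrative names, not the exact code in this repo):

    def train_step(sess, d_optim, g_optim, inputs, batch):
        """One iteration in the DCGAN-tensorflow style this repo inherits.
        All names here are illustrative, not the exact code in this repo."""
        # one discriminator update ...
        sess.run(d_optim, feed_dict={inputs: batch})
        # ... but TWO generator updates per iteration (a DCGAN-tensorflow
        # heuristic meant to keep d_loss from going to zero), which roughly
        # doubles the generator-side compute per iteration
        sess.run(g_optim, feed_dict={inputs: batch})
        sess.run(g_optim, feed_dict={inputs: batch})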

@eyaler
Contributor

eyaler commented Dec 15, 2016

Training facades took 10 hours on a GTX 1080, ~180 sec per epoch.

@kaihuchen

My test with a GRID K520 GPU (Amazon g2.2xlarge) on my own dataset shows that pix2pix/Torch runs about 30 times faster than the pix2pix/TensorFlow version. Monitoring with 'watch nvidia-smi' shows that the TensorFlow version is not using the GPU at all.

@eyaler
Contributor

eyaler commented Dec 30, 2016

@kaihuchen sorry for the obvious question, but did you install "tensorflow-gpu"?

@yenchenlin
Owner

@kaihuchen I'm sure that I'm training this code with a GPU. Can you tell me how you installed TensorFlow?
Side note: it looks like you're a senior alumnus of National Tsing Hua University in Taiwan 😄

@eyaler I've updated the codebase a lot recently (it now reaches speed comparable to the Torch version; I will upload it later).

@kaihuchen

@yenchenlin My bad! I have many servers, and it seems I ran the test on a server with the CPU version of TensorFlow, not the GPU one.

@ppwwyyxx

@eyaler I also have a TensorFlow implementation here. It takes 43 seconds per epoch (400 iterations at batch=1 on the facades dataset) on a GTX 1080, while the Torch version takes 42 seconds.

@yenchenlin
Owner

Thanks @ppwwyyxx for the info!

@eyaler I think the code mentioned above currently works better!
However, I'll still update the code here within the next 3 days.

@Skylion007

Skylion007 commented Jan 13, 2017

@yenchenlin Any update on this? I don't see any recent commits pertaining to speed. Otherwise, I'm tempted to just use the code provided by @ppwwyyxx. I have tested the Tensorpack implementation, and it is 4-5X faster and uses approximately 1/3 the memory of this implementation.

@Neltherion

The code looks clean and straightforward... I really can't get my head around why it's slow... It's pretty much a standard GAN, so why is it so slow?! The answer to this question has become one of the reasons I check this thread every now and then...

@Skylion007

Skylion007 commented Jan 13, 2017

I have one idea.

feed_dicts are incredibly slow. We should do what Tensorpack does: load, say, 50 images at a time, keep them in a queue of numpy arrays, and feed them in with a queue runner (see the sketch below). feed_dict alone might be responsible for the speed difference, since it doubles the number of copies needed and causes a lot of expensive switching between Python and TensorFlow's C++ code.

Reference to issue from Tensorflow: tensorflow/tensorflow#2919
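A minimal sketch of that approach with the TF 1.x queue-runner API (the file pattern, image sizes, and queue capacities are illustrative assumptions, not this repo's actual loader):

    import tensorflow as tf

    # Filename queue + reader: decoding and cropping run in background threads.
    filenames = tf.train.string_input_producer(
        tf.train.match_filenames_once('facades/train/*.jpg'), shuffle=True)
    _, contents = tf.WholeFileReader().read(filenames)
    image = tf.image.decode_jpeg(contents, channels=3)
    image = tf.image.resize_images(image, [286, 286])
    image = tf.random_crop(image, [256, 256, 3])

    # Batches are assembled by queue-runner threads, so sess.run() below
    # never blocks on Python-side preprocessing or on feed_dict copies.
    batch = tf.train.shuffle_batch([image], batch_size=1,
                                   capacity=50, min_after_dequeue=10)

    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())  # for match_filenames_once
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        images = sess.run(batch)  # dequeues a ready-made batch, no feed_dict
        coord.request_stop()
        coord.join(threads)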

@Neltherion

Neltherion commented Jan 13, 2017

@Skylion007
Hmmm... how about the fact that this network uses a fully connected layer in the discriminator? Last I checked, Tensorpack uses a 1x1 convolution in the last layer instead of a fully connected layer... couldn't it be because of this?

@Skylion007

Skylion007 commented Jan 13, 2017

That's another issue; there was a pull request to address it, but it was rejected because it made the edges slightly more blurry. I'm open to trying that to see if it improves the speed. Do you want to experiment with that pull request and see if it yields any results? My GPU is currently in use by another experiment.

@Neltherion

Neltherion commented Jan 13, 2017

> My GPU is currently in use by another experiment.

That's exactly my case too! I've been running one for 3 days, and last night it started showing acceptable improvements; I really don't want to stop it for at least 3 more days...

@Skylion007

The graphs for each network look very different as well. @ppwwyyxx's implementation's graph looks like the one below, for instance, while the network in this repo has so many dependencies that its graph looks more like a straight line than a tree. A very different appearance from the one below:

[TensorBoard graph screenshot]

Not entirely sure how much of that is due to good TensorBoard formatting and how much is a fundamental difference in architecture between the networks.

@ppwwyyxx

ppwwyyxx commented Jan 15, 2017

@Skylion007 TensorBoard tends to organize ops under the same name scope together, so what you see in the above figure isn't the real architecture but mostly summaries and utilities. If you open the "gen" and "discrim" blocks in the figure, they contain the model architecture of the generator and discriminator (a toy sketch of this scoping below).
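For illustration, here is how scopes produce those collapsible blocks in TensorBoard's graph view (a toy sketch, not the repo's code; the scope names and layer shapes are made up):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 256, 256, 3], name='input')

    # Everything created inside a scope is drawn as one collapsible node
    # in TensorBoard's graph view, e.g. the "gen" and "discrim" blocks.
    with tf.variable_scope('gen'):
        g = tf.layers.conv2d(x, 64, 4, strides=2, padding='same')
    with tf.variable_scope('discrim'):
        d = tf.layers.conv2d(x, 64, 4, strides=2, padding='same')

    # Write the graph so TensorBoard can display it.
    tf.summary.FileWriter('/tmp/tb_logs', tf.get_default_graph())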

@Skylion007

Yeah, I see that now. I am just so confused about why the other code is so much faster. I just discovered TensorBoard, so I was trying to see what I could learn from it. I will say that GPU memory use is much higher in this implementation, and I am really curious why; that could perhaps explain why it's slower. Any ideas, @ppwwyyxx? Any special tricks your code is doing?

@Neltherion

@Skylion007 It's probably the fully connected layer... those things take a lot of memory...

@eyaler
Contributor

eyaler commented Jan 17, 2017

  1. Changing the last layer from fully connected to a convolution, as in the original pix2pix implementation, did not give me any speedup.
  2. I think we should not run the G optimizer twice. It is against common wisdom to try to balance D and G by hand, and some even suggest training D twice and G once.
  3. Preprocessing alone can take up to ~50% of epoch time (in a specific case I had); it should be done only once, before training.
  4. I tried holding all facade training images in memory (instead of loading preprocessed versions from an SSD); this did not help. (This approach is not scalable, but it could be done in chunks.)
  5. Not evaluating losses after each batch; I assume there is a better way to get them from the training run()? (See the sketch after this comment.)

With (2) and (3) I could bring the epoch time down from 180 s to 110 s (facades on a GTX 1080).
Also doing (5) brought it down to 85 s. Still a factor of 2 too slow.
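Regarding (5): the losses can be fetched in the same session.run() call as the train op, so they come out of the training step's forward pass instead of costing extra ones. A minimal sketch (`sess`, `d_optim`, `d_loss`, `g_loss`, and `feeds` are illustrative names in the DCGAN-tensorflow style, not necessarily the exact variables here):

    # Before: separate loss evaluations after the update cost extra forward passes
    #   sess.run(d_optim, feed_dict=feeds)
    #   errD = d_loss.eval(feeds)
    #   errG = g_loss.eval(feeds)
    # After: one run() call fetches the train op and both losses together
    _, errD, errG = sess.run([d_optim, d_loss, g_loss], feed_dict=feeds)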

@yenchenlin
Owner

yenchenlin commented Jan 18, 2017

Thanks @eyaler, it's 2 and 3 IMO, and 2 is the crucial point.

I'm really sorry that I've been dealing with some other annoying stuff recently 😭

@Neltherion

@eyaler This was an eye-opener... I had so many misconceptions about performance in this project! Thanks for your time... please keep going!

@Neltherion

Neltherion commented Jan 21, 2017

Can anyone tell me why we do this:

        # concatenate the input image and the generated image along the
        # channel axis (old tf.concat signature: the axis comes first in TF <= 0.12)
        self.fake_AB = tf.concat(3, [self.real_A, self.fake_B])
        self.D_, self.D_logits_ = self.discriminator(self.fake_AB, reuse=True)

Why do we concat real_A and fake_B and give them BOTH to the discriminator, when what we want is to give it just one image (the generated fake, self.fake_B)?

Doesn't this force the discriminator to accept dual images (one half the real image and the other half the generated one) and double the time needed to process them?

@yenchenlin
Owner

Hello @Neltherion, please see this image from the paper:

[screenshot from the paper]

@Neltherion

Neltherion commented Jan 21, 2017

Hmm... you're right... just giving the fake images to the discriminator is probably not enough... my bad! Thanks for the quick reply...

@yenchenlin
Owner

Normally, a conditional GAN sends the conditioning data (e.g., class, attribute, text, or image) to the discriminator together with the synthesized image (see the sketch below). See this paper for a more sophisticated discriminator.
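In code terms, the discriminator always scores an (input, output) pair concatenated along the channel axis, never the output alone. A minimal sketch (illustrative names; TF 1.x concat signature, whereas the TF 0.12 code quoted above is `tf.concat(3, ...)`):

    import tensorflow as tf

    def discriminator_inputs(real_A, real_B, fake_B):
        """Conditional-GAN pairing: D judges whether B is a plausible
        translation OF A, not whether B looks real in isolation.
        Tensors are NHWC, so axis=3 is the channel axis. Illustrative sketch."""
        real_pair = tf.concat([real_A, real_B], axis=3)  # D should score as "real"
        fake_pair = tf.concat([real_A, fake_B], axis=3)  # D should score as "fake"
        return real_pair, fake_pair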

@eyaler
Contributor

eyaler commented Feb 23, 2017

Some benchmarks for the community (image iterations/sec):

| it/s | implementation | GPU      | framework | CUDA |
|-----:|----------------|----------|-----------|------|
| 5.2  | phillipi       | K80      | Torch     | 8    |
| 1.1  | yenchenlin     | K80      | TF 0.12.1 | 7.5  |
| 1.2  | yenchenlin     | K80      | TF 0.12.1 | 8    |
| 1.2  | yenchenlin     | K80      | TF 1.0    | 8    |
| 2.2  | yenchenlin     | GTX 1080 | TF 0.12.0 | 8    |
| 2.3  | yenchenlin_mod | K80      | TF 0.12.1 | 7.5  |
| 2.5  | yenchenlin_mod | K80      | TF 0.12.1 | 8    |
| 2.5  | yenchenlin_mod | K80      | TF 1.0    | 8    |
| 4.7  | yenchenlin_mod | GTX 1080 | TF 0.12.0 | 8    |
| 4.7  | affinelayer    | K80      | TF 1.0    | 8    |
| 5.5  | tensorpack     | K80      | TF 1.0    | 8    |

So it seems that tensorpack is the fastest, and that the GTX 1080 is about twice as fast as the K80.

All experiments are on the facades dataset and use cuDNN 5.1.

phillipi = https://github.com/phillipi/pix2pix
yenchenlin = https://github.com/yenchenlin/pix2pix-tensorflow
yenchenlin_mod = #4 (comment)
tensorpack = https://github.com/ppwwyyxx/tensorpack
affinelayer = https://github.com/affinelayer/pix2pix-tensorflow
