Tensorflow image loader image would cause different result #2

Open
zlin3000 opened this issue Jun 30, 2017 · 19 comments

Comments

@zlin3000

zlin3000 commented Jun 30, 2017

I randomly tested several images; the resulting scores differ by 0.10 to 0.20.

In fact, I tested the code step by step and found that the resize method might be what causes this.

I also used OpenCV instead of PIL for the resize, and the final result is similar to the one from the TensorFlow resize. Moreover, I compared the resize results of PIL and OpenCV, and they are quite different: for example, the maximum difference value in one image is about 25, and the RMSD is about 3.
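A minimal sketch of how the two numbers above (maximum difference and RMSD) can be computed for a pair of resized images. It assumes both images have been flattened into equal-length sequences of pixel values; `pixel_diff_stats` and the sample values are hypothetical and not taken from this thread.

```python
import math


def pixel_diff_stats(a, b):
    """Return (max absolute difference, RMSD) for two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("both images must have the same number of pixel values")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Root-mean-square deviation over all pixel values
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max(diffs), rmsd


# Made-up pixel values standing in for two resize outputs of the same image
pil_pixels = [10, 20, 30, 40]
cv_pixels = [10, 23, 30, 36]

max_diff, rmsd = pixel_diff_stats(pil_pixels, cv_pixels)
print(max_diff, rmsd)
```

With real images, the two sequences would come from the PIL and OpenCV resize outputs of the same source file.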

Lastly, I read some articles pointing out that adding noise to an image can lead to a totally different result, even though a human cannot see the difference between the two images.

PS: Thanks for this repository, which saved me a lot of time; otherwise I would have had to spend a long time converting the model from Caffe to TensorFlow. :)

@delta9

delta9 commented Jun 30, 2017

I've nothing to add, but I've seen similar results during testing, which has led me to use the original classifier with Caffe.

I haven't had the time to look through the code more thoroughly but I'd suggest comparing the resize logic:

https://github.com/yahoo/open_nsfw/blob/master/classify_nsfw.py#L19
https://github.com/mdietrichstein/tensorflow-open_nsfw/blob/master/image_utils.py#L4

@mdietrichstein
Owner

Hey @zlin3000,

I've put a lot of time into investigating this issue to no avail...

@delta9 and you might be right in suspecting different resize implementations.

Another reason might be different jpeg decoding mechanisms (see here and here).

I haven't found the time to investigate this further, but I would love to solve this once and for all. I don't know when I'll get around to looking into it again, though.

Help is always appreciated :)

@hristorv

hristorv commented Oct 11, 2017

@zlin3000 @delta9 @mdietrichstein Has anyone found a solution for the issue?

@mdietrichstein
Owner

@hristorv Not yet, I'm afraid

@mdietrichstein
Owner

I have fixed a bug in the model definition (e1ada8d) which definitely corrupted some classifications.

It would be awesome if some of you could run your checks again and let me know if there are still major differences between the implementations.

Thanks!

@delta9

delta9 commented Nov 18, 2017

Hey @mdietrichstein

I just did some quick random tests and still found major differences:

50 KB Image

  • Tensorflow: 0.21753324568271637
  • Caffe: 0.570083081722

449 KB Image

  • Tensorflow: 0.9021902084350586
  • Caffe: 0.981046199799

1.7 MB Image

  • Tensorflow: 0.6308742761611938
  • Caffe: 0.943047463894

These are the NSFW scores for some images from .. reddit

First I thought it had something to do with the image size but sadly it's all over the place.
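To make the "all over the place" observation concrete, here is a small sketch that computes the gap between the two implementations for the three scores quoted above. The dictionary layout and variable names are only for illustration.

```python
# NSFW scores quoted above, keyed by file size: (TensorFlow, Caffe)
scores = {
    "50 KB": (0.21753324568271637, 0.570083081722),
    "449 KB": (0.9021902084350586, 0.981046199799),
    "1.7 MB": (0.6308742761611938, 0.943047463894),
}

# Absolute gap between the two implementations for each image
gaps = {
    name: abs(tf_score - caffe_score)
    for name, (tf_score, caffe_score) in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")
```

The gaps come out at roughly 0.35, 0.08 and 0.31, so they do not grow or shrink consistently with file size.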

Thank you so much for your work though!

@mdietrichstein
Owner

mdietrichstein commented Nov 20, 2017

Hey @delta9

Thanks for your help!

> First I thought it had something to do with the image size but sadly it's all over the place.

I've found out that TensorFlow and Caffe (the original implementation) use different approaches to padding when doing convolutions, pooling, etc.

I've made some adaptations to the model, and it looks like it delivers better results now when using the yahoo image loader (-l yahoo). It's still not perfect, but at least I know what the problem is.
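The padding mismatch described above can be illustrated with the output-size rules of the two frameworks. This is a sketch of the general rules only, not of the fix made in this repository: Caffe pads symmetrically by an explicit amount, rounding down for convolutions and up for pooling, while TensorFlow's `SAME` mode derives the padding from the output size and puts any odd pixel at the end. The layer sizes below (224 input, 7x7 kernel, stride 2, pad 3) are illustrative assumptions, not values quoted in this thread.

```python
import math


def caffe_conv_output(n, kernel, stride, pad):
    """Caffe convolution: explicit symmetric padding, rounds down."""
    return (n + 2 * pad - kernel) // stride + 1


def caffe_pool_output(n, kernel, stride, pad):
    """Caffe pooling: explicit symmetric padding, rounds up."""
    out = math.ceil((n + 2 * pad - kernel) / stride) + 1
    # Caffe drops the last window if it would start inside the padding
    if pad > 0 and (out - 1) * stride >= n + pad:
        out -= 1
    return out


def tf_same_padding(n, kernel, stride):
    """TensorFlow 'SAME': returns (output size, pad before, pad after)."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel - n, 0)
    before = total // 2  # the extra pixel, if any, goes after
    return out, before, total - before


# Same output size, but different padding in front of the image
caffe_out = caffe_conv_output(224, kernel=7, stride=2, pad=3)
tf_out, tf_before, tf_after = tf_same_padding(224, kernel=7, stride=2)
print(caffe_out, tf_out, tf_before, tf_after)
```

In this example both frameworks produce 112 outputs, but Caffe pads 3 pixels in front while `SAME` pads only 2, so every window is shifted by one pixel. A common workaround is to pad explicitly and then run the layer without implicit padding; whether the adaptations mentioned above do exactly that is not stated in this thread.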

@mdietrichstein
Owner

I've spent some more time on this and have identified two serious problems:

Padding issues when doing pooling/convolutions
This was due to the aforementioned different padding approaches between caffe and tensorflow. I've managed to fix that. The current version of this project now delivers essentially the same results as the original implementation when using the yahoo image loader.

Replicating the original image loading and preprocessing procedure is hard
Yahoo did some weird things in their preprocessing code, like:

  1. Decoding the input file using PIL
  2. Resizing it using PIL
  3. Encoding the resized image with PIL as JPEG in memory...
  4. Decoding the JPEG again, but this time with skimage?!

On top of that, their model is very sensitive to changes in, e.g., the JPEG codec, the quality level, and so on.
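The four steps listed above can be sketched roughly as follows. This is an approximation under stated assumptions, not the original Yahoo code: the 256x256 target size and the bilinear filter are assumptions, `yahoo_style_preprocess` is a hypothetical name, and the final decode uses Pillow as a stand-in for skimage to keep the sketch to a single dependency, so it will not reproduce skimage's decoder exactly.

```python
import io

from PIL import Image


def yahoo_style_preprocess(image_bytes, size=(256, 256)):
    """Decode, resize, re-encode as JPEG in memory, then decode again."""
    # 1. Decode the input file using PIL
    im = Image.open(io.BytesIO(image_bytes))
    if im.mode != "RGB":
        im = im.convert("RGB")

    # 2. Resize it using PIL (size and filter are assumptions)
    resized = im.resize(size, resample=Image.BILINEAR)

    # 3. Encode the resized image as JPEG in memory (lossy step)
    buf = io.BytesIO()
    resized.save(buf, format="JPEG")
    buf.seek(0)

    # 4. Decode the JPEG again (the original uses skimage here; Pillow is a stand-in)
    decoded = Image.open(buf)
    decoded.load()
    return decoded
```

Because step 3 is lossy, the pixels coming out of step 4 depend on the JPEG encoder settings and on the decoder used, which is one reason a pure TensorFlow loader is hard to match exactly.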

I don't think it's possible to perfectly replicate the whole process with plain tensorflow due to different jpeg encoding/decoding and resize implementations/configurations between PIL, skimage and tensorflow.

That being said, I was still able to adapt the tensorflow loading code in a way that makes the difference a lot smaller than before (at least for my tests). The biggest difference I've observed was about 0.02.

@delta9 @zlin3000 It would be awesome if you could check out the new version and test if the results have improved for you too.

@delta9

delta9 commented Nov 22, 2017

Thanks for the detailed explanation!

My use case would have been to use the converted model with Tensorflow Serving in conjunction with a mobile app backend to check user generated content on upload.

I was hoping for higher performance over the original caffe script since invoking the python script using a wrapper each time has a lot of overhead.

> The current version of this project now delivers essentially the same results as the original implementation when using the yahoo image loader.

So I would need to preprocess the images using the yahoo image loader and then send over the data for prediction - if I want to use it with Tensorflow Serving?

@mdietrichstein
Owner

> So I would need to preprocess the images using the yahoo image loader and then send over the data for prediction - if I want to use it with Tensorflow Serving?

That's correct.

You could also try to use the improved tensorflow image loader and check if the results are good enough for your use case.

I'm currently trying to get access to an NSFW dataset to evaluate both image loader implementations and get some real numbers on the differences between them.

@waheebyaqub

waheebyaqub commented Dec 21, 2017

Using the three datasets listed below:

  1. 6785 Non-porn easy images
  2. 3555 Non-porn difficult images
  3. 4373 Porn images

yahoo image loader gave the following results:
all values are averaged

  1. non-porn easy images:
    SFW : 0.91932448
    NSFW : 0.08067552

  2. non-porn difficult images:
    SFW : 0.70118753
    NSFW : 0.29881247

  3. porn images:
    SFW : 0.19570181
    NSFW : 0.80429819

Original caffe yahoo NSFW gave the following results:
all values are averaged

  1. non-porn easy images:
    SFW : 0.91932540
    NSFW : 0.08067460

  2. non-porn difficult images:
    SFW : 0.70118952
    NSFW : 0.29881049

  3. porn images:
    SFW : 0.19570224
    NSFW : 0.80429776

Overall, a difference of 4.3*10^-6 between the yahoo image loader used with the TensorFlow model and the original Caffe yahoo NSFW model is not significant on this dataset.

tensorflow image loader results:

  3. porn images:
    SFW : 0.20717194
    NSFW : 0.79282895
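The averaged NSFW scores above can be compared directly. The sketch below only re-uses the numbers posted in this comment; the dictionary keys and the `abs_gaps` helper are hypothetical names chosen for illustration.

```python
# Averaged NSFW scores posted above, per dataset
caffe = {"easy": 0.08067460, "difficult": 0.29881049, "porn": 0.80429776}
yahoo_loader = {"easy": 0.08067552, "difficult": 0.29881247, "porn": 0.80429819}
tf_loader = {"porn": 0.79282895}  # only the porn set was reported for this loader


def abs_gaps(reference, other):
    """Absolute difference per dataset, for datasets present in both dicts."""
    return {k: abs(reference[k] - other[k]) for k in other if k in reference}


yahoo_gaps = abs_gaps(caffe, yahoo_loader)
tf_gaps = abs_gaps(caffe, tf_loader)

print(yahoo_gaps)
print(tf_gaps)
```

On these numbers the yahoo image loader stays within about 2e-6 of Caffe on every dataset, while the tensorflow image loader differs by roughly 0.011 on the porn set.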

@mdietrichstein
Owner

mdietrichstein commented Jan 2, 2018

Hey @waheebyaqub!

Thank you so much for posting your results here. May I ask which dataset you were using for your test?

I'm planning to use this dataset for a detailed comparison in the future.

@waheebyaqub

@mdietrichstein, I actually used the same data, that you have linked, with some preprocessing on porn frames.

@liudanking

So has this problem been solved?

@mdietrichstein
Owner

@waheebyaqub Alright, thanks!

> So has this problem been solved?

@liudanking If you use the yahoo image loader then yes, the issue is fixed.
The tensorflow image loader still gives different results because of the reasons mentioned in this issue though.

@liudanking

@mdietrichstein Partially solved is still cool!
BTW, is there any plan to give a guide for fine-tuning tensorflow-open_nsfw just like the yahoo one?

@mdietrichstein
Owner

> BTW, is there any plan to give a guide for fine-tuning tensorflow-open_nsfw just like the yahoo one?

Not in the near future since I'm spending most of my time on a different project at the moment. I'd like to look into it once I have a bit more time though.

@tower506

tower506 commented Jan 8, 2018

Hi guys, @mdietrichstein @waheebyaqub, is there any chance you can provide an alternative link to download the dataset? The one on Google Sites no longer seems to be working (the site is up, but the dataset download is not).
I'm trying to compare how much difference there is between a pretrained Inception model and this implementation / the original open_nsfw.
Thanks,

@loretoparisi

loretoparisi commented Mar 12, 2018

@mdietrichstein @waheebyaqub thanks for your suggestions. Could you please share the final results of this new tuning, and say what it improves over the original Yahoo! weights of open_nsfw?

Also, are the score ranges posted above specific to this test, or can we consider them valid in general?
