As of yesterday, Pixiv began removing metadata from images. #807
Comments
This will affect the filesize, right? If you enable
unlikely.
Yeah, that's the problem. The file size and file hash will change, so visually identical files will be downloaded again with this setting, possibly every single file! The fault lies mostly with Pixiv: they had not modified files for years, then suddenly decided to do this.
IMPORTANT: Do not delete the old copies of your files. I'm currently doing additional testing to determine whether Pixiv used a truly lossless method of metadata removal by trying to recreate the process they used.

Thanks for the heads up, it's appreciated. After looking into the situation a bit more, here is what I discovered:

Why this probably happened:

What images does this affect?
PNGs are not affected since they don't hold EXIF data. Furthermore, not all JPGs are affected, since some artists took extra steps to remove EXIF data before uploading to Pixiv. These images were still identical when I ran checksum verification on them. I need to test ugoira files more.

Why are some files now larger or smaller?

Has image quality been affected?

What should you do?
On which exact date did this change happen? I was thinking that if we just updated our stashes from date X forward and nothing before, then it shouldn't matter too much, right? As long as you take care not to mix the new and old files. Additionally, in the Also, since this came up, if say I had I'm just wondering what the Danboorus will do; they basically rely on MD5 to delete duplicates. I'm also curious how Pixiv intends to do this EXIF erasing retroactively: there are ~44 million posts on the site, and who knows how many actual images.
Not gonna lie, I thought I was crazy for a while during June-August, but it appeared that the images from that period were somehow much more JPEG-compressed than I'd expect from certain artists. I couldn't prove anything, since the result was the same regardless of how I saved them, manually or via PixivUtil2. Was that their trial run?
Changing the EXIF/metadata should change the checksum/MD5. I think Danbooru keeps the old images, as artists sometimes update an old post to make a revision.
Over the years there have been many artists who started out uploading images at much higher quality and then, for whatever personal reasons, started heavily compressing images, lowering resolution, or changing the file format they upload. So that's most likely a coincidence, but you could always ask the artist.
I found that some image links have recently started throwing the error 'Error 500: failed to thumbnailing' and don't return image data. (They are all corrupted images now, but they were downloadable before.) Example: https://www.pixiv.net/en/artworks/47412864 Is there any other way to get the original images?
Note that if you end up with images that are duplicates in content but differ in file size due to the metadata, there is a deduplication tool for Windows called AllDup that can search specifically by file content, excluding any JPEG metadata. Also, for a more accurate comparison of images that may have been reprocessed, I recommend IrfanView, which has a feature under Image Properties that lists the exact number of unique colors in the image. If the content of a JPEG was changed in any way, it will show a different number.
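For anyone who wants to script the "compare by content, ignore metadata" idea instead of using a GUI tool, here is a rough Python sketch. It is not what AllDup or Pixiv actually do (that is unknown); it simply assumes the metadata lives in APPn and COM segments before the first start-of-scan marker, drops those, and hashes the rest. The function names `strip_jpeg_metadata` and `content_hash` are made up for this example.

```python
import hashlib
import struct


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG stream without APPn (0xE0-0xEF) and COM (0xFE) segments.

    Assumption: only segments before the first start-of-scan (SOS) marker are
    inspected; everything from SOS onward is copied verbatim.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker: skip it and look again.
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: compressed image data follows, keep it untouched.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError("truncated or malformed segment")
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found")


def content_hash(data: bytes) -> str:
    """SHA-256 of the JPEG stream with metadata segments removed."""
    return hashlib.sha256(strip_jpeg_metadata(data)).hexdigest()
```

Two files whose `content_hash` values match have byte-identical image data under these assumptions, even if their plain MD5s differ. If the hashes differ, the image may still look the same (for example after re-encoding), so this only confirms duplicates; it does not rule them out.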
Looks like they keep the works date (from the "createDate" node). If you set
Added a new config option `checkLastModified` in the `[DownloadControl]` section to compare the file's last-modified time with the works date. It requires `setlastmodified = True` in config.ini to work properly.
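To illustrate the idea behind that option (this is only a sketch, not PixivUtil2's actual code): if the downloader stamps each file's modification time with the works date, a later run can compare the two instead of comparing file size or hash. The function name `mtime_matches_works_date` and the one-second tolerance are assumptions made for this example.

```python
import os
from datetime import datetime, timezone


def mtime_matches_works_date(path: str, works_date: datetime,
                             tolerance_seconds: float = 1.0) -> bool:
    """Return True if the file exists and its mtime equals the works date.

    works_date should be timezone-aware; a naive datetime would be
    interpreted in the local time zone by datetime.timestamp().
    The tolerance absorbs filesystems that store coarse timestamps.
    """
    if not os.path.exists(path):
        return False
    return abs(os.path.getmtime(path) - works_date.timestamp()) <= tolerance_seconds


def stamp_with_works_date(path: str, works_date: datetime) -> None:
    """Set the file's access and modification times to the works date."""
    ts = works_date.timestamp()
    os.utime(path, (ts, ts))
```

A file stamped at download time keeps matching as long as the works date does not change, no matter what happens to its metadata, size, or hash on the server. A mismatch would then suggest the work itself was updated.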
This isn't so much of an issue that anyone can fix (I think), but anyone using the database functionality of PixivUtil2 to check for duplicates and image edits should be aware.
It is very likely that new files and old files will look the same and you may re-download the same image, but with metadata removed.
I don't know if anything can be done in PixivUtil2 without slowing down the checking process drastically. Determining the hash of an image with metadata removed would basically require doing the exact same removal as Pixiv.
I don't know if the original, unedited images can be retrieved either.