-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Saving Duplicate Images #359
Comments
Thanks for the insight @mlh758. That's a useful feature. I'll certainly accept that patch if somebody did that. we can store the hash of the image in the |
A hash may actually be overkill, the bytes package has has an Equal function that already handles equality and would probably be more efficient. |
This was referenced Mar 22, 2019
nullfy
pushed a commit
to nullfy/excelize
that referenced
this issue
Oct 23, 2020
Adding the same image should create a drawing referencing the already stored copy of the image. Closes qax-os#359
nullfy
pushed a commit
to nullfy/excelize
that referenced
this issue
Oct 23, 2020
Do not save duplicate images
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Description
When adding an image, it would be useful to detect if the image is a duplicate and instead of adding the image again, return a reference to the existing media.
We have a header that gets added in some large workbooks to each sheet, and this substantially increases the file size. If we open the file in Excel and save it again, Excel detects the duplicate images and consolidates them on save, reducing the file size.
I think it would be possible to change
addMedia
inpicture.go
to handle this behavior.AddPictureFromBytes
would change slightly as well:addMedia
steps through all the saved media and looks for byte slices that have the same length as the file we are trying to save. Compute the hash for both (we can obviously re-use the hash of the file being saved for future comparisons)AddPictureFromBytes
callsaddMedia
before callingaddDrawingRelationships
and uses the media path provided byaddMedia
in the call toaddDrawingRelationships
.Since checking the length of a slice is a constant time operation, and very few slices should have the exact same length without actually being the same media there shouldn't be much of a performance impact from the hashing. However, it would be worth adding a benchmark to ensure this doesn't cause a regression for #274. It would also be useful to incorporate a benchmark for actually saving the xlsx file since it's likely that the performance impact of this check would be offset by not having to write as many files.
I will wait for feedback on this one since while I think it may be useful but I understand not wanting to risk the performance impact.
The text was updated successfully, but these errors were encountered: