Note: The following issue occurs only for MS Word .docx files that contain an alien empty directory.
mz_zip_add_mem_to_archive_file_in_place() seems to generate zip files that MS Word 2007 detects as "zip archive of unsupported version". Here are general steps to reproduce:
1. Create an empty document in MS Word 2007 and save it as .docx (which internally is a ZIP file).
2. Add a .zip extension to the file name and open it in any zip archive manager (7Zip, FAR).
3. Add an alien empty directory to the archive (a folder with any name that MS Word does not expect in a .docx file, e.g. "test").
4. Remove the .zip extension from the file name.
5. Open the file in MS Word 2007. The file opens without an error (Word "forgives" the empty alien directory and does not complain).
6. Write a simple program using the miniz library that loads that zip file, iterates over each entry in the archive, decompresses it into memory, and immediately compresses it into a new archive, effectively cloning the source archive (see the code snippet at the end of the message).
7. Unzip both the source archive and the resulting clone archive with any ZIP manager (e.g. 7Zip) into two separate directories and compare the files. All files will be binary identical, as expected.
8. Try to open the clone archive in MS Word 2007 and observe the following error message: "The file cannot be opened because there are problems with the contents. Details: Microsoft Office cannot open this file because the .zip archive file is unsupported version". If you press OK and then press Yes in the recovery prompt, the file actually loads OK.
So the data in the archive cloned with mz_zip_add_mem_to_archive_file_in_place() is valid (both archives contain binary-identical files when uncompressed, and Word is able to load the file after recovery), but something in the archive metadata triggers the error message. Note that when the clone is created with, say, the 7Zip utility, Word does not complain, so the cause is something specific to mz_zip_add_mem_to_archive_file_in_place().
Note: the problem happens regardless of the level of compression specified as a parameter when calling mz_zip_add_mem_to_archive_file_in_place()
Important note: I understand that we are feeding non-standard .docx content into MS Word by adding an alien directory, but it is interesting that MS Word does not complain when that directory is created by 7Zip, yet does complain when it is created by miniz. So it is some idiosyncrasy of MS Word, but it is evidently triggered by some small difference between how miniz and 7Zip create zip files, so I think it would be worthwhile to find the root cause.
Note: this effect can only be observed with an empty alien directory. When either an alien file or a non-empty alien directory is injected into the .docx, MS Word complains regardless of which tool created the zip file (7Zip/FAR/miniz).
bool docx_clone(std::string file_src, std::string file_dst)
{
    //Load source archive
    mz_zip_archive zip_archive_src;
    memset(&zip_archive_src, 0, sizeof(zip_archive_src));
    if (!mz_zip_reader_init_file(&zip_archive_src, file_src.c_str(), 0))
    {
        return false;
    }
    // Iterate through all files and directories in the archive
    for (int i = 0; i < (int)mz_zip_reader_get_num_files(&zip_archive_src); i++)
    {
        //Get file/directory information
        mz_zip_archive_file_stat file_stat;
        if (!mz_zip_reader_file_stat(&zip_archive_src, i, &file_stat))
        {
            mz_zip_reader_end(&zip_archive_src);
            return false;
        }
        std::cout << file_stat.m_filename << std::endl;
        //If it is a file, decompress its contents into memory
        size_t uncomp_size = 0;
        void * file_in_memory = NULL;
        if (!mz_zip_reader_is_file_a_directory(&zip_archive_src, i))
        {
            file_in_memory = mz_zip_reader_extract_file_to_heap(&zip_archive_src, file_stat.m_filename, &uncomp_size, 0);
            if (file_in_memory==NULL)
            {
                mz_zip_reader_end(&zip_archive_src);
                return false;
            }
        }
        //Save the file/directory into destination archive
        //Notice that in case of directory, file_in_memory pointer will be NULL, which is exactly what miniz wants for directories
        if
        (
            !mz_zip_add_mem_to_archive_file_in_place
            (
                file_dst.c_str(),
                file_stat.m_filename,
                file_in_memory,
                uncomp_size,
                file_stat.m_comment,
                file_stat.m_comment_size,
                MZ_DEFAULT_COMPRESSION
            )
        )
        {
            mz_free(file_in_memory);
            mz_zip_reader_end(&zip_archive_src);
            return false;
        }
        //Clean up memory allocation, if any
        if (file_in_memory!=NULL)
            mz_free(file_in_memory);
    }
    // Close the archive, freeing any resources it was using
    mz_zip_reader_end(&zip_archive_src);
    return true;
}
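
Since the uncompressed contents are identical, one way to narrow this down might be to dump the central directory metadata of both archives (the 7Zip one and the miniz clone) and compare the entry for the alien directory field by field. Below is a minimal sketch of such a dump, written against the ZIP file format itself rather than miniz. The names CdEntry, rd16, rd32 and read_central_directory are hypothetical helpers invented for this illustration, and the sketch assumes a single-disk archive without Zip64 extensions, which should hold for a small .docx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type: per-entry metadata that can differ between ZIP writers.
struct CdEntry {
    std::string name;
    uint16_t version_made_by;
    uint16_t version_needed;
    uint16_t flags;
    uint16_t method;
    uint32_t crc32;
    uint32_t comp_size;
    uint32_t uncomp_size;
    uint32_t external_attr;
};

// Little-endian readers; callers are expected to have checked bounds already.
static uint16_t rd16(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

static uint32_t rd32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<uint32_t>(b[off]) |
           (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) |
           (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Parses the central directory of an in-memory ZIP image.
// Returns false if the image is malformed. Assumes no Zip64 and a single disk.
bool read_central_directory(const std::vector<uint8_t>& zip, std::vector<CdEntry>& out) {
    const size_t kEocdSize = 22;  // fixed part of the end-of-central-directory record
    const size_t kCdhSize = 46;   // fixed part of a central directory file header
    out.clear();
    if (zip.size() < kEocdSize) return false;

    // Scan backwards for the end-of-central-directory signature.
    size_t eocd = zip.size() - kEocdSize;
    while (rd32(zip, eocd) != 0x06054b50u) {
        if (eocd == 0) return false;
        --eocd;
    }

    const uint16_t count = rd16(zip, eocd + 10);  // total number of entries
    size_t pos = rd32(zip, eocd + 16);            // offset of the central directory

    for (uint16_t i = 0; i < count; ++i) {
        if (pos > zip.size() || zip.size() - pos < kCdhSize) return false;
        if (rd32(zip, pos) != 0x02014b50u) return false;

        CdEntry e;
        e.version_made_by = rd16(zip, pos + 4);
        e.version_needed  = rd16(zip, pos + 6);
        e.flags           = rd16(zip, pos + 8);
        e.method          = rd16(zip, pos + 10);
        e.crc32           = rd32(zip, pos + 16);
        e.comp_size       = rd32(zip, pos + 20);
        e.uncomp_size     = rd32(zip, pos + 24);
        e.external_attr   = rd32(zip, pos + 38);

        const size_t name_len    = rd16(zip, pos + 28);
        const size_t extra_len   = rd16(zip, pos + 30);
        const size_t comment_len = rd16(zip, pos + 32);
        const size_t total = kCdhSize + name_len + extra_len + comment_len;
        if (zip.size() - pos < total) return false;

        e.name.assign(reinterpret_cast<const char*>(&zip[pos + kCdhSize]), name_len);
        out.push_back(e);
        pos += total;
    }
    return true;
}
```

To use it, read each archive into a std::vector<uint8_t> (for example with std::ifstream in binary mode), call read_central_directory() on both, and print the fields of the "test/" entry side by side. Fields such as version_needed, method, flags and external_attr are the ones I would look at first, though that is only a guess about where the difference lies.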