MS Word files created with mz_zip_add_mem_to_archive_file_in_place() fail to open #279

Open
nyq opened this issue Aug 13, 2023 · 0 comments


Note: The following issue occurs only for MS Word .docx files that contain an alien empty directory.

mz_zip_add_mem_to_archive_file_in_place() seems to generate zip files that MS Word 2007 detects as "zip archive of unsupported version". Here are general steps to reproduce:

  1. Create an empty document in MS Word 2007 and save it as .docx (which internally is a ZIP file).
  2. Add a .zip extension to the file and open it in any zip archive manager (7Zip, FAR).
  3. Add an alien empty directory to the archive (a folder with any name that MS Word does not expect in a .docx file, e.g. "test").
  4. Remove the .zip extension from the file name.
  5. Open the file in MS Word 2007. The file opens without an error (Word "forgives" the empty alien directory and does not complain).
  6. Write a simple program using the miniz library that loads that zip file and iterates through each entry in the archive, unzipping it into memory and then immediately compressing it into a new archive, basically cloning the source archive (see the code snippet at the end of the message).
  7. Unzip both the source archive and the resulting clone archive using any ZIP manager (e.g. 7Zip) into two separate directories and compare the files. All files will be binary identical, as expected.
  8. Try to open the clone archive in MS Word 2007 and observe the following error message: "The file cannot be opened because there are problems with the contents. Details: Microsoft Office cannot open this file because the .zip archive file is unsupported version". If you press OK and then press Yes in the recovery message, the file will actually load OK.

So the data contents of the archive cloned with mz_zip_add_mem_to_archive_file_in_place() are valid (both archives contain binary identical files when uncompressed, and Word is able to load the file after recovery), but something in the archive metadata triggers the error message. Notice that when the cloned archive is created with, say, the 7Zip utility, Word does not complain, so the problem is specific to mz_zip_add_mem_to_archive_file_in_place().
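One way to narrow down which metadata field differs is to dump the central directory of both archives side by side. The following is only a minimal sketch that does not use miniz: it reads the end-of-central-directory record and the central directory headers directly, following the layout in the ZIP APPNOTE. The names EntryMeta, read_central_directory and print_entries are hypothetical helpers invented for this illustration, and ZIP64 archives are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Selected fields of one central directory header (hypothetical helper type).
struct EntryMeta {
    std::string name;
    uint16_t version_made_by;
    uint16_t version_needed;
    uint16_t flags;
    uint16_t method;
    uint32_t external_attr;
};

// Little-endian readers; callers must ensure the offset is in range.
static uint16_t rd16(const std::vector<unsigned char> &b, size_t o) {
    return static_cast<uint16_t>(b[o] | (b[o + 1] << 8));
}
static uint32_t rd32(const std::vector<unsigned char> &b, size_t o) {
    return rd16(b, o) | (static_cast<uint32_t>(rd16(b, o + 2)) << 16);
}

// Parses the central directory of a ZIP image held in memory.
// Returns an empty vector if the image is truncated or malformed.
std::vector<EntryMeta> read_central_directory(const std::vector<unsigned char> &zip) {
    std::vector<EntryMeta> out;
    if (zip.size() < 22) return out;  // smaller than an empty EOCD record
    // Scan backwards for the end-of-central-directory signature.
    size_t eocd = zip.size() - 22;
    while (rd32(zip, eocd) != 0x06054b50u) {
        if (eocd == 0) return out;
        --eocd;
    }
    const size_t count = rd16(zip, eocd + 10);  // total number of entries
    size_t pos = rd32(zip, eocd + 16);          // offset of the central directory
    for (size_t i = 0; i < count; ++i) {
        if (pos + 46 > zip.size() || rd32(zip, pos) != 0x02014b50u) {
            out.clear();
            return out;
        }
        const size_t name_len = rd16(zip, pos + 28);
        const size_t extra_len = rd16(zip, pos + 30);
        const size_t comment_len = rd16(zip, pos + 32);
        if (pos + 46 + name_len > zip.size()) {
            out.clear();
            return out;
        }
        EntryMeta e;
        e.name.assign(zip.begin() + pos + 46, zip.begin() + pos + 46 + name_len);
        e.version_made_by = rd16(zip, pos + 4);
        e.version_needed = rd16(zip, pos + 6);
        e.flags = rd16(zip, pos + 8);
        e.method = rd16(zip, pos + 10);
        e.external_attr = rd32(zip, pos + 38);
        out.push_back(e);
        pos += 46 + name_len + extra_len + comment_len;  // advance to next header
    }
    return out;
}

// Prints one line per entry so two archives can be compared with a text diff.
void print_entries(const std::vector<EntryMeta> &entries) {
    for (const EntryMeta &e : entries) {
        std::printf("%-40s made_by=0x%04x needed=%u flags=0x%04x method=%u ext_attr=0x%08x\n",
                    e.name.c_str(), static_cast<unsigned>(e.version_made_by),
                    static_cast<unsigned>(e.version_needed), static_cast<unsigned>(e.flags),
                    static_cast<unsigned>(e.method), static_cast<unsigned>(e.external_attr));
    }
}
```

Reading each archive into a byte vector and calling print_entries(read_central_directory(bytes)) on both should make any difference in the directory entry (for example in the version, flag, method or attribute fields) visible. Whether one of these fields is what Word actually objects to is not established here.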

Note: the problem happens regardless of the level of compression specified as a parameter when calling mz_zip_add_mem_to_archive_file_in_place()

Important note: I do understand that we are really feeding non-standard .docx content into MS Word by creating an alien directory, but it is very interesting that when such an alien directory is created by 7Zip, MS Word does not complain, whereas when it is created by miniz, it does. So it is some kind of idiosyncrasy of MS Word, but it is clearly triggered by some small difference between how miniz creates zip files and how 7Zip does, so I think it would be interesting to find the root cause.

Note: this effect can only be observed with an empty alien directory. When either an alien file or a non-empty alien directory is injected into the .docx, MS Word complains regardless of which tool created the zip file (7Zip/FAR/miniz).

#include <cstring>
#include <iostream>
#include <string>
#include "miniz.h"

bool docx_clone(const std::string &file_src, const std::string &file_dst)
{
	//Load source archive
	mz_zip_archive zip_archive_src;
	memset(&zip_archive_src, 0, sizeof(zip_archive_src));

	if (!mz_zip_reader_init_file(&zip_archive_src, file_src.c_str(), 0))
	{
		return false;
	}

	// Iterate through all files and directories in the archive
	for (int i = 0; i < (int)mz_zip_reader_get_num_files(&zip_archive_src); i++)
	{
		//Get file/directory information
		mz_zip_archive_file_stat file_stat;
		if (!mz_zip_reader_file_stat(&zip_archive_src, i, &file_stat))
		{
			mz_zip_reader_end(&zip_archive_src);
			return false;
		}
		std::cout << file_stat.m_filename << std::endl;

		//If it is a file, decompress its contents into memory
		size_t uncomp_size		= 0;
		void * file_in_memory	= NULL;
		if (!mz_zip_reader_is_file_a_directory(&zip_archive_src, i))
		{
			file_in_memory = mz_zip_reader_extract_file_to_heap(&zip_archive_src, file_stat.m_filename, &uncomp_size, 0);
			if (file_in_memory==NULL)
			{
				mz_zip_reader_end(&zip_archive_src);
				return false;
			}			
		}
		//Save the file/directory into destination archive
		//Notice that in case of directory, file_in_memory pointer will be NULL, which is exactly what miniz wants for directories
		if
		(
			!mz_zip_add_mem_to_archive_file_in_place
			(
				file_dst.c_str(),
				file_stat.m_filename,
				file_in_memory,
				uncomp_size,
				file_stat.m_comment,
				file_stat.m_comment_size,
				MZ_DEFAULT_COMPRESSION
			)
		)
		{
			mz_free(file_in_memory);
			mz_zip_reader_end(&zip_archive_src);
			return false;
		}
		//Clean up memory allocation, if any
		if (file_in_memory!=NULL)
			mz_free(file_in_memory);
	}

	// Close the archive, freeing any resources it was using
	mz_zip_reader_end(&zip_archive_src);
	return true;
}
