Cannot read and immediately write back out Word file with invalid XML entities in content #720

Open
@tjarrett

If you load a Word 2007 file whose content contains invalid XML characters (for example, xml-characters-in.docx) and then immediately save it back out, like this:

$test = \PhpOffice\PhpWord\IOFactory::load('xml-characters-in.docx');
$test->save('xml-characters-out.docx');

You get a corrupted Word file back (see xml-characters-out.docx).

This is probably related to #671, #401, #514, and other similar issues. Unlike in those cases, though, I have no opportunity to scrub the content myself, because I read the file and then immediately save the result.

Maybe the Reader should scrub the incoming $textContent?
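To make the suggestion concrete, here is a minimal sketch of what such scrubbing might look like. It is not PHPWord code: `scrubForXml()` is a hypothetical helper, and the issue does not say which characters are at fault, so the sketch assumes two cases: code points that XML 1.0 forbids outright (such as most ASCII control characters), and markup characters like `&` and `<` that need escaping before they are written as text.

```php
<?php
// Hypothetical helper for illustration only; not part of PHPWord.
function scrubForXml(string $text): string
{
    // Drop code points outside the XML 1.0 Char production:
    // tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    if ($clean === null) {
        // preg_replace() returns null with /u when the input is not valid UTF-8.
        // Fall back to a byte-wise strip of the forbidden ASCII control characters.
        $clean = (string) preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
    }

    // Escape &, < and > so the text is safe as XML character data.
    // ENT_SUBSTITUTE replaces malformed byte sequences instead of returning ''.
    return htmlspecialchars($clean, ENT_XML1 | ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage: a vertical tab (U+000B) is removed and the ampersand is escaped.
echo scrubForXml("A & B\x0B"), PHP_EOL; // prints "A &amp; B"
```

Whether the escaping step belongs in the Reader or the Writer is an open design question. If the Writer also escapes its output, escaping on read as well would double-encode the text, so only one side should do it.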

