Open Word docx file with "The image part with relationship rID8 was not found" error, it always fails #1105

Open
paradise321 opened this issue Jun 23, 2022 · 7 comments

Comments

@paradise321

When I try to open a docx file that has the "The image part with relationship rID8 was not found" error, loading always fails.

The code is as simple as this:


from docx import Document

# Raw string so the backslash is not treated as an escape character
doc = Document(r'.\WordHasSomeProblem.docx')

WordHasSomeProblem.docx

The problematic word file is attached, i.e., WordHasSomeProblem.docx

As Microsoft explains here: https://docs.microsoft.com/en-us/office/troubleshoot/word/image-part-relationship-rld8-not-found-error-microsoft-word, the problem occurs because the Target field of a relationship was set to "NULL".

My question: is it possible to add a parameter to Document() that ignores this kind of error and continues loading the docx into memory?

Thanks!

@paradise321
Author

The error trace is below:

%Run read_word.py
Traceback (most recent call last):
  File "C:\Temp\Source\Python\Word1\read_word.py", line 3, in <module>
    doc = Document('.\WordHasSomeProblem.docx')
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\package.py", line 128, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\pkgreader.py", line 35, in from_file
    sparts = PackageReader._load_serialized_parts(
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\pkgreader.py", line 69, in _load_serialized_parts
    for partname, blob, reltype, srels in part_walker:
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\pkgreader.py", line 110, in _walk_phys_parts
    for partname, blob, reltype, srels in next_walker:
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\pkgreader.py", line 105, in _walk_phys_parts
    blob = phys_reader.blob_for(partname)
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\opc\phys_pkg.py", line 108, in blob_for
    return self._zipf.read(pack_uri.membername)
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1464, in read
    with self.open(name, "r", pwd) as fp:
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1503, in open
    zinfo = self.getinfo(name)
  File "C:\Users\userid\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1430, in getinfo
    raise KeyError(
KeyError: "There is no item named 'word/NULL' in the archive"
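
To confirm which relationship is broken before patching anything, a minimal sketch using only the standard library is shown here. It lists internal relationships whose target part is missing from the zip archive, which is the condition the KeyError above reports. The function name find_dangling_relationships is made up for illustration and is not part of python-docx; the path resolution is a simplified reading of how relationship targets map to archive members.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_dangling_relationships(docx_path):
    """Return sorted (rels_file, rId, target) tuples for internal
    relationships whose target is not a member of the zip archive."""
    dangling = []
    with zipfile.ZipFile(docx_path) as zf:
        members = set(zf.namelist())
        for name in members:
            if not name.endswith(".rels"):
                continue
            # "word/_rels/document.xml.rels" describes targets relative to "word/".
            base_dir = posixpath.dirname(posixpath.dirname(name))
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # external targets (e.g. URLs) are not stored in the zip
                target = rel.get("Target", "")
                if target.startswith("/"):
                    member = target.lstrip("/")
                else:
                    member = posixpath.normpath(posixpath.join(base_dir, target))
                if member not in members:
                    dangling.append((name, rel.get("Id"), target))
    return sorted(dangling)
```

Calling find_dangling_relationships('WordHasSomeProblem.docx') on a file like the one in this issue would be expected to report an entry whose target is NULL, though this has not been run against the attached file.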

paradise321 changed the title Jun 23, 2022
from: Open Word docx file with "The image part with relationship rID8 was not found" error, it always failed
to: Open Word docx file with "The image part with relationship rID8 was not found" error, it always fails

@wonzer

wonzer commented Nov 1, 2022

@paradise321 You can override load_from_xml in python-docx to skip any relationship element whose target_ref is NULL. Put this code before "doc = Document('.\WordHasSomeProblem.docx')".

from docx.opc.pkgreader import _SerializedRelationships, _SerializedRelationship
from docx.opc.oxml import parse_xml


def load_from_xml_v2(baseURI, rels_item_xml):
    """
    Return |_SerializedRelationships| instance loaded with the
    relationships contained in *rels_item_xml*. Returns an empty
    collection if *rels_item_xml* is |None|.
    """
    srels = _SerializedRelationships()
    if rels_item_xml is not None:
        rels_elm = parse_xml(rels_item_xml)
        for rel_elm in rels_elm.Relationship_lst:
            if rel_elm.target_ref in ('../NULL', 'NULL'):
                continue
            srels._srels.append(_SerializedRelationship(baseURI, rel_elm))
    return srels


# load_from_xml is a static method in python-docx, so keep the replacement one too
_SerializedRelationships.load_from_xml = staticmethod(load_from_xml_v2)
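
If patching private python-docx internals is a concern, one alternative is to write a cleaned copy of the file and open that instead. The sketch below uses only the standard library and drops every internal relationship whose target part is missing from the archive. The function name write_copy_without_dangling_rels is hypothetical, and the assumption that python-docx then loads the copy has not been verified against the attached file.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def write_copy_without_dangling_rels(src, dst):
    """Copy the package at *src* to *dst*, dropping internal relationships
    whose target part is not in the archive. Returns the dropped rIds."""
    # Serialize relationship elements without a namespace prefix.
    ET.register_namespace("", RELS_NS)
    dropped = []
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        members = set(zin.namelist())
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.endswith(".rels"):
                # Targets are relative to the folder that contains "_rels/".
                base_dir = posixpath.dirname(posixpath.dirname(info.filename))
                root = ET.fromstring(data)
                for rel in list(root):
                    if rel.get("TargetMode") == "External":
                        continue  # external targets are not stored in the zip
                    target = rel.get("Target", "")
                    if target.startswith("/"):
                        member = target.lstrip("/")
                    else:
                        member = posixpath.normpath(posixpath.join(base_dir, target))
                    if member not in members:
                        root.remove(rel)
                        dropped.append(rel.get("Id"))
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(info.filename, data)
    return dropped
```

As with the monkeypatch, the document body may still refer to the dropped relationship IDs, so the affected images stay missing; the copy only removes the entries that point at parts that do not exist.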

@azurewtl

azurewtl commented May 9, 2023

This is a GENIUS fix!
I saved it as docx_patch.py and import docx_patch right after importing docx. It works.

@niuhuluzhihao

It did solve my problem

@HongxinLiu17

Cool!

@Cesar8k

Cesar8k commented Jun 9, 2024

Hello,

I am having a similar problem, but the error is this:

KeyError: "There is no item named 'word/#pdfImages' in the archive"

Please, could you advise me how to change the suggested solution code to fix my issue?

Thanks!

@Cesar8k

Cesar8k commented Jun 9, 2024

I modified the line to:
if rel_elm.target_ref in ('../NULL', 'NULL', '#pdfImages'):

and now it works!
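
Since the list of bad targets keeps growing, the check could be pulled into a small helper so new values are added in one place. This is only a sketch: the names KNOWN_BROKEN_TARGETS and is_broken_target are made up, and treating every target that starts with "#" as broken is an assumption generalized from the '#pdfImages' case, not documented python-docx behavior.

```python
# Targets reported in this thread as pointing at parts that do not exist.
KNOWN_BROKEN_TARGETS = frozenset({"NULL", "../NULL", "#pdfImages"})


def is_broken_target(target_ref, extra=()):
    """Return True if a relationship with *target_ref* should be skipped."""
    if target_ref in KNOWN_BROKEN_TARGETS or target_ref in extra:
        return True
    # Assumption: a bare "#fragment" does not name a part in the archive.
    return target_ref.startswith("#")
```

In the patched load_from_xml, the condition would then read "if is_broken_target(rel_elm.target_ref):" followed by the same continue.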
