KeyError: "There is no item named 'word/NULL' in the archive" #797

KristenMoore · 2020-03-17T03:50:11Z

I'm getting this error when opening an unremarkable-looking Word file from a corpus of over 1K Word files, none of which have had a problem before. Opening it in Word and re-saving it made no difference.

<ipython-input-122-9b21415cda29> in <module>
----> 1 doc = docx.Document(f"data/{filename}")

/usr/local/lib/python3.7/site-packages/docx/api.py in Document(docx)
     23     """
     24     docx = _default_docx_path() if docx is None else docx
---> 25     document_part = Package.open(docx).main_document_part
     26     if document_part.content_type != CT.WML_DOCUMENT_MAIN:
     27         tmpl = "file '%s' is not a Word file, content type is '%s'"

/usr/local/lib/python3.7/site-packages/docx/opc/package.py in open(cls, pkg_file)
    126         *pkg_file*.
    127         """
--> 128         pkg_reader = PackageReader.from_file(pkg_file)
    129         package = cls()
    130         Unmarshaller.unmarshal(pkg_reader, package, PartFactory)

/usr/local/lib/python3.7/site-packages/docx/opc/pkgreader.py in from_file(pkg_file)
     34         pkg_srels = PackageReader._srels_for(phys_reader, PACKAGE_URI)
     35         sparts = PackageReader._load_serialized_parts(
---> 36             phys_reader, pkg_srels, content_types
     37         )
     38         phys_reader.close()

/usr/local/lib/python3.7/site-packages/docx/opc/pkgreader.py in _load_serialized_parts(phys_reader, pkg_srels, content_types)
     67         sparts = []
     68         part_walker = PackageReader._walk_phys_parts(phys_reader, pkg_srels)
---> 69         for partname, blob, reltype, srels in part_walker:
     70             content_type = content_types[partname]
     71             spart = _SerializedPart(

/usr/local/lib/python3.7/site-packages/docx/opc/pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
    108                 phys_reader, part_srels, visited_partnames
    109             )
--> 110             for partname, blob, reltype, srels in next_walker:
    111                 yield (partname, blob, reltype, srels)
    112 

/usr/local/lib/python3.7/site-packages/docx/opc/pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
    108                 phys_reader, part_srels, visited_partnames
    109             )
--> 110             for partname, blob, reltype, srels in next_walker:
    111                 yield (partname, blob, reltype, srels)
    112 

/usr/local/lib/python3.7/site-packages/docx/opc/pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
    103             reltype = srel.reltype
    104             part_srels = PackageReader._srels_for(phys_reader, partname)
--> 105             blob = phys_reader.blob_for(partname)
    106             yield (partname, blob, reltype, part_srels)
    107             next_walker = PackageReader._walk_phys_parts(

/usr/local/lib/python3.7/site-packages/docx/opc/phys_pkg.py in blob_for(self, pack_uri)
    106         matching member is present in zip archive.
    107         """
--> 108         return self._zipf.read(pack_uri.membername)
    109 
    110     def close(self):

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in read(self, name, pwd)
   1404     def read(self, name, pwd=None):
   1405         """Return file bytes (as a string) for name."""
-> 1406         with self.open(name, "r", pwd) as fp:
   1407             return fp.read()
   1408 

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1443         else:
   1444             # Get info object for name
-> 1445             zinfo = self.getinfo(name)
   1446 
   1447         if mode == 'w':

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in getinfo(self, name)
   1371         if info is None:
   1372             raise KeyError(
-> 1373                 'There is no item named %r in the archive' % name)
   1374 
   1375         return info

KeyError: "There is no item named 'word/NULL' in the archive"

scanny · 2020-03-17T04:18:02Z

Hi Kristen. This is a problem we see occasionally. Our best guess is that there is a Word plugin like maybe Small Business Productivity Pak or something like that which is not too careful about cleaning up after itself when it deletes things.

Unfortunately there's no easy fix, but depending on your skill level and determination you can fix it. You can find some other issues related to it by searching Google for NULL relationship "python-docx", and some others by substituting "python-pptx" for "python-docx".

The way I would fix it on a single file would be to extract the package using opc-diag, grep through the relationship files to find NULL with something like grep NULL *.rels, and then just delete the offending relationship line.
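For anyone who would rather script that repair than do it by hand, here is a minimal sketch using only the standard library. It is not part of python-docx or opc-diag; the function name `strip_null_relationships` and the regular expression are assumptions made for illustration, and the regex assumes each relationship is written as a self-closing `<Relationship .../>` element. It copies the package to a new file and drops any relationship whose Target is exactly "NULL". Any element elsewhere in the document that still refers to the removed relationship ID is left untouched.

```python
import re
import zipfile

# Matches one self-closing <Relationship .../> element whose Target is exactly "NULL".
# The \b after "Relationship" keeps the <Relationships> root element from matching.
NULL_REL = re.compile(rb'<Relationship\b[^>]*\bTarget="NULL"[^>]*/>')


def strip_null_relationships(src_path, dst_path):
    """Copy src_path to dst_path, dropping Target="NULL" relationships.

    Returns the number of relationship elements removed.
    """
    removed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Only relationship parts are edited; everything else is copied as-is.
            if info.filename.endswith(".rels"):
                data, count = NULL_REL.subn(b"", data)
                removed += count
            dst.writestr(info, data)
    return removed
```

Writing to a separate output path keeps the original file intact in case the repaired copy still fails to open.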

There might be one or two more accessible ways if that sounds like Greek.

Let us know how you go.

KristenMoore · 2020-03-20T04:21:10Z

Thanks for the quick reply. This all makes sense, I looked up some other issues too like you said, but I can't find NULL in any relationship files (or anywhere else for that matter) in this doc.

scanny · 2020-03-20T16:32:14Z

Can you share the doc? You can send it to me by email if you want. Otherwise, I don't see how it could not be in there given where the exception is happening. "NULL" is not a string that would normally occur in Python so would indicate something like Java or C# as the source.

Have you unzipped the .docx file and grep-ed it for "NULL"?
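If unzipping and grepping is awkward, the same check can be sketched in Python with the standard `zipfile` module. The helper name `find_in_package` is hypothetical. It reads every member of the archive, so relationship files tucked away in `_rels` folders such as word/_rels/ are covered too.

```python
import zipfile


def find_in_package(docx_path, needle=b"NULL"):
    """Return the names of archive members whose raw bytes contain needle."""
    hits = []
    with zipfile.ZipFile(docx_path) as zf:
        for name in zf.namelist():
            if needle in zf.read(name):
                hits.append(name)
    return hits
```

Calling `find_in_package("data/some-file.docx")` (an illustrative path) returns a list of member names to inspect. The match is on raw bytes, so a hit in a binary part such as an image may be a coincidence rather than a bad relationship.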

KristenMoore · 2020-03-22T22:11:21Z

Apologies - it worked. Don't know what I did wrong the first time.
Many thanks.

scanny · 2020-03-23T18:24:46Z

No worries, glad you got it working Kristen :)

aorsten · 2020-12-03T15:22:35Z

I get a similar error message: KeyError: "There is no item named 'word/#MyBookmark' in the archive"

This can be reproduced as follows:

1. Add a picture to the Word file.
2. Right-click the picture -> Link.
3. Add a link to an internal bookmark.

The hyperlink then ends up like this; note the a:hlinkClick relationship ID:

                <w:drawing>
                    <wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="2A30E332" wp14:editId="3F2B5F70">
                       (...)
                        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
                            <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                    <pic:nvPicPr>
                                        <pic:cNvPr id="28" name="Picture 28" descr="(...)">
                                            <a:hlinkClick r:id="rId21"/>
                                        </pic:cNvPr>
                                        <pic:cNvPicPr/>
                                    </pic:nvPicPr>
                                    <pic:blipFill>
                                        (...)
                                    </pic:blipFill>
                                    (...)
                                </pic:pic>
                            </a:graphicData>
                        </a:graphic>
                    </wp:inline>
                </w:drawing>

Now, in word/_rels/document.xml.rels, we get:

    <Relationship Id="rId21" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="#MyBookmark"/>

This relationship trips up python-docx for me. I'll admit I'm using a 2.5-year-old version of the package, since I needed to modify it for my own use case, so I am not sure whether this has been fixed since then. I was looking to see whether this had been solved somehow, and it seems very much related to this issue.

@scanny Do you reckon this is easily solved, and do you have any suggestions as to how? I see in the pkgreader that target_mode can be used to identify external targets, and that external targets receive special treatment to avoid such zipfile issues. From what I gather, RT.HYPERLINK relationships that have a Target starting with # should be treated specially, as some sort of internal bookmark relationship (or similar).

scanny · 2020-12-03T18:21:15Z

@aorsten probably a separate ticket is best. You can refer to this one from there if you think this is related enough, but this seems like possibly an ambiguity in the spec rather than a (maybe) violation of it like the NULL relationship one is.

Make sure to include the stack trace in the report.

The error seems to be coming from attempting to load the part, so wherever the code is deciding which relationships are loadable sounds like the right neighborhood. Probably in opc/package.py or thereabouts.
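As a diagnostic aid for either case (NULL or #MyBookmark), the sketch below lists the internal relationships in one .rels file whose target does not resolve to a member of the zip archive. It is a standalone standard-library script, not python-docx code and not a proposed patch; the function name `unloadable_relationships` is an assumption, and it only inspects the single .rels file it is given.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def unloadable_relationships(docx_path, rels_name="word/_rels/document.xml.rels"):
    """List (rId, target) for internal relationships with no matching zip member."""
    # Relative targets resolve against the directory of the source part,
    # which is the parent of the _rels directory.
    base_dir = posixpath.dirname(posixpath.dirname(rels_name))
    missing = []
    with zipfile.ZipFile(docx_path) as zf:
        members = set(zf.namelist())
        root = ET.fromstring(zf.read(rels_name))
        for rel in root.iter(RELS_NS + "Relationship"):
            if rel.get("TargetMode") == "External":
                continue  # external targets are not parts in the archive
            target = rel.get("Target", "")
            if target.startswith("/"):
                member = target.lstrip("/")
            else:
                member = posixpath.normpath(posixpath.join(base_dir, target))
            if member not in members:
                missing.append((rel.get("Id"), target))
    return missing
```

Run against a file like the one described above, a relationship with Target="#MyBookmark" and no TargetMode="External" would show up in the result, since it resolves to word/#MyBookmark, which is not in the archive.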

wonzer · 2022-11-01T06:14:20Z

You can find a solution here: #1105 (comment)

KristenMoore closed this as completed Mar 22, 2020
