Fix bug when missing workbook relationship
Add a check to see if the list is empty before trying to access its
contents. If an Excel file has an overridden relationship without the word
"book" in the name, the code attempts to grab the first item of an empty
list when looking up workbook relationships.

    IndexError: list index out of range
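
A minimal sketch of the failure, assuming the relationships list holds only the overridden part name (the list contents here are illustrative, taken from the file described in this message, not captured from a real run):

```python
# Hypothetical part names, mirroring the overridden entry in this file.
relationships = ["/_rels/.rels"]

# The pre-fix lookup: keep names containing "book", then take the first match.
matches = list(filter(lambda r: "book" in r, relationships))

try:
    first = matches[0]  # no name contains "book", so the list is empty
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```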

There could be a better fix for this issue; I'm not well enough versed in
the xlsx specification. The following xlsx file caused the issue.

    ```bash
      $ unzip -l some_file.xlsx
      Archive:  some_file.xlsx
        Length      Date    Time    Name
      ---------  ---------- -----   ----
            142  02-06-2024 13:28   xl/worksheets/_rels/sheet1.xml.rels
       65968555  02-06-2024 13:28   xl/worksheets/sheet1.xml
        2078037  02-06-2024 13:28   xl/sharedStrings.xml
           9867  02-06-2024 13:28   xl/styles.xml
            566  02-06-2024 13:28   xl/_rels/workbook.xml.rels
            388  02-06-2024 13:28   xl/workbook.xml
            297  02-06-2024 13:28   _rels/.rels
           1122  02-06-2024 13:28   [Content_Types].xml
      ---------                     -------
       68058974                     8 files
    ```

In `[Content_Types].xml` the relationships override points at
`_rels/.rels` rather than `xl/_rels/workbook.xml.rels`. This leaves the
`workbook_relationships` list empty, which causes the error mentioned
above. The file does indeed have a workbook relationship; however, it is
being overridden.

    `[Content_Types].xml`:
    ```xml
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
        <Default Extension="png" ContentType="image/png"/>
        <Default Extension="jpeg" ContentType="image/jpeg"/>
        <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
        <Default Extension="xml" ContentType="application/xml"/>
        <Default Extension="vml" ContentType="application/vnd.openxmlformats-officedocument.vmlDrawing"/>
        <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
        <Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
        <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
        <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
        <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
      </Types>
    ```

    `xl/_rels/workbook.xml.rels`:
    ```xml
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
        <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
        <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
        <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
      </Relationships>
    ```
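
The lookup described above can be modelled roughly as follows. This is a simplified sketch, not xlsx2csv's actual content-types parser: it assumes that only `Override` entries with the relationships content type feed the relationships list, and `relationship_parts` is a hypothetical helper. The XML is an abbreviated copy of the `[Content_Types].xml` shown in this message.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

# Abbreviated copy of the [Content_Types].xml from the problem file.
CONTENT_TYPES_XML = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
</Types>"""


def relationship_parts(xml_text):
    """Return the PartName of every Override declared as a relationships part."""
    root = ET.fromstring(xml_text)
    return [
        node.get("PartName")
        for node in root.iter(CT_NS + "Override")
        if node.get("ContentType") == RELS_TYPE
    ]


parts = relationship_parts(CONTENT_TYPES_XML)
# Same substring test as the lookup in xlsx2csv.py.
workbook_parts = [p for p in parts if "book" in p]
print(parts, workbook_parts)
```

Under that assumption the only relationships part found is `/_rels/.rels`, so the "book" filter has nothing to match even though `xl/_rels/workbook.xml.rels` exists in the archive.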
tmiller committed Mar 17, 2024
1 parent f2a429a commit 68c2942
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions xlsx2csv.py
@@ -222,8 +222,11 @@ def __init__(self, xlsxfile, **options):
         self.shared_strings = self._parse(SharedStrings, self.content_types.types["shared_strings"])
         self.styles = self._parse(Styles, self.content_types.types["styles"])
         self.workbook = self._parse(Workbook, self.content_types.types["workbook"])
-        workbook_relationships = list(filter(lambda r: "book" in r, self.content_types.types["relationships"]))[0]
-        self.workbook.relationships = self._parse(Relationships, workbook_relationships)
+        workbook_relationships = list(filter(lambda r: "book" in r, self.content_types.types["relationships"]))
+        if len(workbook_relationships) > 0:
+            self.workbook.relationships = self._parse(Relationships, workbook_relationships[0])
+        else:
+            self.workbook.relationships = Relationships()
         if self.options['no_line_breaks']:
             self.shared_strings.replace_line_breaks()
         elif self.options['escape_strings']:
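
The guard in this change can also be expressed as a standalone helper. The sketch below is an illustration only: `first_workbook_relationships` is a hypothetical name that does not exist in xlsx2csv, and it returns `None` where the commit falls back to an empty `Relationships()` object.

```python
def first_workbook_relationships(parts):
    """Return the first part name containing "book", or None if there is none."""
    # next() with a default avoids indexing into a possibly empty list.
    return next((p for p in parts if "book" in p), None)


overridden = first_workbook_relationships(["/_rels/.rels"])
regular = first_workbook_relationships(["/_rels/.rels", "/xl/_rels/workbook.xml.rels"])
print(overridden, regular)
```

The caller would still need to branch on `None` and choose a fallback, which is what the `else` branch in the diff does.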