Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.sheet() method returns wrong sheet #166

Closed
tddrmllr opened this issue Jan 12, 2015 · 8 comments · Fixed by #196
Closed

.sheet() method returns wrong sheet #166

tddrmllr opened this issue Jan 12, 2015 · 8 comments · Fixed by #196

Comments

@tddrmllr
Copy link

I have a spreadsheet with 7 sheets. But here's what happens:

spreadsheet = Roo::Excelx.new(file.path)
spreadsheet.sheets
=> ["IS", "BS", "CF", "D&A", "WC", "D&I", "EQ"]
spreadsheet.sheet("IS")
=> [returns the "CF" sheet]

I get same result if I use an index number spreadsheet.sheet(0)

@bassjacob
Copy link

Hi @tikitikifoofoo,

I'm not a member of the maintainers or anything, but this seemed interesting since I hadn't run into it before. Can you attach a copy of the spreadsheet you're using? I can't seem to reproduce this error using an XLSX sheet of my own.

@tddrmllr
Copy link
Author

Update: so I think I've narrowed the problem to the Zip::File extraction process. It seems like the order in which the sheets are iterated on in the Zip::File.foreach method is not necessarily the same as the actual order of the sheets in the file. In fact, the order in which they're iterated on seems random, because I get different sheets at different indices almost every time.

I pried into the process_zipfile method on line 491 of excelx.rb. I then took one of the temp paths in the @sheet_files array and dropped it into a browser to view the xml and confirmed that the data in the xml does not correspond to the sheet that the tempfile is supposed to represent. Still trying to understand why this might be.

Here's a link to the file I'm trying to parse: http://www.filedropper.com/operatingmodelhc

@stevendaniels
Copy link
Contributor

I wasn't able to access the link to the file. Can you fork the project and add the file to test/files directory?

@stevendaniels
Copy link
Contributor

I'm closing this issue.

@tikitikifoofoo If you get the chance, please provide a copy of your spreadsheet so I can look into the issue. Creating a gist might be the easiest way.

  1. Create a gist with code that creates the error.
  2. Clone the gist repo locally, add a stripped down version of the offending spreadsheet to the gist repo, and push the changes to the gist repo.
  3. Paste the gist url here.

Here's a sample gist

@tddrmllr
Copy link
Author

Sorry @stevendaniels, I just saw your comments. Here's a gist with the file: https://gist.github.com/tikitikifoofoo/e37872ef6d3fd170008a.

We discovered that the problem only seems to happen with spreadsheets downloaded from Google Sheets, which is what I uploaded to the gist.

@stevendaniels
Copy link
Contributor

Thanks. I did a quick look and confirmed the issue.
Here's what's happening:

  1. The order of Google's sheet files doesn't match the provided in workbook.xml
  2. Google's xlsx spreadsheet archive has a file called [Content_Types].xml That file has XML declarations that override the order of the sheets.
<Override ContentType="..." PartName="/xl/worksheets/sheet5.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet7.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet3.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet1.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet6.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet4.xml"/>
<Override ContentType="..." PartName="/xl/worksheets/sheet2.xml"/>

This is what sets the order of the sheets in the file when opening it Google or Excel. I imagine the sheets in that spreadsheet where created and then moved into a different order and Google preserved the original sheet order.

I'd have to look at other formats, but I suspect that [Content_Types].xml is the authoritative description of what the order of worksheets should be. If that's the case, roo should be ordering sheets based on that data, not the actual number on the sheets.

[update: worksheets should be ordered based on workbook.xml.rels, not [Content_Types].xml]

@stevendaniels stevendaniels reopened this Mar 27, 2015
@piotrb
Copy link

piotrb commented Apr 7, 2015

Hi guys, we've just recently come across this as well. What can we do to help?

@stevendaniels
Copy link
Contributor

I've spent some more time looking into this issue. My previous comment was wrong. [Content_types].xml does not appear to have anything to do with the worksheet order.

It turns out the following two files determine the sheet order: worbook.xml and workbook.xml.rels

workbook.xml

<sheets>
  <sheet state="visible" name="IS" sheetId="1" r:id="rId3"/>
  <sheet state="visible" name="BS" sheetId="2" r:id="rId4"/>
  <sheet state="visible" name="CF" sheetId="3" r:id="rId5"/>
  <sheet state="visible" name="WC" sheetId="4" r:id="rId6"/>
  <sheet state="visible" name="DA" sheetId="5" r:id="rId7"/>
  <sheet state="visible" name="DI" sheetId="6" r:id="rId8"/>
  <sheet state="visible" name="EQ" sheetId="7" r:id="rId9"/>
</sheets>

workbook.xml.rels

<Relationships>
  ...
  <Relationship Id="rId4" Type=".../worksheet" Target="worksheets/sheet5.xml"/>
  <Relationship Id="rId3" Type=".../worksheet" Target="worksheets/sheet4.xml"/>
  <Relationship Id="rId9" Type=".../worksheet" Target="worksheets/sheet3.xml"/>
  <Relationship Id="rId6" Type=".../worksheet" Target="worksheets/sheet2.xml"/>
  <Relationship Id="rId5" Type=".../worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId8" Type=".../worksheet" Target="worksheets/sheet6.xml"/>
  <Relationship Id="rId7" Type=".../worksheet" Target="worksheets/sheet7.xml"/>
</Relationships>

According to these relationships, sheet4.xml is the first sheet, i.e. 'IS'. I looked at the raw XML to confirm that sheet4.xml is the first sheet.

In order to fix this issue, Roo::Excelx#process_zipfile will need to be re-written.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants