Fix worksheet title length enforcement #70

Merged
3 commits merged into caxlsx:master from the fix-worksheet-title-length-rule branch
Jan 6, 2021

@klyonrad (Contributor) commented Nov 26, 2020

Closes #67

Opposes a redundant test
@@ -663,7 +663,8 @@ def outline(collection, range, level = 1, collapsed = true)

       def validate_sheet_name(name)
         DataTypeValidator.validate :worksheet_name, String, name
-        raise ArgumentError, (ERR_SHEET_NAME_TOO_LONG % name) if name.bytesize > 31
+        character_length = name.encode("utf-16").bytesize / 2 - 1
@klyonrad (Contributor Author):

I do not completely understand why the - 1 is necessary 🤔

@noniq (Member):

Maybe we can skip the division and simply use … < 32?

@klyonrad (Contributor Author):

Tried it out. That would make the tests green, but then the code would look like this:

      character_length = name.encode("utf-16").bytesize / 2
      raise ArgumentError, (ERR_SHEET_NAME_TOO_LONG % name) if character_length > 32

The code would then read as if 32 were the character limit, which it is not :/

@noniq (Member):

It seems that Ruby adds a BOM at the beginning of a string when encoding it as UTF-16:

      "123".encode("utf-16") # => "\uFEFF123"

The BOM accounts for 2 bytes but is probably irrelevant to Excel’s length restriction. If that’s the case, something like this should work:

      utf16_name = name.encode("utf-16")[1..-1] # ignore first character (BOM) because Excel does so, too.
      raise  if utf16_name.bytesize > 64 # or `>= 64`? I’m confused …
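The BOM explanation also accounts for the - 1 asked about earlier: the UTF-16 byte size is 2 bytes per code unit plus 2 bytes of BOM. The sketch below assumes only core Ruby String#encode. Encoding to UTF-16LE is a swapped-in alternative that nobody in the thread used; unlike "utf-16", it emits no BOM. utf16_units is a hypothetical helper name.

```ruby
# Encoding to "UTF-16" prepends a big-endian BOM (bytes 0xFE 0xFF).
with_bom = "123".encode("UTF-16")
p with_bom.bytes.first(2)  # => [254, 255]
p with_bom.bytesize        # => 8 (2 BOM bytes + 3 code units * 2 bytes)

# Hypothetical helper: UTF-16LE output carries no BOM, so no "- 1" is needed.
def utf16_units(str)
  str.encode("UTF-16LE").bytesize / 2
end

p utf16_units("123")                              # => 3
p with_bom.bytesize / 2 - 1 == utf16_units("123") # => true
```

For non-empty names, both expressions should give the same code-unit count.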

@klyonrad (Contributor Author):

Thanks for that investigative work 🕵️ I still use the / 2 because it is valuable to have the limit of 31 written out as a number.

By the way, using Axlsx::coder.encode (from below) does not help either :D
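The check can be exercised outside axlsx, assuming the comparison ends up as character_length > 31, as the comment above implies. This is a standalone sketch. The real ERR_SHEET_NAME_TOO_LONG text is not shown in this thread, so the message below is a hypothetical placeholder, and the DataTypeValidator call is left out.

```ruby
# Hypothetical placeholder message; the real ERR_SHEET_NAME_TOO_LONG text is not shown here.
ERR_SHEET_NAME_TOO_LONG = "Worksheet name '%s' is too long (max. 31 characters)"

# Standalone sketch of the length check (DataTypeValidator call omitted).
def validate_sheet_name(name)
  # "utf-16" output starts with a 2-byte BOM, hence "/ 2 - 1" to get UTF-16 code units.
  character_length = name.encode("utf-16").bytesize / 2 - 1
  raise ArgumentError, (ERR_SHEET_NAME_TOO_LONG % name) if character_length > 31
end

validate_sheet_name("あ" * 31)  # accepted: 31 code units (93 UTF-8 bytes)

begin
  validate_sheet_name("あ" * 32)  # rejected: 32 code units
rescue ArgumentError => e
  puts e.message
end
```

Under the removed name.bytesize > 31 check, a name of just 11 such characters (33 UTF-8 bytes) was already rejected.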

@noniq previously requested changes Dec 3, 2020
test/workbook/worksheet/tc_worksheet.rb (resolved)
test/workbook/worksheet/tc_worksheet.rb (outdated, resolved)
CHANGELOG.md (outdated, resolved)

Luka Lüdicke added 2 commits December 10, 2020 12:06
- The previous change caused unnecessary issues
- We approximate that Excel counts the character length in UTF-16 code units
- Fixes #67
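The commit message's approximation is easiest to see by measuring a few names three ways in Ruby. For Japanese text, the UTF-8 bytesize is about three times the visible character count, which is why the old name.bytesize > 31 check rejected short Japanese names. Characters outside the Basic Multilingual Plane, such as emoji, count as two UTF-16 code units. Whether Excel counts exactly this way is the approximation the commit states, not something verified here. measures is a hypothetical helper name.

```ruby
# Three ways to measure a sheet name in Ruby (hypothetical helper).
def measures(name)
  {
    utf8_bytes:  name.bytesize,                        # what the old check compared
    code_points: name.length,                          # Unicode code points
    utf16_units: name.encode("UTF-16LE").bytesize / 2  # UTF-16LE has no BOM to subtract
  }
end

p measures("Sheet1").values  # => [6, 6, 6]
p measures("日本語").values   # => [9, 3, 3]
p measures("😀").values       # => [4, 1, 2]
```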
@straydogstudio (Contributor):

@noniq Are we happy with the changes on this pull request?

@straydogstudio (Contributor):

The expression could avoid the division and still keep some clarity, like this:

      byte_length = name.encode("utf-16")[1..-1].encode("utf-16").bytesize - 2
      raise ArgumentError, (ERR_SHEET_NAME_TOO_LONG % name) if byte_length > 31 * 2

But I don't really see any gain from rewriting these lines. Dividing by 2 once per sheet name is certainly not a significant performance hit. Since other things are waiting on this and tests are passing, I'm going to merge this pull request. We can revisit this syntax in another pull request.
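The division-free variants in this thread reduce to the same threshold. With a BOM-inclusive name.encode("utf-16").bytesize of 2n + 2 bytes for n code units, bytesize / 2 - 1 > 31, bytesize / 2 > 32 and bytesize > 64 are equivalent. That also settles the > 64 vs >= 64 question for the BOM-inclusive size; for a BOM-free size the limit becomes > 62 (31 * 2). The brute-force sketch below checks this. The labels are hypothetical, and it uses a plain BOM-inclusive size (plus a UTF-16LE variant) rather than reproducing the [1..-1] re-encoding step.

```ruby
# Formulations of "more than 31 UTF-16 code units" (hypothetical labels).
CHECKS = {
  divide_minus_one: ->(name) { name.encode("utf-16").bytesize / 2 - 1 > 31 },
  divide_vs_32:     ->(name) { name.encode("utf-16").bytesize / 2 > 32 },
  bytes_vs_64:      ->(name) { name.encode("utf-16").bytesize > 64 },
  bom_free_vs_62:   ->(name) { name.encode("UTF-16LE").bytesize > 31 * 2 }
}.freeze

# Brute-force comparison on ASCII and Japanese names of 1..40 characters.
["a", "あ"].each do |char|
  (1..40).each do |n|
    results = CHECKS.values.map { |check| check.call(char * n) }
    raise "formulations disagree for #{n} x #{char}" unless results.uniq.size == 1
    raise "wrong threshold for #{n} x #{char}" unless results.first == (n > 31)
  end
end

puts "all formulations reject exactly the names longer than 31 code units"
```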

@straydogstudio straydogstudio dismissed noniq’s stale review January 6, 2021 04:07

I'm setting this aside for now since tests are passing. We can revisit these lines and improve them in another pull request. Sorry if I'm out of line @noniq.

@straydogstudio straydogstudio merged commit f4d711d into caxlsx:master Jan 6, 2021
@straydogstudio (Contributor):

Released in 3.0.4

@klyonrad klyonrad deleted the fix-worksheet-title-length-rule branch January 7, 2021 11:59
Linked issue (#67): A character length error in the worksheet name when using Japanese.
3 participants