Skip to content

Commit

Permalink
don't encode sms when calculating content length
Browse files Browse the repository at this point in the history
we were previously encoding the utf-8 content, and measuring the number
of bytes. We've been doing this since the beginning of time, and for
most of time, it wasn't an issue. However, when we started accepting
unicode characters (notably some welsh-only accents eg Ŵ or ŷ), they are
encoded as two bytes rather than 1. So a message of 70 ŷ characters is
140 bytes long, but a message of 69 a characters and one ŷ is 71
characters long.

However, in reality, when we send a message, if _any_ character in it is
non-gsm (similar but not the same as ascii), then the entire thing is
encoded in UTF-16, where every single character (including basic latin
text) is two bytes long. So both ŷŷŷŷ... and aaaaa....aŷ are actually 70
utf-16 characters - we already take account of the double code point
width when determining billable fragments.

This means in a small amount of cases (where someone has a message that
is almost at a message boundary in size, which contains a lot of welsh
characters), we've over-billed that service. But this is probably an
incredibly low amount of messages, and impossible to account for after 1
wk, so probably not worth worrying about.
  • Loading branch information
leohemsted committed Nov 18, 2019
1 parent 1ab5a44 commit aae1c82
Show file tree
Hide file tree
Showing 3 changed files with 13 additions and 4 deletions.
11 changes: 9 additions & 2 deletions notifications_utils/template.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,12 +166,19 @@ def prefix(self, value):

@property
def content_count(self):
return len((
"""
Return the number of characters in the message. Note that we don't distinguish between GSM and non-GSM
characters at this point, as `get_sms_fragment_count` handles that separately.
Also note that if values aren't provided, will calculate the raw length of the unsubstituted placeholders,
as in the message `foo ((placeholder))` has a length of 19.
"""
return len(
# we always want to call SMSMessageTemplate.__str__ regardless of subclass, to avoid any html formatting
SMSMessageTemplate.__str__(self)
if self._values
else sms_encode(add_prefix(self.content.strip(), self.prefix))
).encode(self.encoding))
)

@property
def fragment_count(self):
Expand Down
2 changes: 1 addition & 1 deletion notifications_utils/version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = '36.1.1'
__version__ = '36.1.2'
4 changes: 3 additions & 1 deletion tests/test_base_template.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,8 +119,10 @@ def test_extracting_placeholders(template_content, template_subject, expected):
"content,prefix, expected_length, expected_replaced_length",
[
("The quick brown fox jumped over the lazy dog", None, 44, 44),
# should be replaced with a ?
# is an unsupported unicode character so should be replaced with a ?
("深", None, 1, 1),
# is a supported unicode character so should be kept as is
("Ŵ", None, 1, 1),
("'First line.\n", None, 12, 12),
("\t\n\r", None, 0, 0),
("((placeholder))", None, 15, 3),
Expand Down

0 comments on commit aae1c82

Please sign in to comment.