Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: cloud-init query to handle compressed userdata #516

Merged

Conversation

blackboxsw
Copy link
Collaborator

@blackboxsw blackboxsw commented Jul 31, 2020

cloud-init query tries to directly load and decode
raw user-data from /var/lib/cloud/instance/user-data.txt.

This results in UnicodeDecodeErrors on some platforms which
provide compressed content.

Avoid UnicodeDecoderErrors when parsing compressed user-data at
/var/lib/cloud/instance/user-data.txt.

LP: #1889938

To reproduce:

juju bootstrap aws/us-east-1.
juju deploy ubuntu
juju ssh ubuntu/0 
sudo cloud-init query --all

OR 
lxc launch ubuntu-daily:xenial my-test
lxc exec my-test bash
gzip /var/lib/cloud/instance/user-data.txt
mv /var/lib/cloud/instance/user-data.txt.gz /var/lib/cloud/instance/user-data.txt
cloud-init query --all

try:
return util.decomp_gzip(gz_data)
except util.DecompressionError:
return gz_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if that may pose an issue, but returning the compressed data can cause an issue in a later stage for cloud-init ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, the entire dictionary of processed meta-data and user-data is pushed through util.json_dumps which has a default serialization function if it can't properly parse a valid json value via json_serialize_default. So, if we pass through a valid that is not handled by our json_dumps, it gets base64encoded and we add the prefix "ci-b64:" to the value. I've added unit tests to cover these cases.

@blackboxsw blackboxsw force-pushed the bug/cloud-init-query-compressed-ud branch 12 times, most recently from 5520ba3 to 53ab89e Compare August 3, 2020 22:42
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring of the pytest tests!

@@ -65,6 +65,21 @@ def get_parser(parser=None):
return parser


def maybe_decode_decompress_userdata(ud_file_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: "maybe" functions always bother me. Could it be something like load_userdata, or obtain_userdata?

If you like it how it is though, no need to change it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really good point, I just went with other naming conventions of some functions throughout cloud-init, but I tend to agree with you. A function should probably not be named based on what steps it may or may not perform, but what it gives you.
renamed load_userdata which also saves us a bunch of characters and a couple of userless newlines for formatting :)

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, thank you! I have a few small comments, verging on nits. Please push back on any of them that seem like needless churn.

'ERROR: Missing instance-data file: %s' % json_file,
m_stderr.getvalue())
run_dir = tmpdir.join('run_dir')
ensure_dir(run_dir.strpath)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and all the other ensure_dir calls) could just be run_dir.ensure_dir(), see https://py.readthedocs.io/en/latest/path.html#py._path.local.LocalPath.ensure_dir.

cloudinit/cmd/query.py Show resolved Hide resolved
Comment on lines 47 to 48
logs = caplog.text
assert expected_error in logs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logs = caplog.text
assert expected_error in logs
assert expected_error in caplog.text

assert 0 == query.handle_args('anyname', args)
out, _err = capsys.readouterr()
cmd_output = json.loads(out)
assert cmd_output['my_var'] == "it worked"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#unit-testing, expected values should go before value under test; I think this applies to the assertions below also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OddBloke Curious...what's the reason for that rule?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for a rule is consistency: it makes test code easier to read in the case that you aren't just comparing against a static value (as we are here). The reason for it being this way around is arbitrary, really. I've picked it up over the years as a rough "best practice"; some quick searching around suggests that JUnit (which inspired Python's unittest) takes (or, perhaps, took) parameters in that order, so my guess is that's the source of the order.

The guideline landed as part of the "dump all my unit testing expectations somewhere people other than me can read them" PR that I opened in March during the pytest migration, so I wouldn't be constantly asking people for test code changes that they could not possibly have predicted. I'd happily drop it, it's of marginal value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think style guide should be followed here just from that consistency point of view. If I can make an assumption during assertion failures about which value is the dynamic and which is static it simplifies my assessment of what broke.

Comment on lines 179 to 187
user_data = tmpdir.join('user-data')
vendor_data = tmpdir.join('vendor-data')
user_data.write('ud')
vendor_data.write('vd')
run_dir = tmpdir.join('run_dir')
sensitive_file = run_dir.join(INSTANCE_JSON_SENSITIVE_FILE)
ensure_dir(run_dir.strpath)
sensitive_file.write('{"my-var": "it worked"}')
paths = Paths({'run_dir': run_dir.strpath})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are repeated several times with minor variation; do you think they be a good candidate for a helper function/method of some description?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OddBloke just pushed a refactor of this into _setup_paths test method .

cloudinit/cmd/tests/test_query.py Show resolved Hide resolved
Comment on lines 123 to 124
logs = caplog.text
assert msg in logs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logs = caplog.text
assert msg in logs
assert msg in caplog.text


with_logs = True
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this:

Suggested change
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="")
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", mock.Mock(return_value=""))

then it won't be passed into every single test, which will clean up the test method signatures a bit. (For the one test you do need it for, you can decorate that specific method.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion, but I'm not applying it here because I can't mock a mock, in the one test that is trying to validate call_count of m_cli_log. Once the class-level mock is already applied, before the test-level mock of the same object, the test can no longer see the original object call_count so the unit test fails.

If I wanted to use the mock.Mock approach here, I have to decorate each unit test which doesn't reference the patched addLogHandlerCLI separately which ends up being a lot more lines than just a class-level decorator plus an unused param per test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I could have sworn I've done this successfully in the past, but I've just tried it locally and I agree it's not working. Apologies for the wild goose chase! This, however, does work:

diff --git a/cloudinit/cmd/tests/test_query.py b/cloudinit/cmd/tests/test_query.py
index ebe6741c0..acbe57d9a 100644
--- a/cloudinit/cmd/tests/test_query.py
+++ b/cloudinit/cmd/tests/test_query.py
@@ -24,7 +24,7 @@ def _gzip_data(data):
         return iobuf.getvalue()
 
 
-@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="")
+@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", lambda *args: "")
 class TestQuery:
 
     args = namedtuple(
@@ -32,14 +32,15 @@ class TestQuery:
         ('debug dump_all format instance_data list_keys user_data vendor_data'
          ' varname'))
 
-    def test_handle_args_error_on_missing_param(
-        self, m_cli_log, caplog, capsys
-    ):
+    def test_handle_args_error_on_missing_param(self, caplog, capsys):
         """Error when missing required parameters and print usage."""
         args = self.args(
             debug=False, dump_all=False, format=None, instance_data=None,
             list_keys=False, user_data=None, vendor_data=None, varname=None)
-        assert 1 == query.handle_args('anyname', args)
+        with mock.patch(
+            "cloudinit.cmd.query.addLogHandlerCLI", return_value=""
+        ) as m_cli_log:
+            assert 1 == query.handle_args("anyname", args)
         expected_error = (
             'Expected one of the options: --all, --format, --list-keys'
             ' or varname\n')
@@ -48,9 +49,7 @@ class TestQuery:
         assert 'usage: query' in out
         assert 1 == m_cli_log.call_count
 
-    def test_handle_args_error_on_missing_instance_data(
-        self, _m_cli_log, caplog, tmpdir
-    ):
+    def test_handle_args_error_on_missing_instance_data(self, caplog, tmpdir):
         """When instance_data file path does not exist, log an error."""
         absent_fn = tmpdir.join('absent')
         args = self.args(
@@ -63,7 +62,7 @@ class TestQuery:
         assert msg in caplog.text
 
     def test_handle_args_error_when_no_read_permission_instance_data(
-        self, _m_log_cli, caplog, tmpdir
+        self, caplog, tmpdir
     ):
         """When instance_data file is unreadable, log an error."""
         noread_fn = tmpdir.join('unreadable')
@@ -78,9 +77,7 @@ class TestQuery:
         msg = "No read permission on '%s'. Try sudo" % noread_fn
         assert msg in caplog.text
 
-    def test_handle_args_defaults_instance_data(
-        self, _m_log_cli, caplog, tmpdir
-    ):
+    def test_handle_args_defaults_instance_data(self, caplog, tmpdir):
         """When no instance_data argument, default to configured run_dir."""
         args = self.args(
             debug=False, dump_all=True, format=None, instance_data=None,
@@ -95,9 +92,7 @@ class TestQuery:
         msg = 'Missing instance-data file: %s' % json_file.strpath
         assert msg in caplog.text
 
-    def test_handle_args_root_fallsback_to_instance_data(
-        self, _m_log_cli, caplog, tmpdir
-    ):
+    def test_handle_args_root_fallsback_to_instance_data(self, caplog, tmpdir):
         """When no instance_data argument, root falls back to redacted json."""
         args = self.args(
             debug=False, dump_all=True, format=None, instance_data=None,
@@ -135,8 +130,7 @@ class TestQuery:
         )
     )
     def test_handle_args_root_processes_user_data(
-        self, _m_log_cli, ud_src, ud_expected, vd_src, vd_expected, capsys,
-        tmpdir
+        self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir
     ):
         """Support reading multiple user-data file content types"""
         user_data = tmpdir.join('user-data')
@@ -170,7 +164,7 @@ class TestQuery:
             assert vd_expected == cmd_output['vendordata']
 
     def test_handle_args_root_uses_instance_sensitive_data(
-        self, _m_cli_log, capsys, tmpdir
+        self, capsys, tmpdir
     ):
         """When no instance_data argument, root uses sensitive json."""
         user_data = tmpdir.join('user-data')
@@ -198,9 +192,7 @@ class TestQuery:
         out, _err = capsys.readouterr()
         assert expected == out
 
-    def test_handle_args_dumps_all_instance_data(
-        self, _m_cli_log, capsys, tmpdir
-    ):
+    def test_handle_args_dumps_all_instance_data(self, capsys, tmpdir):
         """When --all is specified query will dump all instance data vars."""
         instance_data = tmpdir.join('instance-data')
         instance_data.write('{"my-var": "it worked"}')
@@ -220,9 +212,7 @@ class TestQuery:
         out, _err = capsys.readouterr()
         assert expected == out
 
-    def test_handle_args_returns_top_level_varname(
-        self, _m_cli_log, capsys, tmpdir
-    ):
+    def test_handle_args_returns_top_level_varname(self, capsys, tmpdir):
         """When the argument varname is passed, report its value."""
         instance_data = tmpdir.join('instance-data')
         instance_data.write('{"my-var": "it worked"}')
@@ -236,9 +226,7 @@ class TestQuery:
         out, _err = capsys.readouterr()
         assert 'it worked\n' == out
 
-    def test_handle_args_returns_nested_varname(
-        self, _m_cli_log, capsys, tmpdir
-    ):
+    def test_handle_args_returns_nested_varname(self, capsys, tmpdir):
         """If user_data file is a jinja template render instance-data vars."""
         instance_data = tmpdir.join('instance-data')
         instance_data.write(
@@ -255,7 +243,7 @@ class TestQuery:
         assert 'value-2\n' == out
 
     def test_handle_args_returns_standardized_vars_to_top_level_aliases(
-        self, _m_cli_log, capsys, tmpdir
+        self, capsys, tmpdir
     ):
         """Any standardized vars under v# are promoted as top-level aliases."""
         instance_data = tmpdir.join('instance-data')
@@ -288,7 +276,7 @@ class TestQuery:
         assert expected == out
 
     def test_handle_args_list_keys_sorts_top_level_keys_when_no_varname(
-        self, _m_cli_log, capsys, tmpdir
+        self, capsys, tmpdir
     ):
         """Sort all top-level keys when only --list-keys provided."""
         instance_data = tmpdir.join('instance-data')
@@ -307,7 +295,7 @@ class TestQuery:
         assert expected == out
 
     def test_handle_args_list_keys_sorts_nested_keys_when_varname(
-        self, _m_cli_log, capsys, tmpdir
+        self, capsys, tmpdir
     ):
         """Sort all nested keys of varname object when --list-keys provided."""
         instance_data = tmpdir.join('instance-data')
@@ -326,7 +314,7 @@ class TestQuery:
         assert expected == out
 
     def test_handle_args_list_keys_errors_when_varname_is_not_a_dict(
-        self, _m_cli_log, caplog, tmpdir
+        self, caplog, tmpdir
     ):
         """Raise an error when --list-keys and varname specify a non-list."""
         instance_data = tmpdir.join('instance-data')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent suggestion. applied.

Comment on lines 22 to 24
gzfp = gzip.GzipFile(mode="wb", fileobj=iobuf)
gzfp.write(data)
gzfp.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gzfp = gzip.GzipFile(mode="wb", fileobj=iobuf)
gzfp.write(data)
gzfp.close()
with gzip.GzipFile(mode="wb", fileobj=iobuf) as gzfp:
gzfp.write(data)

?

@blackboxsw blackboxsw force-pushed the bug/cloud-init-query-compressed-ud branch from 54fb181 to 4172b5f Compare August 6, 2020 17:10
@blackboxsw blackboxsw requested a review from OddBloke August 6, 2020 17:11
Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chad, this is looking pretty good now! I have a proposed change to the help text, and a couple of suggestions to simplify/improve the testing a little.

Comment on lines 6 to 10
Some instance-data values may be binary on some platforms, such as userdata and
vendordata. Attempt to decompress and decode UTF-8 any binary values.

Binary instance-data values which cannot be decompressed or decoded,
will be base64-encoded and will have the prefix "ci-b64:" on the value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording could be a little tighter, I think:

Any binary values in the instance metadata will be base64-encoded and prefixed with "ci-b64:" in the output. userdata and, where applicable, vendordata may be provided to the machine gzip-compressed (and therefore as binary data). query will attempt to decompress these to a string before emitting the JSON output; if this fails, they are treated as binary.

Comment on lines 147 to 151
(_gzip_data(b'ud') + b'invalid', 'ci-b64:',
_gzip_data(b'vd') + b'invalid', 'ci-b64:'),
# non-utf-8 encodable content
('hi mom'.encode('utf-16'), 'ci-b64:',
'hi pops'.encode('utf-16'), 'ci-b64:'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the full expected output seems pretty achievable given the inputs are pretty small:

Suggested change
(_gzip_data(b'ud') + b'invalid', 'ci-b64:',
_gzip_data(b'vd') + b'invalid', 'ci-b64:'),
# non-utf-8 encodable content
('hi mom'.encode('utf-16'), 'ci-b64:',
'hi pops'.encode('utf-16'), 'ci-b64:'),
(_gzip_data(b'ud') + b'invalid',
'ci-b64:H4sIALIdNF8C/ytNAQANOk5ZAgAAAGludmFsaWQ=',
_gzip_data(b'vd') + b'invalid',
'ci-b64:H4sIANQdNF8C/ytLAQDOaWNyAgAAAGludmFsaWQ='),
# non-utf-8 encodable content
('hi mom'.encode('utf-16'), 'ci-b64://5oAGkAIABtAG8AbQA=',
'hi pops'.encode('utf-16'), 'ci-b64://5oAGkAIABwAG8AcABzAA=='),

This would simplify the test implementation as well as making the test a little more thorough.

super(TestQuery, self).setUp()
self.tmp = self.tmp_dir()
self.instance_data = self.tmp_path('instance-data', dir=self.tmp)
def _setup_paths(self, tmpdir, ud_val=None, vd_val=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the tests would be a little simpler if they used the paths fixture introduced in my Oracle v1 work. I'll comment inline in a couple of places below to describe my proposed changes.

Comment on lines +111 to +116
paths, run_dir, _, _ = self._setup_paths(tmpdir)
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths:
m_paths.return_value = paths
assert 1 == query.handle_args('anyname', args)
json_file = run_dir.join(INSTANCE_JSON_FILE)
msg = 'Missing instance-data file: %s' % json_file.strpath
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test requested the paths fixture instead of the tmpdir fixture, I think it can be rewritten as:

Suggested change
paths, run_dir, _, _ = self._setup_paths(tmpdir)
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths:
m_paths.return_value = paths
assert 1 == query.handle_args('anyname', args)
json_file = run_dir.join(INSTANCE_JSON_FILE)
msg = 'Missing instance-data file: %s' % json_file.strpath
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths:
m_paths.return_value = paths
assert 1 == query.handle_args('anyname', args)
json_file = os.path.join(paths.run_dir, INSTANCE_JSON_FILE)
msg = 'Missing instance-data file: %s' % json_file

(Roughly the same also applies to the following test.)

Comment on lines +154 to +160
def test_handle_args_root_processes_user_data(
self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir
):
"""Support reading multiple user-data file content types"""
paths, run_dir, user_data, vendor_data = self._setup_paths(
tmpdir, ud_val=ud_src, vd_val=vd_src
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a similar change could be made here (and in the following test):

Suggested change
def test_handle_args_root_processes_user_data(
self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir
):
"""Support reading multiple user-data file content types"""
paths, run_dir, user_data, vendor_data = self._setup_paths(
tmpdir, ud_val=ud_src, vd_val=vd_src
)
def test_handle_args_root_processes_user_data(
self, ud_src, ud_expected, vd_src, vd_expected, capsys, paths, tmpdir
):
"""Support reading multiple user-data file content types"""
user_data, vendor_data = self._setup_paths(ud_val=ud_src, vd_val=vd_src)

It follows that self._setup_paths() can be simplified to (a) only return user_data/vendor_data, and (b) require both of those arguments to be passed in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case it makes things a bit more complex to shift to paths as I don't get to use the tmpdir instance methods to write multiple files etc in individual tests so it looks to bloat my current use case here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case it makes things a bit more complex to shift to paths as I don't get to use the tmpdir instance methods to write multiple files etc in individual tests so it looks to bloat my current use case here.

Hmm, I'm not sure I follow this, but I would like the paths fixture to be usable in situations like this. If you can still remember the context here, I'd like to understand what the gap for using it is.

(FYI, pytest will pass the same tmpdir instance to a test and its fixtures: so the tmpdir used by the paths fixture will be the same as the tmpdir used within the test. I don't know if that changes anything, but it's useful background info regardless.)

@blackboxsw blackboxsw force-pushed the bug/cloud-init-query-compressed-ud branch from 234ac57 to 314aeae Compare August 20, 2020 20:18
@blackboxsw blackboxsw force-pushed the bug/cloud-init-query-compressed-ud branch from 314aeae to f2d4cbb Compare August 20, 2020 20:47
Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits in the docstrings. I don't have strong opinions about the pytest fixture discussion from @OddBloke ; I assume that future PRs could address those.

cloudinit/cmd/query.py Outdated Show resolved Hide resolved
cloudinit/cmd/query.py Outdated Show resolved Hide resolved
cloudinit/cmd/query.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. This looks good to go after CI is green

@blackboxsw blackboxsw merged commit 747723a into canonical:master Aug 20, 2020
Copy link
Collaborator

@smoser smoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 'load_userdata' seems to take an odd balance of "be explicit" and "use the magic" (load_file with decode=True).

I think it can be simplified if you accept the magic to:

def load_userdata(fpath):
    return util.decomp_gzip(
        util.load_file(fpath, decode=false).decode()

But without magic this is nicer, and avoids 2 calls to 'load_file':

def load_userdata(fpath):
    bdata = util.load_file(fpath, decode=false)
    try:
        return bdata.decode('utf-8')
    except UnicodeDecodeError:
        return util.decomp_gzip(bdata, quiet=false, decode=True)

try:
return util.decomp_gzip(encoded_data, quiet=False)
except util.DecompressionError:
return encoded_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, it seems silly to try/catch a DecompressionError here as if you called decomp_gzip with quiet=True thats what you would have gotten.

In short, aren't these 4 lines (93-96) exactly equivalent to: return util.decomp_gzip(encoded_data) ?

Or even shorter:

return util.decomp_gzip(util.load_file(ud_file_path, decode=False), quiet=True)

@returns: String of uncompressed userdata if possible, otherwise bytes.
"""
try:
return util.load_file(ud_file_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first when I saw this, I thought "hm... could a gzipped file contain only unicode?". Luckily the answer was No.

$ python3 -c 'with open("out.gz", "r") as fp: print(fp.read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I'm guessing gzip header will always be there and not unicode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants