-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd: cloud-init query to handle compressed userdata #516
cmd: cloud-init query to handle compressed userdata #516
Conversation
cloudinit/cmd/query.py
Outdated
try: | ||
return util.decomp_gzip(gz_data) | ||
except util.DecompressionError: | ||
return gz_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that may pose an issue, but returning the compressed data can cause an issue in a later stage for cloud-init ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, the entire dictionary of processed meta-data and user-data is pushed through util.json_dumps which has a default serialization function if it can't properly parse a valid json value via json_serialize_default
. So, if we pass through a valid that is not handled by our json_dumps, it gets base64encoded and we add the prefix "ci-b64:" to the value. I've added unit tests to cover these cases.
5520ba3
to
53ab89e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring of the pytest tests!
cloudinit/cmd/query.py
Outdated
@@ -65,6 +65,21 @@ def get_parser(parser=None): | |||
return parser | |||
|
|||
|
|||
def maybe_decode_decompress_userdata(ud_file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: "maybe" functions always bother me. Could it be something like load_userdata
, or obtain_userdata
?
If you like it how it is though, no need to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really good point, I just went with other naming conventions of some functions throughout cloud-init, but I tend to agree with you. A function should probably not be named based on what steps it may or may not perform, but what it gives you.
renamed load_userdata
which also saves us a bunch of characters and a couple of userless newlines for formatting :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, thank you! I have a few small comments, verging on nits. Please push back on any of them that seem like needless churn.
cloudinit/cmd/tests/test_query.py
Outdated
'ERROR: Missing instance-data file: %s' % json_file, | ||
m_stderr.getvalue()) | ||
run_dir = tmpdir.join('run_dir') | ||
ensure_dir(run_dir.strpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and all the other ensure_dir
calls) could just be run_dir.ensure_dir()
, see https://py.readthedocs.io/en/latest/path.html#py._path.local.LocalPath.ensure_dir.
cloudinit/cmd/tests/test_query.py
Outdated
logs = caplog.text | ||
assert expected_error in logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logs = caplog.text | |
assert expected_error in logs | |
assert expected_error in caplog.text |
cloudinit/cmd/tests/test_query.py
Outdated
assert 0 == query.handle_args('anyname', args) | ||
out, _err = capsys.readouterr() | ||
cmd_output = json.loads(out) | ||
assert cmd_output['my_var'] == "it worked" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#unit-testing, expected values should go before value under test; I think this applies to the assertions below also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OddBloke Curious...what's the reason for that rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for a rule is consistency: it makes test code easier to read in the case that you aren't just comparing against a static value (as we are here). The reason for it being this way around is arbitrary, really. I've picked it up over the years as a rough "best practice"; some quick searching around suggests that JUnit (which inspired Python's unittest
) takes (or, perhaps, took) parameters in that order, so my guess is that's the source of the order.
The guideline landed as part of the "dump all my unit testing expectations somewhere people other than me can read them" PR that I opened in March during the pytest
migration, so I wouldn't be constantly asking people for test code changes that they could not possibly have predicted. I'd happily drop it, it's of marginal value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think style guide should be followed here just from that consistency point of view. If I can make an assumption during assertion failures about which value is the dynamic and which is static it simplifies my assessment of what broke.
cloudinit/cmd/tests/test_query.py
Outdated
user_data = tmpdir.join('user-data') | ||
vendor_data = tmpdir.join('vendor-data') | ||
user_data.write('ud') | ||
vendor_data.write('vd') | ||
run_dir = tmpdir.join('run_dir') | ||
sensitive_file = run_dir.join(INSTANCE_JSON_SENSITIVE_FILE) | ||
ensure_dir(run_dir.strpath) | ||
sensitive_file.write('{"my-var": "it worked"}') | ||
paths = Paths({'run_dir': run_dir.strpath}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines are repeated several times with minor variation; do you think they be a good candidate for a helper function/method of some description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OddBloke just pushed a refactor of this into _setup_paths test method .
cloudinit/cmd/tests/test_query.py
Outdated
logs = caplog.text | ||
assert msg in logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logs = caplog.text | |
assert msg in logs | |
assert msg in caplog.text |
cloudinit/cmd/tests/test_query.py
Outdated
|
||
with_logs = True | ||
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make this:
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="") | |
@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", mock.Mock(return_value="")) |
then it won't be passed into every single test, which will clean up the test method signatures a bit. (For the one test you do need it for, you can decorate that specific method.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the suggestion, but I'm not applying it here because I can't mock a mock, in the one test that is trying to validate call_count of m_cli_log. Once the class-level mock is already applied, before the test-level mock of the same object, the test can no longer see the original object call_count so the unit test fails.
If I wanted to use the mock.Mock approach here, I have to decorate each unit test which doesn't reference the patched addLogHandlerCLI separately which ends up being a lot more lines than just a class-level decorator plus an unused param per test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I could have sworn I've done this successfully in the past, but I've just tried it locally and I agree it's not working. Apologies for the wild goose chase! This, however, does work:
diff --git a/cloudinit/cmd/tests/test_query.py b/cloudinit/cmd/tests/test_query.py
index ebe6741c0..acbe57d9a 100644
--- a/cloudinit/cmd/tests/test_query.py
+++ b/cloudinit/cmd/tests/test_query.py
@@ -24,7 +24,7 @@ def _gzip_data(data):
return iobuf.getvalue()
-@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", return_value="")
+@mock.patch("cloudinit.cmd.query.addLogHandlerCLI", lambda *args: "")
class TestQuery:
args = namedtuple(
@@ -32,14 +32,15 @@ class TestQuery:
('debug dump_all format instance_data list_keys user_data vendor_data'
' varname'))
- def test_handle_args_error_on_missing_param(
- self, m_cli_log, caplog, capsys
- ):
+ def test_handle_args_error_on_missing_param(self, caplog, capsys):
"""Error when missing required parameters and print usage."""
args = self.args(
debug=False, dump_all=False, format=None, instance_data=None,
list_keys=False, user_data=None, vendor_data=None, varname=None)
- assert 1 == query.handle_args('anyname', args)
+ with mock.patch(
+ "cloudinit.cmd.query.addLogHandlerCLI", return_value=""
+ ) as m_cli_log:
+ assert 1 == query.handle_args("anyname", args)
expected_error = (
'Expected one of the options: --all, --format, --list-keys'
' or varname\n')
@@ -48,9 +49,7 @@ class TestQuery:
assert 'usage: query' in out
assert 1 == m_cli_log.call_count
- def test_handle_args_error_on_missing_instance_data(
- self, _m_cli_log, caplog, tmpdir
- ):
+ def test_handle_args_error_on_missing_instance_data(self, caplog, tmpdir):
"""When instance_data file path does not exist, log an error."""
absent_fn = tmpdir.join('absent')
args = self.args(
@@ -63,7 +62,7 @@ class TestQuery:
assert msg in caplog.text
def test_handle_args_error_when_no_read_permission_instance_data(
- self, _m_log_cli, caplog, tmpdir
+ self, caplog, tmpdir
):
"""When instance_data file is unreadable, log an error."""
noread_fn = tmpdir.join('unreadable')
@@ -78,9 +77,7 @@ class TestQuery:
msg = "No read permission on '%s'. Try sudo" % noread_fn
assert msg in caplog.text
- def test_handle_args_defaults_instance_data(
- self, _m_log_cli, caplog, tmpdir
- ):
+ def test_handle_args_defaults_instance_data(self, caplog, tmpdir):
"""When no instance_data argument, default to configured run_dir."""
args = self.args(
debug=False, dump_all=True, format=None, instance_data=None,
@@ -95,9 +92,7 @@ class TestQuery:
msg = 'Missing instance-data file: %s' % json_file.strpath
assert msg in caplog.text
- def test_handle_args_root_fallsback_to_instance_data(
- self, _m_log_cli, caplog, tmpdir
- ):
+ def test_handle_args_root_fallsback_to_instance_data(self, caplog, tmpdir):
"""When no instance_data argument, root falls back to redacted json."""
args = self.args(
debug=False, dump_all=True, format=None, instance_data=None,
@@ -135,8 +130,7 @@ class TestQuery:
)
)
def test_handle_args_root_processes_user_data(
- self, _m_log_cli, ud_src, ud_expected, vd_src, vd_expected, capsys,
- tmpdir
+ self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir
):
"""Support reading multiple user-data file content types"""
user_data = tmpdir.join('user-data')
@@ -170,7 +164,7 @@ class TestQuery:
assert vd_expected == cmd_output['vendordata']
def test_handle_args_root_uses_instance_sensitive_data(
- self, _m_cli_log, capsys, tmpdir
+ self, capsys, tmpdir
):
"""When no instance_data argument, root uses sensitive json."""
user_data = tmpdir.join('user-data')
@@ -198,9 +192,7 @@ class TestQuery:
out, _err = capsys.readouterr()
assert expected == out
- def test_handle_args_dumps_all_instance_data(
- self, _m_cli_log, capsys, tmpdir
- ):
+ def test_handle_args_dumps_all_instance_data(self, capsys, tmpdir):
"""When --all is specified query will dump all instance data vars."""
instance_data = tmpdir.join('instance-data')
instance_data.write('{"my-var": "it worked"}')
@@ -220,9 +212,7 @@ class TestQuery:
out, _err = capsys.readouterr()
assert expected == out
- def test_handle_args_returns_top_level_varname(
- self, _m_cli_log, capsys, tmpdir
- ):
+ def test_handle_args_returns_top_level_varname(self, capsys, tmpdir):
"""When the argument varname is passed, report its value."""
instance_data = tmpdir.join('instance-data')
instance_data.write('{"my-var": "it worked"}')
@@ -236,9 +226,7 @@ class TestQuery:
out, _err = capsys.readouterr()
assert 'it worked\n' == out
- def test_handle_args_returns_nested_varname(
- self, _m_cli_log, capsys, tmpdir
- ):
+ def test_handle_args_returns_nested_varname(self, capsys, tmpdir):
"""If user_data file is a jinja template render instance-data vars."""
instance_data = tmpdir.join('instance-data')
instance_data.write(
@@ -255,7 +243,7 @@ class TestQuery:
assert 'value-2\n' == out
def test_handle_args_returns_standardized_vars_to_top_level_aliases(
- self, _m_cli_log, capsys, tmpdir
+ self, capsys, tmpdir
):
"""Any standardized vars under v# are promoted as top-level aliases."""
instance_data = tmpdir.join('instance-data')
@@ -288,7 +276,7 @@ class TestQuery:
assert expected == out
def test_handle_args_list_keys_sorts_top_level_keys_when_no_varname(
- self, _m_cli_log, capsys, tmpdir
+ self, capsys, tmpdir
):
"""Sort all top-level keys when only --list-keys provided."""
instance_data = tmpdir.join('instance-data')
@@ -307,7 +295,7 @@ class TestQuery:
assert expected == out
def test_handle_args_list_keys_sorts_nested_keys_when_varname(
- self, _m_cli_log, capsys, tmpdir
+ self, capsys, tmpdir
):
"""Sort all nested keys of varname object when --list-keys provided."""
instance_data = tmpdir.join('instance-data')
@@ -326,7 +314,7 @@ class TestQuery:
assert expected == out
def test_handle_args_list_keys_errors_when_varname_is_not_a_dict(
- self, _m_cli_log, caplog, tmpdir
+ self, caplog, tmpdir
):
"""Raise an error when --list-keys and varname specify a non-list."""
instance_data = tmpdir.join('instance-data')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent suggestion. applied.
cloudinit/cmd/tests/test_query.py
Outdated
gzfp = gzip.GzipFile(mode="wb", fileobj=iobuf) | ||
gzfp.write(data) | ||
gzfp.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gzfp = gzip.GzipFile(mode="wb", fileobj=iobuf) | |
gzfp.write(data) | |
gzfp.close() | |
with gzip.GzipFile(mode="wb", fileobj=iobuf) as gzfp: | |
gzfp.write(data) |
?
54fb181
to
4172b5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chad, this is looking pretty good now! I have a proposed change to the help text, and a couple of suggestions to simplify/improve the testing a little.
cloudinit/cmd/query.py
Outdated
Some instance-data values may be binary on some platforms, such as userdata and | ||
vendordata. Attempt to decompress and decode UTF-8 any binary values. | ||
|
||
Binary instance-data values which cannot be decompressed or decoded, | ||
will be base64-encoded and will have the prefix "ci-b64:" on the value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording could be a little tighter, I think:
Any binary values in the instance metadata will be base64-encoded and prefixed with "ci-b64:" in the output. userdata and, where applicable, vendordata may be provided to the machine gzip-compressed (and therefore as binary data). query will attempt to decompress these to a string before emitting the JSON output; if this fails, they are treated as binary.
cloudinit/cmd/tests/test_query.py
Outdated
(_gzip_data(b'ud') + b'invalid', 'ci-b64:', | ||
_gzip_data(b'vd') + b'invalid', 'ci-b64:'), | ||
# non-utf-8 encodable content | ||
('hi mom'.encode('utf-16'), 'ci-b64:', | ||
'hi pops'.encode('utf-16'), 'ci-b64:'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including the full expected output seems pretty achievable given the inputs are pretty small:
(_gzip_data(b'ud') + b'invalid', 'ci-b64:', | |
_gzip_data(b'vd') + b'invalid', 'ci-b64:'), | |
# non-utf-8 encodable content | |
('hi mom'.encode('utf-16'), 'ci-b64:', | |
'hi pops'.encode('utf-16'), 'ci-b64:'), | |
(_gzip_data(b'ud') + b'invalid', | |
'ci-b64:H4sIALIdNF8C/ytNAQANOk5ZAgAAAGludmFsaWQ=', | |
_gzip_data(b'vd') + b'invalid', | |
'ci-b64:H4sIANQdNF8C/ytLAQDOaWNyAgAAAGludmFsaWQ='), | |
# non-utf-8 encodable content | |
('hi mom'.encode('utf-16'), 'ci-b64://5oAGkAIABtAG8AbQA=', | |
'hi pops'.encode('utf-16'), 'ci-b64://5oAGkAIABwAG8AcABzAA=='), |
This would simplify the test implementation as well as making the test a little more thorough.
super(TestQuery, self).setUp() | ||
self.tmp = self.tmp_dir() | ||
self.instance_data = self.tmp_path('instance-data', dir=self.tmp) | ||
def _setup_paths(self, tmpdir, ud_val=None, vd_val=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the tests would be a little simpler if they used the paths
fixture introduced in my Oracle v1 work. I'll comment inline in a couple of places below to describe my proposed changes.
paths, run_dir, _, _ = self._setup_paths(tmpdir) | ||
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths: | ||
m_paths.return_value = paths | ||
assert 1 == query.handle_args('anyname', args) | ||
json_file = run_dir.join(INSTANCE_JSON_FILE) | ||
msg = 'Missing instance-data file: %s' % json_file.strpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this test requested the paths
fixture instead of the tmpdir
fixture, I think it can be rewritten as:
paths, run_dir, _, _ = self._setup_paths(tmpdir) | |
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths: | |
m_paths.return_value = paths | |
assert 1 == query.handle_args('anyname', args) | |
json_file = run_dir.join(INSTANCE_JSON_FILE) | |
msg = 'Missing instance-data file: %s' % json_file.strpath | |
with mock.patch('cloudinit.cmd.query.read_cfg_paths') as m_paths: | |
m_paths.return_value = paths | |
assert 1 == query.handle_args('anyname', args) | |
json_file = os.path.join(paths.run_dir, INSTANCE_JSON_FILE) | |
msg = 'Missing instance-data file: %s' % json_file |
(Roughly the same also applies to the following test.)
def test_handle_args_root_processes_user_data( | ||
self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir | ||
): | ||
"""Support reading multiple user-data file content types""" | ||
paths, run_dir, user_data, vendor_data = self._setup_paths( | ||
tmpdir, ud_val=ud_src, vd_val=vd_src | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a similar change could be made here (and in the following test):
def test_handle_args_root_processes_user_data( | |
self, ud_src, ud_expected, vd_src, vd_expected, capsys, tmpdir | |
): | |
"""Support reading multiple user-data file content types""" | |
paths, run_dir, user_data, vendor_data = self._setup_paths( | |
tmpdir, ud_val=ud_src, vd_val=vd_src | |
) | |
def test_handle_args_root_processes_user_data( | |
self, ud_src, ud_expected, vd_src, vd_expected, capsys, paths, tmpdir | |
): | |
"""Support reading multiple user-data file content types""" | |
user_data, vendor_data = self._setup_paths(ud_val=ud_src, vd_val=vd_src) |
It follows that self._setup_paths()
can be simplified to (a) only return user_data
/vendor_data
, and (b) require both of those arguments to be passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case it makes things a bit more complex to shift to paths as I don't get to use the tmpdir instance methods to write multiple files etc in individual tests so it looks to bloat my current use case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case it makes things a bit more complex to shift to paths as I don't get to use the tmpdir instance methods to write multiple files etc in individual tests so it looks to bloat my current use case here.
Hmm, I'm not sure I follow this, but I would like the paths
fixture to be usable in situations like this. If you can still remember the context here, I'd like to understand what the gap for using it is.
(FYI, pytest
will pass the same tmpdir
instance to a test and its fixtures: so the tmpdir
used by the paths
fixture will be the same as the tmpdir
used within the test. I don't know if that changes anything, but it's useful background info regardless.)
cloud-init query tries to directly load and decode raw user-data from /var/lib/cloud/instance/user-data.txt. This results in UnicodeDecodeErrors on some platforms which provide compressed content. Avoid UnicodeDecoderErrors when parsing compressed user-data at /var/lib/cloud/instance/user-data.txt. LP: #1889938
234ac57
to
314aeae
Compare
314aeae
to
f2d4cbb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits in the docstrings. I don't have strong opinions about the pytest fixture discussion from @OddBloke ; I assume that future PRs could address those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. This looks good to go after CI is green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 'load_userdata' seems to take an odd balance of "be explicit" and "use the magic" (load_file with decode=True).
I think it can be simplified if you accept the magic to:
def load_userdata(fpath):
return util.decomp_gzip(
util.load_file(fpath, decode=false).decode()
But without magic this is nicer, and avoids 2 calls to 'load_file':
def load_userdata(fpath):
bdata = util.load_file(fpath, decode=false)
try:
return bdata.decode('utf-8')
except UnicodeDecodeError:
return util.decomp_gzip(bdata, quiet=false, decode=True)
try: | ||
return util.decomp_gzip(encoded_data, quiet=False) | ||
except util.DecompressionError: | ||
return encoded_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, it seems silly to try/catch a DecompressionError here as if you called decomp_gzip
with quiet=True thats what you would have gotten.
In short, aren't these 4 lines (93-96) exactly equivalent to: return util.decomp_gzip(encoded_data)
?
Or even shorter:
return util.decomp_gzip(util.load_file(ud_file_path, decode=False), quiet=True)
@returns: String of uncompressed userdata if possible, otherwise bytes. | ||
""" | ||
try: | ||
return util.load_file(ud_file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first when I saw this, I thought "hm... could a gzipped file contain only unicode?". Luckily the answer was No.
$ python3 -c 'with open("out.gz", "r") as fp: print(fp.read())'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I'm guessing gzip header will always be there and not unicode.
cloud-init query tries to directly load and decode
raw user-data from /var/lib/cloud/instance/user-data.txt.
This results in UnicodeDecodeErrors on some platforms which
provide compressed content.
Avoid UnicodeDecoderErrors when parsing compressed user-data at
/var/lib/cloud/instance/user-data.txt.
LP: #1889938
To reproduce: