-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Oracle IMDSv2 API #528
Conversation
TheRealFalcon
commented
Aug 10, 2020
- v2 of the API is now default with fallback to v1.
- Refactored the Oracle datasource to fetch version, instance, and vnic metadata simultaneously.
@@ -171,10 +105,11 @@ class DataSourceOracle(sources.DataSource): | |||
sources.NetworkConfigSource.system_cfg, | |||
) | |||
|
|||
_network_config = sources.UNSET | |||
_network_config = sources.UNSET # type: dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure how to declare these types. If I don't declare anything, my type checking complains anywhere I use _network_config
as a dict because sources.UNSET
is a string. I still get a complaint on this line because I'm declaring a string as a dict, but it has quieted the rest of the warnings. If we were being strict it would be a Union[dict|str], but since we don't ever actually want to use it as a string, that also seemed wrong to me (and I didn't want to add the union import just for that).
Shrug. Something to think about if we start to incorporate more typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think sources.UNSET
is going to be a barrier to typing the codebase more generally, so I think we should set it to one side here. I'd probably leave this untyped, rather than applying a not-quite-correct annotation; we'll correct it as part of a broader change.
data = read_opc_metadata() | ||
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False) | ||
# suppress() in this context is just a no-op context manager | ||
network_context = suppress() if _is_iscsi_root( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love using suppress here, but duplicating the read_opc_metadata()
call (especially if it's taking arguments now) seems worse to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; elsewhere in the codebase (albeit in testing code) we use ExitStack for this, and we import it as
something that indicates how we're using it:
from contextlib import ExitStack as does_not_raise |
What do you think to following that pattern here? (Perhaps as noop
is more apropos here? Or maybe nullcontext
?)
b667c56
to
d44e951
Compare
* v2 of the API is now default with fallback to v1. * Refactored the Oracle datasource to fetch version, instance, and vnic metadata simultaneously.
d44e951
to
3426289
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, James! This is a complex change, and you've done a great job with it. Comments are inline.
data = read_opc_metadata() | ||
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False) | ||
# suppress() in this context is just a no-op context manager | ||
network_context = suppress() if _is_iscsi_root( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; elsewhere in the codebase (albeit in testing code) we use ExitStack for this, and we import it as
something that indicates how we're using it:
from contextlib import ExitStack as does_not_raise |
What do you think to following that pattern here? (Perhaps as noop
is more apropos here? Or maybe nullcontext
?)
@@ -171,10 +105,11 @@ class DataSourceOracle(sources.DataSource): | |||
sources.NetworkConfigSource.system_cfg, | |||
) | |||
|
|||
_network_config = sources.UNSET | |||
_network_config = sources.UNSET # type: dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think sources.UNSET
is going to be a barrier to typing the codebase more generally, so I think we should set it to one side here. I'd probably leave this untyped, rather than applying a not-quite-correct annotation; we'll correct it as part of a broader change.
@@ -247,13 +185,16 @@ def network_config(self): | |||
self._network_config = self.distro.generate_fallback_config() | |||
|
|||
if self.ds_cfg.get('configure_secondary_nics'): | |||
if self._vnics_data == sources.UNSET: | |||
LOG.warning("Network config is UNSET but should not be") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Network config" is a little ambiguous in this context; I had to read this a couple of times to see that this was testing self._vnics_data
, not self._network_config
.
@@ -247,13 +185,16 @@ def network_config(self): | |||
self._network_config = self.distro.generate_fallback_config() | |||
|
|||
if self.ds_cfg.get('configure_secondary_nics'): | |||
if self._vnics_data == sources.UNSET: | |||
LOG.warning("Network config is UNSET but should not be") | |||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure return None
is the right behaviour here: _add_network_config_from_opc_imds
only adds to self._network_config
, so we do have some valid config already determined at this point. If we return None
from .network_config
then _find_networking_config
in stages.py
will move onto considering other sources of network config.
For Oracle, the order in which network config sources will be considered is:
cloud-init/cloudinit/sources/DataSourceOracle.py
Lines 167 to 172 in 6d74815
network_config_sources = ( | |
sources.NetworkConfigSource.cmdline, | |
sources.NetworkConfigSource.ds, | |
sources.NetworkConfigSource.initramfs, | |
sources.NetworkConfigSource.system_cfg, | |
) |
So if we return None
here: if there is initramfs configuration then that will be used, if not then fallback config will be generated (the system_cfg
source is used rarely). That logic is identical to the logic here, but excludes the addition of secondary NICs (which is fine; we know there aren't any if we're executing this line) and the call to _ensure_netfailover_safe
. The message on the commit that introduced _ensure_netfailover_safe
(fa47d52) indicates that this should be called for fallback network configuration, so this is less fine.
(TL;DR: I think we need to skip the addition of the secondary NICs here, rather than aborting processing entirely.)
(2, does_not_raise()), | ||
(3, does_not_raise()), | ||
(4, pytest.raises(UrlError)), | ||
], | ||
) | ||
def test_retries(self, expectation, failure_count, httpretty): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't testing retries on the VNIC route. (I suspect a separate test which just takes the happy path for instance data so it can test VNIC fetching in isolation would be easiest.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having reviewed the whole PR now: I think we're missing fetch_vnics_data
testing more generally: all the tests still pass with this patch applied:
--- a/cloudinit/sources/DataSourceOracle.py
+++ b/cloudinit/sources/DataSourceOracle.py
@@ -305,6 +305,7 @@ def read_opc_metadata(*, fetch_vnics_data: bool = False):
)._response.json()
vnics_data = None
if fetch_vnics_data:
+ assert False
try:
vnics_data = readurl(
METADATA_PATTERN.format(
else: | ||
with dhcp.EphemeralDHCPv4(net.find_fallback_nic()): | ||
data = read_opc_metadata() | ||
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now perform the defaulting of configure_secondary_nics
in two places: here and in .network_config
. It would be preferable to have the default configurable in a single place for ease of maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm don't understand what you mean by 'defaulting of configure_secondary_nics
'. Can you explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line determines what the default value of configure_secondary_nics
will be if it isn't present in self.ds_cfg
: False
. In .network_config
, we also determine what the default value of configure_secondary_nics
will be if it isn't present (and in that case as we don't supply a second argument to .get
, it will be None
).
Though, further confusing matters, we actually set the default in BUILTIN_DS_CONFIG
which should mean that configure_secondary_nics
is always set in self.ds_cfg
(so we could use regular dict access instead of .get
). However, that goes through util.mergemanydict
and I don't know for sure that that couldn't result in a dict without the key present under certain user inputs.
Does that make any more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mainly because of this
so we could use regular dict access instead of .get
I guess I don't see the problem with using .get
here (or anywhere and everything 😉 ). Since we're not persisting the result of this anywhere, I don't see it as "defaulting a value", but rather eliminating the need for None checks in case something hasn't been set. Why would having things as they are increase the maintenance burden?
values, as it should continue to be configured for DHCP. As such, this | ||
takes an existing network_config dict which is expected to have the | ||
primary NIC configuration already present. | ||
It will mutate the given dict to include the secondary VNICs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still written as if this callable takes a parameter.
# read_opc_metadata should JSON decode the response and return it | ||
expected = json.loads(OPC_V2_METADATA) | ||
assert expected == oracle.read_opc_metadata().instance_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also be testing the version
attribute of read_opc_metadata
's return here (and in the test below)?
On Tue, Aug 11, 2020 at 05:05:03PM -0700, James Falcon wrote:
@TheRealFalcon commented on this pull request.
> @@ -192,13 +127,20 @@ def _get_data(self):
# network may be configured if iscsi root. If that is the case
# then read_initramfs_config will return non-None.
- if _is_iscsi_root():
- data = read_opc_metadata()
- else:
- with dhcp.EphemeralDHCPv4(net.find_fallback_nic()):
- data = read_opc_metadata()
+ fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False)
Yes, mainly because of this
> so we could use regular dict access instead of .get
I guess I don't see the problem with using `.get` here (or anywhere
and everything 😉 ).
My concern here is not with the use of `.get` (though we can have that
conversation at some point ;).
Since we're not persisting the result of this anywhere, I don't see it
as "defaulting a value"
This does declare the default in the case where the key is absent
though; the whole point of using `.get` is that it returns a default
instead of raising a `KeyError` on absence.
but rather eliminating the need for None checks in case something
hasn't been set.
This would also be a problem with None checks; this code is equivalent
to yours (and worse IMO, to be clear!) and is also performing defaulting
in this specific codepath:
```py
fetch_vnics_data = self.ds_cfg.get("configure_secondary_nics")
if fetch_vnics_data is None:
fetch_vnics_data = False
```
Regardless of the exact spelling, the problem is that defaulting is
happening in this codepath as well as in the BUILTIN_DS_CONFIG dict (and
also in `network_config` in the faulty line I introduced).
Why would having things as they are increase the maintenance
burden?
When we want to update the default (which we do intend to do at some
point), we now have three places where we need to update the value,
instead of one. (We had two before your PR, my code is also at fault. :)
To be clear, I think something like
`.get("configure_secondary_nics", BUILTIN_DS_CONFIG["configure_secondary_nics"])`
would do the job.
|
def _mock_v2_urls(httpretty): | ||
def instance_callback(request, uri, response_headers): | ||
assert request.headers.get("Authorization") == "Bearer Oracle" | ||
return [200, {}, OPC_V2_METADATA] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the package build in xenial is failing during the tests: https://travis-ci.com/github/canonical/cloud-init/jobs/371325182
I believe this is because the version of httpretty (0.8.6) in xenial does not include gabrielfalcao/HTTPretty@bdb7ee7 (which was first included in 0.8.11). So I think you'll need to add some default response header values to avoid these tracebacks.
(This isn't caught by the "xenial" unittests in Travis because we don't have HTTPretty pinned correctly; I'll be submitting a PR to address that shortly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#531 is that PR.
@OddBloke test fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the speedy turnaround, James! Just two remaining (minor) areas of concern.
Have you performed any testing of this in Oracle yet? (Do you have access?)
network_context = noop() if _is_iscsi_root( | ||
) else dhcp.EphemeralDHCPv4(net.find_fallback_nic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't love this continuation; black would format this as:
network_context = (
noop()
if _is_iscsi_root()
else dhcp.EphemeralDHCPv4(net.find_fallback_nic())
)
which is actually more lines than:
if _is_iscsi_root():
network_context = noop()
else:
network_context = dhcp.EphemeralDHCPv4(net.find_fallback_nic())
and you can even golf that down a little further if you want:
network_context = noop()
if not _is_iscsi_root():
network_context = dhcp.EphemeralDHCPv4(net.find_fallback_nic())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And before you say anything:
In [2]: len(" network_context = noop() if _is_iscsi_root() else dhcp.EphemeralDHCPv4(net.find_fallback_nic())") > 100
Out[2]: True
😁 )
@@ -175,6 +109,7 @@ class DataSourceOracle(sources.DataSource): | |||
|
|||
def __init__(self, sys_cfg, *args, **kwargs): | |||
super(DataSourceOracle, self).__init__(sys_cfg, *args, **kwargs) | |||
self._vnics_data = sources.UNSET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm not being distracted by the type annotation, I've realised that I have this question: is there any reason to use sources.UNSET
over None
here? The use of this attribute is entirely contained to this class, so I don't think we need to follow that (IMO, anti-)pattern.
if self._vnics_data == sources.UNSET: | ||
LOG.warning( | ||
"Secondary NIC data is UNSET but should not be") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think to moving this guard into _add_network_config_from_opc_imds
? As currently written, _add_network_config_from_opc_imds
will behave confusingly if called before self._vnics_data
is set. It seems natural for that method to own its own input validation.
(As things stand, I think it will behave very confusingly: self._vnics_data
will be "_unset"
, so we'll get a TypeError: string indices must be integers
when we attempt to effectively do "u"['.macAddr'].lower()
; if we do switch to self._vnics_data = None
in __init__
then I think we'll get the less-confusing-but-still-avoidable: TypeError: 'NoneType' object is not subscriptable
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this code now, thanks James! I'm going to wait for you to complete the testing you're doing before landing this.
@OddBloke , I manually checked the following:
If there's nothing else, I'm comfortable with where this is at. |
I can manually kill the v2 endpoint on your test instance if we want to validate failover logic as well. I'll just need the instance id (e.g. ocid1.instance.foo) and a time window to have it disabled. |
@josh-potter Great, thanks. Instance id is ocid1.instance.oc1.phx.anyhqljtniwq6sycegkrokuyq2w27j4rm43i6zvgzphk5our2mle2j2rszrq Time frame can be as soon as works for you. I should only need a couple minutes, but if you need a specific time frame, lets say 30 minutes. If you just let me know when you plan on turning it off I can get to it right away. |
@TheRealFalcon I just need to document the change on our side before executing. That'll take me a little bit to get together. Let's target 2:00 PM EST. I'll leave the v2 endpoint disabled on that instance for 30 minutes. |
@TheRealFalcon the v2 endpoint is now disabled. I'll add it back at 2:30 PM EST or sooner if you get finished up with testing. |
Thanks @josh-potter . I got what I need |
I manually checked the same things I mentioned earlier, except checking for v1 instead of v2. Also, I diff'ed the I'm comfortable to merge. |
@TheRealFalcon Do you want to author the squash commit message, or shall I come up with something? |
@OddBloke I was thinking the title plus first comment. If you think there's more or less we should add, use your creative liberties 😉 "Support Oracle IMDSv2 API
|