Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus][sled-agent] Use IPv4 addresses for Nexus' external interface #1320

Merged
merged 28 commits into from
Jul 7, 2022

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jun 30, 2022

As documented by some TODOs, longer-term, we should be using OPTE for Nexus traffic. As a stopgap, however, this PR allocates IPv4 addresses on the underlay.

  • Set up routes to allow traffic to flow from the GZ to nexus' public IP address (I tested out 172.20.15.224)
  • Enable IPv4 routing in the GZ to make sure traffic can reach the external Nexus interface, even if coming from off-device.
  • Enables IPv4 forwarding in the GZ, to reach Nexus

@smklein smklein changed the base branch from main to update-wipe-too June 30, 2022 14:25
@smklein smklein requested a review from bnaecker June 30, 2022 14:28
Base automatically changed from update-wipe-too to main June 30, 2022 15:28
@bnaecker
Copy link
Collaborator

To me and Sean: This is not required. We should rework this to have the sled agent create a new VNIC over the underlay device directly (not the etherstub), and provide it with a static IPv4 address. Because the VNIC is in the zone but over a link in the GZ, they're in the same segment and no routing shenanigans are needed.

@smklein
Copy link
Collaborator Author

smklein commented Jul 6, 2022

I went ahead an updated this PR to:

  • Allocate a VNIC on the physical link for Nexus (we can configure this to be one of the chelsio devices in the lab)
  • Provide a default route from Nexus -> the internet gateway

sled-agent/src/services.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK, though I'm not sure how we practically use it (if at all) anywhere outside the lab. Are developers supposed to modify the config-rss entry for the external address and gateway to match their local network? We already do that for IP Pools, so it's not the end of the world, but I would say we should add some documentation if that's the case. Updating how-to-run.adoc or similar would be useful.

It's also possible this is only intended for use in the lab or demo environment. In that case, I think we could also be clear about that, making sure folks know not to muck with it. That also suggests we should leave that empty or unset by default, since most people wouldn't want to set up Omicron to use an external IP for Nexus at all.

I'm not sure what the intention is, but seems like we need to clarify this in the docs in either case.

@smklein
Copy link
Collaborator Author

smklein commented Jul 7, 2022

Seems OK, though I'm not sure how we practically use it (if at all) anywhere outside the lab. Are developers supposed to modify the config-rss entry for the external address and gateway to match their local network? We already do that for IP Pools, so it's not the end of the world, but I would say we should add some documentation if that's the case. Updating how-to-run.adoc or similar would be useful.

It's also possible this is only intended for use in the lab or demo environment. In that case, I think we could also be clear about that, making sure folks know not to muck with it. That also suggests we should leave that empty or unset by default, since most people wouldn't want to set up Omicron to use an external IP for Nexus at all.

I'm not sure what the intention is, but seems like we need to clarify this in the docs in either case.

I've done the following:

  • Updated the docs to reference the gateway, and mention the public Nexus interface
  • Disabled the gateway by default

I think this should leave the normal developer flow mostly unaffected - though if Nexus is an IPv4-based interface, it'll need to be sitting somewhere on the public-facing network.

Maybe we can provide a better way to set a bootstrap list of public-facing IPs as a follow-up PR? Seems like we'll need that in the product anyway.

@smklein smklein merged commit a4a0800 into main Jul 7, 2022
@smklein smklein deleted the nexus-on-v4 branch July 7, 2022 14:59
leftwo pushed a commit that referenced this pull request Jun 26, 2024
Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts.  These scripts are extracted into
the global zone at /opt/oxide/crucible_dtrace/

Update Crucible to latest includes these updates:
Clean up dependency checking, fixing space leak (#1372)
Make a DTrace package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371)
Remove `WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366)
Move 'do work for one job' into a helper function (#1365)
Remove `DownstairsWork` from map when handling it (#1361)
Using `block_in_place` for IO operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369)
Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350)
Support arbitrary Volumes during replace compare (#1349)
Remove the SQLite backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341)
Move Work into ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333)
Move disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330)
Move cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328)
Move remaining local state into a `struct ConnectionState` (#1327)
Consolidate negotiation + IO operations into one loop (#1322)
Allow replacement of a target in a read_only_parent (#1281)
Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326)
Move negotiation into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317)
Reuse a reqwest client when creating repair client (#1324)
Add % to keep buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314)
Added more DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)
leftwo added a commit that referenced this pull request Jun 26, 2024
Update Crucible and Propolis to the latest

Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts. These scripts are extracted into the 
global zone at /opt/oxide/crucible_dtrace/

Crucible latest includes these updates:
Clean up dependency checking, fixing space leak (#1372) Make a DTrace
package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371) Remove
`WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366) Move 'do
work for one job' into a helper function (#1365) Remove `DownstairsWork`
from map when handling it (#1361) Using `block_in_place` for IO
operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer
configuration (#1369) Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350) Support
arbitrary Volumes during replace compare (#1349) Remove the SQLite
backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341) Move Work into
ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333) Move
disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330) Move
cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328) Move remaining
local state into a `struct ConnectionState` (#1327) Consolidate
negotiation + IO operations into one loop (#1322) Allow replacement of a
target in a read_only_parent (#1281) Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326) Move negotiation
into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317) Reuse a
reqwest client when creating repair client (#1324) Add % to keep
buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314) Added more
DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)

Propolis just has this one update:
Allow boot order config in propolis-standalone
---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants