Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User data support #911

Merged
merged 11 commits into from
Apr 15, 2022
Merged

User data support #911

merged 11 commits into from
Apr 15, 2022

Conversation

iliana
Copy link
Contributor

@iliana iliana commented Apr 13, 2022

This is a draft because I haven't run this outside of integration tests yet, but in theory it works! It works! Please go ahead and review.

cloud-init is the usual instance initialization system for cloud images of common Linux distributions. It is a huge ball of Python code that runs at each boot and helps set things up in an expected way. There are two main pieces of data cloud-init looks for on boot: "meta-data", provided by the cloud provider, and "user-data", provided by whoever launched the actual instance.

This PR:

I considered whether to add a user_data field to the Instance struct in omicron_common, but chose against it. I think we want to come up with an alternate means of making user-data accessible from the API rather than add a potentially-long string to the response object for all instance-related actions.

Existing tests that launch instances with the simulated sled-agent seem to exercise these changes, but recommendations for additional tests welcome; I suspect that the full end-to-end "the user-data matches on the actual instance" test will require having a CI system that runs actual VMs.

@iliana iliana requested a review from jmpesp April 13, 2022 23:14
Copy link
Contributor

@jmpesp jmpesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested this yet, but will shortly! Some questions:


impl Instance {
pub fn generate_cidata(&self) -> Result<Vec<u8>, Error> {
// cloud-init meta-data is YAML, but YAML is a strict superset of JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

// this was reverse engineered by making the numbers go lower until the
// code failed (usually because of low disk space). this only works for
// FAT12 filesystems
let sectors = 42.max(file_sectors + 35 + ((file_sectors + 1) / 341 * 2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do the numbers in this calculation represent? my FAT* filesystem ignorant read is "the lowest number of sectors we support is 42, and otherwise we have to compute enough space for FAT* formatting / metadata plus our file data", but that implies that FAT* needs 35 sectors for something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have no idea because the only reference I could easily find is [[Design of the FAT file system]], which I repeatedly tried to read and my eyes kept falling off the page. I wrote some code using fatfs to find out what the smallest numbers are that work with it.

This more or less spells out to:

  • No device less than 42 sectors can be formatted as FAT12
  • Given the total number of sectors taken up by your files, 35 are overhead
  • ... plus an additional 2 sectors of overhead based on some linear relationship that has numbers I don't understand in it (why is it 341???).

Copy link
Collaborator

@smklein smklein Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf as a reference - it is the FAT spec.

Page 14 has a section on FAT type determination:

The FAT type— one of FAT12, FAT16, or FAT32—is determined by the count of clusters on the volume and nothing
else.

Since it seems we're seeing bytes_per_cluster to 512, and the default bytes_per_sector is also 512, the choice of FAT filesystem type seems dependent on "how many clusters we need", which seems dependent on the total input data size.

It also mentions the calculation:

// Calculate the Root Directory Sectors
RootDirSectors = ((BPB_RootEntCnt * 32) + (BPB_BytsPerSec – 1)) / BPB_BytsPerSec;

// Calculate the sectors in the data region
If(BPB_FATSz16 != 0)
  FATSz = BPB_FATSz16;
Else
  FATSz = BPB_FATSz32;
If(BPB_TotSec16 != 0)
  TotSec = BPB_TotSec16;
Else
  TotSec = BPB_TotSec32;
DataSec = TotSec – (BPB_ResvdSecCnt + (BPB_NumFATs * FATSz) + RootDirSectors);

// Calculate the cluster count
CountofClusters = DataSec / BPB_SecPerClus;

// Figure out FAT type
If(CountofClusters < 4085) {
  /* Volume is FAT12 */
} else if(CountofClusters < 65525) {
  /* Volume is FAT16 */
} else {
  /* Volume is FAT32 */
}

I kinda despise this calculation, because to solve the equation for "am I using a FAT12 filesystem", you must first provide input that knows if you are using FAT12/FAT16/FAT32. For example, BPB_RootEntCnt is zero on FAT32, but is not on FAT12 / FAT16.

(This is literally what the fatfs crate does - guess, then check)

They even acknowledge this is kinda fucked up on page 19, under "FAT Volume Initialization":

At this point, the careful reader should have one very interesting question. Given that the FAT type
(FAT12, FAT16, or FAT32) is dependant on the number of clusters—and that the sectors available in
the data area of a FAT volume is dependant on the size of the FAT—when handed an unformatted
volume that does not yet have a BPB, how do you determine all this and compute the proper values to
put in BPB_SecPerClus and either BPB_FATSz16 or BPB_FATSz32? The way Microsoft operating
systems do this is with a fixed value, several tables, and a clever piece of arithmetic.

Microsoft operating systems only do FAT12 on floppy disks. Because there is a limited number of
floppy formats that all have a fixed size, this is done with a simple table:
“If it is a floppy of this type, then the BPB looks like this.”

There is no dynamic computation for FAT12. For the FAT12 formats, all the computation for
BPB_SecPerClus and BPB_FATSz16 was worked out by hand on a piece of paper and recorded in the
table (being careful of course that the resultant cluster count was always less than 4085). If your media
is larger than 4 MB, do not bother with FAT12. Use smaller BPB_SecPerClus values so that the
volume will be FAT16.

So, anyway, "worked out by hand on a piece of paper" aside, we can make some assumptions and try calculating this for a limited size:

  • If we assume we're using FAT12 (which we can validate later),
  • and we assume sector and cluster size are both 512
  • and we assume "max_root_dir_entries" is 512,
  • and we assume "BPB_ResvdSecCnt = 1"
  • and we assume "# of FATs = 2" (we could change this, but that's documented as the default of the rust crate you're using)
RootDirSectors = ((512 * 32) + (512 – 1)) / 512 = 32;
DataSec = TotSec - (1 + 2 * FATSz + 32); 
DataSec = TotSec - (33 + 2 * FATSz)
// The filesystem stays FAT12 if DataSec < 4085

So, anyway, how do we calculate the size of the actual "FAT"s on the FS? With a "simple" calculation (see: determine_sectors_per_fat in fatfs). According to page 16 of the FAT spec, this is "more complicated" for FAT12 because "there are 1.5 bytes (12-bits) per FAT entry".

So, recalling our calculation earlier:

DataSec = TotSec - (33 + 2 * FATSz)
  • If FATSz = 1 => 512 bytes => 4096 bits -> can represent 341 clusters (4096 / 12), or ~170 KiB worth of clusters
  • If FATSz = 2 => 1024 bytes => 8192 bits -> can represent 682 clusters, or ~341 KiB worth of clusters.

So:

TotalSec = 35 + DataSec, if we're storing < 170 KiB of clusters
TotalSec = 37 + DataSec, if we're storing < 341 KiB of clusters

TL:DR:

If we think it's okay to put a bound on user_data + meta_data, my inclination would be:

  • Calculate: file_sectors = user_data.len().div_ceil(512) + meta_data.len().div_ceil(512)
  • Return an error if file_sectors > 512 (implying a combined sum of 256KiB)
  • Expect that 37 sectors will be reserved for overhead
  • Use a disk size of 37 + file_sectors

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though reading your comment earlier, it seems like you experimentally noticed the minimum FS size was 42, no? Any idea where that's coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea where the 42 is coming from, no. But if you try to give fatfs fewer than 42 sectors, it gives you an "unfortunate disk size" error.

nexus/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/src/cidata.rs Outdated Show resolved Hide resolved
@jmpesp
Copy link
Contributor

jmpesp commented Apr 14, 2022

cloud-init meta-data is YAML, but YAML is a strict superset of JSON

beware, YAML carries a terrible curse, but we can use JSON in a FAT filesystem

I actually have no idea

FAT is also cursed but at least we can use it for cloud-init data

it's both at the same time!

normalize_pubkey_data is also cursed

@iliana
Copy link
Contributor Author

iliana commented Apr 14, 2022

yooooo

Welcome to Alpine Linux 3.13
Kernel 5.10.16-0-virt on an x86_64 (/dev/ttyS0)

localhost login: root
Welcome to Alpine!

The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org/>.

You can setup the system with the command: setup-alpine

You may change this message by editing /etc/motd.

localhost:~# blkid /dev/vdb
/dev/vdb: LABEL="CIDATA" UUID="1234-5678" TYPE="vfat"
localhost:~# mount /dev/vdb /mnt
localhost:~# ls /mnt
meta-data
localhost:~# cat /mnt/meta-data
{"instance-id":"6de5655c-bd1d-4030-a19d-51118f800282","local-hostname":"suyre","public-keys":""}localhost:~#

nexus/src/lib.rs Outdated Show resolved Hide resolved
@iliana
Copy link
Contributor Author

iliana commented Apr 14, 2022

cloud-init meta-data is YAML, but YAML is a strict superset of JSON

beware, YAML carries a terrible curse, but we can use JSON in a FAT filesystem

the new curse is that YAML 1.2 is the version of the spec that made it a strict superset, and PyYAML (what cloud-init ultimately uses) does not yet have YAML 1.2 support.

this could get interesting

edit: nvm we're probably fine:

>>> import yaml
>>> yaml.safe_load("""{"instance-id":"6de5655c-bd1d-4030-a19d-51118f800282","local-hostname":"suyre","public-keys":""}""")
{'instance-id': '6de5655c-bd1d-4030-a19d-51118f800282', 'local-hostname': 'suyre', 'public-keys': ''}

@iliana
Copy link
Contributor Author

iliana commented Apr 14, 2022

[ 10.766464] cloud-init[578]: Cloud-init v. 20.4.1 finished at Thu, 14 Apr 2022 22:09:45 +0000. Datasource DataSourceNoCloud [seed=/dev/vdb][dsmode=net]. Up 10.76 seconds

🎉

@iliana iliana marked this pull request as ready for review April 14, 2022 22:18
@iliana
Copy link
Contributor Author

iliana commented Apr 14, 2022

I've verified this works on a Debian 11 image with the following user data:

#cloud-config
system_info:
  default_user:
    name: iliana
    plain_text_passwd: "this is my password"
    lock_passwd: false

which correctly set the username and password of the default account, allowing me to log in over the serial console.

[   10.952755] cloud-init[572]: Cloud-init v. 20.4.1 finished at Thu, 14 Apr 2022 22:39:00 +0000. Datasource DataSourceNoCloud [seed=/dev/vdb][dsmode=net].  Up 10.95 seconds
[  OK  ] Finished Execute cloud user/final scripts.
[  OK  ] Reached target Cloud-init target.

Debian GNU/Linux 11 hostname ttyS0

hostname login: iliana
Password: 
Linux hostname 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
iliana@hostname:~$

},
"user_data": {
"description": "User data for instance initialization systems (such as cloud-init). Must be a Base64-encoded string, as specified in RFC 4648 § 4 (+ and / characters with padding). Maximum 32 KiB unencoded data.",
"type": "string"
Copy link
Contributor

@david-crespo david-crespo Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave this field out of the request and everything still works the same? I see that it's not in the required list. I also don't see the logic that makes that work, but it might be serde magic or dropshot magic or who knows what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. There's a #[serde(default)] here, so it defaults to an empty string if no data is provided.

(With this change in place there will always be a cloud-init data device attached to an instance with at least the metadata, which is hostname/instance ID/SSH keys.)

let mut disk = Cursor::new(vec![0; sectors * 512]);
fatfs::format_volume(
&mut disk,
FormatVolumeOptions::new()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can explicitly specify that this needs to be FAT12 using .fat_type(FatType::Fat12), if that's necessary to assert our comment above. However, I'm not sure it's safe to assume we'll be using FAT12 - I think that's dependent on the size of userdata + metadata.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit; Unless we think it's okay to impose an upper size bound, then setting the fat type would be totally reasonable here.

// this was reverse engineered by making the numbers go lower until the
// code failed (usually because of low disk space). this only works for
// FAT12 filesystems
let sectors = 42.max(file_sectors + 35 + ((file_sectors + 1) / 341 * 2));
Copy link
Collaborator

@smklein smklein Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf as a reference - it is the FAT spec.

Page 14 has a section on FAT type determination:

The FAT type— one of FAT12, FAT16, or FAT32—is determined by the count of clusters on the volume and nothing
else.

Since it seems we're seeing bytes_per_cluster to 512, and the default bytes_per_sector is also 512, the choice of FAT filesystem type seems dependent on "how many clusters we need", which seems dependent on the total input data size.

It also mentions the calculation:

// Calculate the Root Directory Sectors
RootDirSectors = ((BPB_RootEntCnt * 32) + (BPB_BytsPerSec – 1)) / BPB_BytsPerSec;

// Calculate the sectors in the data region
If(BPB_FATSz16 != 0)
  FATSz = BPB_FATSz16;
Else
  FATSz = BPB_FATSz32;
If(BPB_TotSec16 != 0)
  TotSec = BPB_TotSec16;
Else
  TotSec = BPB_TotSec32;
DataSec = TotSec – (BPB_ResvdSecCnt + (BPB_NumFATs * FATSz) + RootDirSectors);

// Calculate the cluster count
CountofClusters = DataSec / BPB_SecPerClus;

// Figure out FAT type
If(CountofClusters < 4085) {
  /* Volume is FAT12 */
} else if(CountofClusters < 65525) {
  /* Volume is FAT16 */
} else {
  /* Volume is FAT32 */
}

I kinda despise this calculation, because to solve the equation for "am I using a FAT12 filesystem", you must first provide input that knows if you are using FAT12/FAT16/FAT32. For example, BPB_RootEntCnt is zero on FAT32, but is not on FAT12 / FAT16.

(This is literally what the fatfs crate does - guess, then check)

They even acknowledge this is kinda fucked up on page 19, under "FAT Volume Initialization":

At this point, the careful reader should have one very interesting question. Given that the FAT type
(FAT12, FAT16, or FAT32) is dependant on the number of clusters—and that the sectors available in
the data area of a FAT volume is dependant on the size of the FAT—when handed an unformatted
volume that does not yet have a BPB, how do you determine all this and compute the proper values to
put in BPB_SecPerClus and either BPB_FATSz16 or BPB_FATSz32? The way Microsoft operating
systems do this is with a fixed value, several tables, and a clever piece of arithmetic.

Microsoft operating systems only do FAT12 on floppy disks. Because there is a limited number of
floppy formats that all have a fixed size, this is done with a simple table:
“If it is a floppy of this type, then the BPB looks like this.”

There is no dynamic computation for FAT12. For the FAT12 formats, all the computation for
BPB_SecPerClus and BPB_FATSz16 was worked out by hand on a piece of paper and recorded in the
table (being careful of course that the resultant cluster count was always less than 4085). If your media
is larger than 4 MB, do not bother with FAT12. Use smaller BPB_SecPerClus values so that the
volume will be FAT16.

So, anyway, "worked out by hand on a piece of paper" aside, we can make some assumptions and try calculating this for a limited size:

  • If we assume we're using FAT12 (which we can validate later),
  • and we assume sector and cluster size are both 512
  • and we assume "max_root_dir_entries" is 512,
  • and we assume "BPB_ResvdSecCnt = 1"
  • and we assume "# of FATs = 2" (we could change this, but that's documented as the default of the rust crate you're using)
RootDirSectors = ((512 * 32) + (512 – 1)) / 512 = 32;
DataSec = TotSec - (1 + 2 * FATSz + 32); 
DataSec = TotSec - (33 + 2 * FATSz)
// The filesystem stays FAT12 if DataSec < 4085

So, anyway, how do we calculate the size of the actual "FAT"s on the FS? With a "simple" calculation (see: determine_sectors_per_fat in fatfs). According to page 16 of the FAT spec, this is "more complicated" for FAT12 because "there are 1.5 bytes (12-bits) per FAT entry".

So, recalling our calculation earlier:

DataSec = TotSec - (33 + 2 * FATSz)
  • If FATSz = 1 => 512 bytes => 4096 bits -> can represent 341 clusters (4096 / 12), or ~170 KiB worth of clusters
  • If FATSz = 2 => 1024 bytes => 8192 bits -> can represent 682 clusters, or ~341 KiB worth of clusters.

So:

TotalSec = 35 + DataSec, if we're storing < 170 KiB of clusters
TotalSec = 37 + DataSec, if we're storing < 341 KiB of clusters

TL:DR:

If we think it's okay to put a bound on user_data + meta_data, my inclination would be:

  • Calculate: file_sectors = user_data.len().div_ceil(512) + meta_data.len().div_ceil(512)
  • Return an error if file_sectors > 512 (implying a combined sum of 256KiB)
  • Expect that 37 sectors will be reserved for overhead
  • Use a disk size of 37 + file_sectors

@iliana iliana merged commit f91a12b into main Apr 15, 2022
@iliana iliana deleted the cloud-init branch April 15, 2022 22:47
leftwo pushed a commit that referenced this pull request Oct 4, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile
leftwo added a commit that referenced this pull request Oct 5, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile

---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants