Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ec2): multipart user data #11843

Merged
merged 7 commits into from
Mar 8, 2021
Merged

feat(ec2): multipart user data #11843

merged 7 commits into from
Mar 8, 2021

Conversation

rsmogura
Copy link
Contributor

@rsmogura rsmogura commented Dec 2, 2020

Add support for multiparat (MIME) user data for Linux environments. This
type is more versatile type of user data, and some AWS service
(i.e. AWS Batch) requires it in order to customize the launch
behaviour.

Change was tested in integ environment to check if all
user data parts has been executed correctly and with proper charset
encoding.

fixes #8315


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Dec 2, 2020

@github-actions github-actions bot added the @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud label Dec 2, 2020
@rsmogura
Copy link
Contributor Author

rsmogura commented Dec 3, 2020

I think this has to be work in progress for a moment (soft review is welcome) - I analysed integration test results and the approach with auto generating boundary is not perfect when the token is used inside part body, it can lead to different hashes used as boundary and change of user data in turn.

@rix0rrr
Copy link
Contributor

rix0rrr commented Dec 14, 2020

I don't really understand the design.

Why isn't it just a self-contained MimeUserData extends UserData class? Why do we have to turn original UserData class into an IMultipartUserDataPartProducer ?

And generally the boundary is just a large pseudorandom number which we'll assume won't occur in the various parts. Getting a hash from the part contents seems unnecessarily complicated?

Let's start by defining what parts there are. I assume you'll have a "commands" part (the "old" UserData, so to speak) and then a bunch of "attachments", right? Which are basically just files to stick somewhere on the filesystem?

@rsmogura
Copy link
Contributor Author

rsmogura commented Dec 14, 2020

SeeHi Ricco,

I got your point. I'll try to clarify some "whys" so maybe it will be better understable and we could go with this or something better.

Let me start from last part.

Multipart User Data is more like archive than a message with attachments.

Each script is a separate part (attachment), executed during different phases. Many scripts can be executed for the same phase.

(There's more kinds of attachments than script types).

Thus I thought that the Multipart has to inherit from UserData, but it doesn't allow adding commands (to what part or type? it's just archive)

In order to add parts and use existing classes (LinuxUserData)
there's IMultipartUserDataPartProducer. Part requires additional attributes to be rendered (like type), besides body. And there's at least two hooks where scripts can be added.

This allows reusing the current command like user data for Multipart user data.


Let me give a use case:
In Multipart user data I want to have two part with following content-types

cloud-boothook - to reconfigure docker and increase size of docker volume
x-shellscript - to I.e. register in ECS or system manager

So in this case Multipart will be archive for shell scripts, executed by cloud unit.

(Worth of mentioning but not implementing now is that there's more types supported by cloud init: like part handlers, include url, and more - in future every such type could get own class implementing IMultipartUserDataPartProducer)

So that's very nice to have good design review.

@rix0rrr
Copy link
Contributor

rix0rrr commented Jan 6, 2021

This allows reusing the current command like user data for Multipart user data.

I see. I don't mind that at all, I think you are right. But then we still have a couple of choices:

UserData implements IMultipart

image

The different UserData implementations that can be reused implement IMultipart

image

There's an integration class that implements IMultipart

image


Out of these, I think the last one is most self-contained, so has my preference. There might be some syntactic overhead there, but we can hide that with convenience methods:

multipart.addUserDataPart(new LinuxUserData());

class MultipartUserData {
  public addPart(part: IMultipart) { ... }

  public addUserDataPart(userData: UserData) {
    this.addPart(new UserDataMultipart(userData));
  }
}

That way we don't need to pollute the existing UserData classes with multipart-specifics.

@rsmogura
Copy link
Contributor Author

rsmogura commented Jan 7, 2021

Agree. The last idea is really good, instead of bloating code with interfaces just use adaptor.

And later we could think how to solve separator issue - however fallback will be to specify separator as option to Multipart.

@rsmogura
Copy link
Contributor Author

Hi @rix0rrr can you check now?

rix0rrr
rix0rrr previously requested changes Feb 9, 2021
Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is going!

I have some requests around naming and simplifying the implementation a little, but otherwise great job!

Also looking for a real life example of using more than 1 part :)

packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
@rsmogura
Copy link
Contributor Author

Thank you. And sorry for late answer I missed notification.

I'll take a look at it.

I think real life example could be setting size for Docker container disk in batch environment:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/

@yashda
Copy link

yashda commented Feb 25, 2021

We have a similar requirement which is another real life example of using more than 1 part:

  1. Configure the output/log file for cloud-init using cloud-config directives on CentOS images.
  2. Execute user-data script(bash) on startup of the Instance.

These changes would help us a lot in simplifying the user-data handling.

@rsmogura
Copy link
Contributor Author

rsmogura commented Mar 1, 2021

I want to test it with Windows machines, too. I'm bit concerned about line ending characters.

Radek Smogura and others added 4 commits March 4, 2021 19:47
Add support for multiparat (MIME) user data for Linux environments. This
type is more versatile type of user data, and some AWS service
(i.e. AWS Batch) requires it in order to customize the launch
behaviour.

Change was tested in integ environment to check if all
user data parts has been executed correctly and with proper charset
encoding.

fixes aws#8315
* Remove `IMultipartUserDataPartProducer`
* Add `MultipartUserDataPart` & `IMultipart`
* Concrete types to represent raw part and UserData wrapper can be created with
`MultipartUserDataPart.fromUserData` & `MultipartUserDataPart.fromRawBody`
* Removed auto-generation of separator (as with tokens hash codes can differ when tokens are not resolved)
- remove `MultipartContentType`
- remove `MultipartUserDataPartWrapperOptions`
- remove `IMultipart`
- rename `MultipartUserDataPart` -> `MultipartBody`
- other removals
- restructure other classes
- moved part rendering to part class
- set default separator to hard codeded string
- added validation of boundry
@mergify mergify bot dismissed rix0rrr’s stale review March 4, 2021 19:04

Pull request has been modified.

@rsmogura
Copy link
Contributor Author

rsmogura commented Mar 4, 2021

Hi @rix0rrr

I made a bigger refactor of code, I removed few methods, and followed tips. I think it's more clean right now, please check.

In context of Windows machines, I think multipart is not supported for Windows (at least I could not run any kind of multipart data on Windows, and documentation as well does not mention that multipart is supported for Windows EC2).


Minor thing which concerns me. MIME RFC as it's telnet based suggests CRLF as new line, however cloud-init is fine with \n (I made number of tests here, however I think it's worth to call it out)

rix0rrr
rix0rrr previously requested changes Mar 8, 2021
Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is looking great. Just some small tweaks.

protected static readonly DEFAULT_CONTENT_TYPE = 'text/x-shellscript; charset="utf-8"';

/** The body of this MIME part. */
public abstract get body(): string | undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer you don't declare these abstract members, and just implement renderBodyPart() twice--once for the two concrete types of classes.

It may feel like unnecessary duplication, but the actual amount of duplication won't be that bad and we'll have a good reduction in case analysis too (fewer ifs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I got the point. I thought that main concern was about options interfaces, so I moved towards abstracts getters and template method pattern.

Thanks, that's a good comment.


return this;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why get rid of the addUserDataPart(userData, contentType) here? I rather liked it as a convenience method.

(Of course it's not strictly necessary, but it reads nicer than the current alternative)

@mergify mergify bot dismissed rix0rrr’s stale review March 8, 2021 13:20

Pull request has been modified.

@rix0rrr rix0rrr changed the title feat(ec2): introduce multipart user data feat(ec2): multipart user data Mar 8, 2021
@mergify
Copy link
Contributor

mergify bot commented Mar 8, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: bc6e06c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Mar 8, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

This was referenced Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support MIME multipart format in UserData
5 participants