Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize workspaces with additional file content #4428

Merged
merged 2 commits into from
Jun 15, 2021

Conversation

svenefftinge
Copy link
Member

@svenefftinge svenefftinge commented Jun 8, 2021

@svenefftinge svenefftinge force-pushed the se/external-config branch 3 times, most recently from f3076b8 to 490fd65 Compare June 9, 2021 10:55
@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 9, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.9

@svenefftinge svenefftinge force-pushed the se/external-config branch 3 times, most recently from 97ea581 to 221da19 Compare June 9, 2021 13:20
@svenefftinge svenefftinge changed the title [WIP] Initialize workspaces with additional file content Initialize workspaces with additional file content Jun 9, 2021
@svenefftinge svenefftinge force-pushed the se/external-config branch 2 times, most recently from 35886e4 to 1959053 Compare June 9, 2021 16:07
@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 10, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.17

@svenefftinge svenefftinge force-pushed the se/external-config branch 3 times, most recently from 0234ebf to 9bdd722 Compare June 10, 2021 09:19
@svenefftinge svenefftinge marked this pull request as ready for review June 10, 2021 09:42
}
if (!customConfig) {

// try and find config file in the context repo or remote in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the reviewers: section below hasn't changed besides the indent because of the if.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: Adding ?w=1 at the end of the review URL ignores all whitespace changes (i.e. re-indented blocks will appear unchanged).

const hostContext = this.hostContextProvider.get(context.repository.host);
if (!hostContext || !hostContext.services) {
throw new Error(`Cannot fetch workspace image source for host: ${context.repository.host}`);
}
const lastDockerFileSha = await hostContext.services.fileProvider.getLastChangeRevision(context.repository, context.revision, user, imgcfg.file);
const lastDockerFileShaPromise = hostContext.services.fileProvider.getLastChangeRevision(context.repository, context.revision, user, imgcfg.file);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make two requests for the time being, to support the current resolution of image refs. Once we have rebuilt most images, we can remove the lastChangeRevision part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we resolved the base image here already we could check which case (lastDockerFileSha or this.getContentSHA(dockerFileContent)) resolves to a built image. This behaviour is what I referred to as "move the legacy behaviour to server" above.

@csweichel
Copy link
Contributor

How are we dealing with environment variables and tasks? Right now the env vars assume that the workspace their pattern matches for can be trusted. With this change if I can make someone click on a link I can quite easily get their secrets because I can bring my own tasks.

@svenefftinge
Copy link
Member Author

How are we dealing with environment variables and tasks? Right now the env vars assume that the workspace their pattern matches for can be trusted. With this change if I can make someone click on a link I can quite easily get their secrets because I can bring my own tasks.

I think this is a general problem that exists with all our workspace start URLs. The underlying issue is that supervisor runs processes and it is not obvious what those processes will be from looking at a link. I suggest we introduce something where we ask users if they trust certain sources to allow task execution.

@csweichel
Copy link
Contributor

csweichel commented Jun 11, 2021

How are we dealing with environment variables and tasks? Right now the env vars assume that the workspace their pattern matches for can be trusted. With this change if I can make someone click on a link I can quite easily get their secrets because I can bring my own tasks.

I think this is a general problem that exists with all our workspace start URLs. The underlying issue is that supervisor runs processes and it is not obvious what those processes will be from looking at a link. I suggest we introduce something where we ask users if they trust certain sources to allow task execution.

Indeed it does. What the additionalContent context URL does though is to explicitly circumvent the "security model" of environment variables/secrets (namely repository patterns). The assumption underpinning environment variables is that you trust the repository you specify using the repository pattern. There's a chance you trust this repo because you trust everyone who can modify its content, hence its tasks.
If we now allow arbitrary tasks to be executed in arbitrary repositories for arbitrary users (all you need is to click a link), that undermines this - already somewhat weak - idea of "trust in a repo".

Asking users if they want to execute the task might be one way - and is indeed a trade-off between friction and security.
Alternatively we could re-consider the Workspace Secrets RFC which would alleviate this problem also.

components/content-service-api/initializer.proto Outdated Show resolved Hide resolved
components/content-service-api/initializer.proto Outdated Show resolved Hide resolved
message BuildSourceDockerfile {
contentservice.WorkspaceInitializer source = 1;
string dockerfile_version = 2;
string dockerfile_path = 3;
string context_path = 4;
string dockerfile_content_sha = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could re-use the dockerfile_version - I don't think we need to change the protocol here.
The only difference is that before the version was the commit hash, now it's the SHA hash.

If we moved the "legacy" behaviour to server, that is.

@@ -224,6 +224,10 @@ func (b *DockerBuilder) getBaseImageRef(ctx context.Context, bs *api.BuildSource
return b.getAbsoluteImageRef(ctx, src.Ref.Ref, allowedAuth)

case *api.BuildSource_File:
if src.File.DockerfileContentSha != "" {
return fmt.Sprintf("%s:%x", b.Config.BaseImageRepository, src.File.DockerfileContentSha), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest approach below takes more into account than just the Dockerfile version. Using just the content hash ignores the context entirely - the same Dockerfile from two separate repositories with different context would get the same "build".

This could even leak sensitive details because one user might get the build of another (same Dockerfile content, but different context including repo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: because GitHub doesn't let me put a comment in line 238 ... you'd have to expand the if to include the CompositeInitializer etc.

const hostContext = this.hostContextProvider.get(context.repository.host);
if (!hostContext || !hostContext.services) {
throw new Error(`Cannot fetch workspace image source for host: ${context.repository.host}`);
}
const lastDockerFileSha = await hostContext.services.fileProvider.getLastChangeRevision(context.repository, context.revision, user, imgcfg.file);
const lastDockerFileShaPromise = hostContext.services.fileProvider.getLastChangeRevision(context.repository, context.revision, user, imgcfg.file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we resolved the base image here already we could check which case (lastDockerFileSha or this.getContentSHA(dockerFileContent)) resolves to a built image. This behaviour is what I referred to as "move the legacy behaviour to server" above.

source = initializer;
disp.push(disposable);
} else {
//TODO we cannot change this initializer structure now because it is part of how baserefs are computed in image-builder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
//TODO we cannot change this initializer structure now because it is part of how baserefs are computed in image-builder.
// TODO(se): we cannot change this initializer structure now because it is part of how baserefs are computed in image-builder.

see https://www.notion.so/gitpod/How-we-develop-e8ce7e562478458a9223eb9b797415b2#86e669f8e7df4f1289c9bede0855b5f9

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose we update the behaviour in image builder to accompany the new initializer. Also, the current change on how the baseRef is computed ignores the initializer entirely.

let contextPath = checkoutLocation;
let dockerFilePath = checkoutLocation + '/' + imgsrc.dockerFilePath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be at odds with the comment above. Either the comment or the code is wrong :)

@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 11, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.26

@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 12, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.28

Introduces an AdditionalContentContext and the
corresponding Initializer.
@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 12, 2021

Thanks, @csweichel! I have updated the PR as discussed, could you have a look again?

@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 14, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.36

@svenefftinge
Copy link
Member Author

svenefftinge commented Jun 15, 2021

/werft run

👍 started the job as gitpod-build-se-external-config.37

@csweichel csweichel force-pushed the se/external-config branch 2 times, most recently from a6cdde6 to 3fa9438 Compare June 15, 2021 10:36
@csweichel
Copy link
Contributor

I added unit tests and implemented the digest check in the content initializer. Unfortunately the digest sent by the server does not match what we actually download:

image

}
manifest[prefix+"Files"] = fmt.Sprintf("%x", hash.Sum(nil))
Copy link
Member Author

@svenefftinge svenefftinge Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this trigger docker builds on every change in .gitpod.yml (or other files)? I believe this would be too aggressive.

@csweichel csweichel force-pushed the se/external-config branch from 3fa9438 to a7c2638 Compare June 15, 2021 11:27
to the new FileDownloadInitializer
@csweichel csweichel force-pushed the se/external-config branch from a7c2638 to 5436968 Compare June 15, 2021 11:36
Copy link
Contributor

@csweichel csweichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

string digest = 3;
}
repeated FileInfo files = 1;
string target_location = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beware that all initializer are executed relative to a target location themselves, i.e. all paths in an initializer are always relative to that location.

The comment in line 30 implies that target_location can be absolute, when in reality it cannot.

for _, info := range ws.FilesInfos {
contents, err := downloadFile(ctx, info.url)
if err != nil {
tracing.LogError(span, xerrors.Errorf("cannot download file '%s' from '%s': %w", info.filePath, info.url, err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that when downloading a file fails the content init does not fail?

return src, nil
}

func downloadFile(ctx context.Context, url string) (content []byte, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downloadFile should write the file to disk immediately and not keep it in RAM first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, we need to respect the digest

func (e *CompositeInitializer) Run(ctx context.Context, mappings []archive.IDMapping) (csapi.WorkspaceInitSource, error) {
_, ctx = opentracing.StartSpanFromContext(ctx, "CompositeInitializer.Run")
for _, init := range e.Initializer {
init.Run(ctx, mappings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one initializer fails the others should fail, too

@svenefftinge svenefftinge merged commit 8cc1e52 into main Jun 15, 2021
@svenefftinge svenefftinge deleted the se/external-config branch June 15, 2021 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants