cluster up: use docker engine-api client #14729

Merged: 1 commit, Jun 22, 2017

Conversation

Contributor

csrwng commented Jun 18, 2017

Switches cluster up from the fsouza go docker client to the docker engine-api client
Fixes #14546

@csrwng
Contributor Author

csrwng commented Jun 18, 2017

@openshift/devex ptal

Just a note on the client code ... a good bit of similar code exists in the kube code base and I copied a couple of functions from it. However, I decided against calling the Kube client directly because:

  1. It doesn't include everything I need, such as the file upload/download functions.
  2. It has already moved to a different location in master than in the 1.6 vendored code that we have in the code base.
  3. It's uncertain whether it belongs in the kube codebase going forward, given that the main interface is CRI and implementations can be external binaries.

@csrwng
Contributor Author

csrwng commented Jun 18, 2017

[testextended][extended:clusterup]

err error
}

type apiResult interface{}
Contributor

unused?

@jim-minter
Contributor

jim-minter commented Jun 19, 2017

I wish we could get away with not having yet another docker API wrapper, but assuming this wish can't come true, honestly I would prefer it to closely match the naming/semantics of the underlying client - for example:

  • name the wrapper APIs identically to the underlying APIs wherever possible
  • remove smarts such as the for ... range in ContainerNames() (see local comment)

My impression is that wherever we add restrictions on the underlying API (e.g. provide 2 underlying API calls in one wrapper function, or hard code arguments to the underlying API), we create additional future work for ourselves when we undo that.

Also I don't see the benefit of adding additional layers of scaffolding (e.g. dockerClient, containerClient, imageClient, execClient...).

Also it feels like we just have a huge amount of "helper" infrastructure in pkg/bootstrap. I think it adds quite a lot of cognitive load, e.g. when diagnosing concurrency issues in the underlying API, which I've had to do twice in the last 6 months.

	return c.endpoint
}

func (c *dockerClient) Version() (*types.Version, error) {
Contributor

jim-minter Jun 19, 2017

would prefer ServerVersion, etc., etc. (audit all function names)

	return &info, err
}

func (c *dockerClient) ContainerNames() ([]string, error) {
Contributor

would prefer this to be ContainerList - move the additional smarts to the caller - there's only one.

}

type containerClient struct {
	name string
Contributor

suggest id (since there's an ID() function)

}

type execClient struct {
	name string
Contributor

suggest id (since there's an ID() function)

stdinDone := make(chan struct{})
go func() {
	if inputStream != nil {
		io.Copy(resp.Conn, inputStream)
Contributor

no error reporting if inputStream copy fails

	return c.name
}

func (c *execClient) StartAndWait(stdIn io.Reader, stdOut, stdErr io.Writer) error {
Contributor

misleading name


const (
	// defaultTimeout is the default timeout of short running docker operations.
	defaultTimeout = 2 * time.Minute
Contributor

defaultDockerOpTimeout? :-)


const (
	// defaultTimeout is the default timeout of short running docker operations.
	defaultTimeout = 2 * time.Minute
Contributor

I think we've found this to be problematic elsewhere (s2i) and ultimately removed it. At a minimum I'd consider bumping it to 10 minutes if we don't think we can live without it entirely. (It's problematic when the docker daemon is overloaded.)

}
}
return names, nil
return h.client.ContainerNames()
Contributor

unless you do what @jim-minter suggested in terms of moving the conversion to array logic here, i don't think there's even a reason for this function to exist anymore.

os::cmd::try_until_text "oc get pods -n logging -l logging-infra=deployer" "Completed" $(( 20*minute )) 2
os::cmd::try_until_text "oc get endpoints logging-kibana -o jsonpath='{ .subsets[*].ports[?(@.name==\"3000-tcp\")].port }' -n logging" "3000" $(( 10*minute )) 1
os::cmd::expect_success "oc login -u developer"
}
Contributor

can i haz service-catalog test plz?

Contributor Author

So I tried running on my Mac with --service-catalog and I get this:

oc cluster up --service-catalog
Starting OpenShift using openshift/origin:v3.6.0-alpha.2 ...
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.6.0-alpha.2 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ... 
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 127.0.0.1 as the server IP
-- Starting OpenShift container ... 
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
   OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... OK
-- Installing router ... OK
-- Importing image streams ... OK
-- Importing templates ... OK
-- Installing service catalog ... FAIL
   Error: cannot instantiate service catalog template
   Caused By:
     Error: cannot create objects from template openshift/service-catalog
     Caused By:
       Error: [policybindings.authorization.openshift.io "service-catalog:default" not found, policybindings.authorization.openshift.io "kube-system:default" not found]

The outcome is the same whether using this branch or master. I'll create an issue for it and when it's fixed, it should be very easy to add to the extended test.

Contributor

it needs to use a new origin image when running. the requirement to have a policybinding before creating a binding was recently lifted.

I would have expected this test to run using the local origin image anyway, though (--version=latest)

bparees self-assigned this Jun 19, 2017

@csrwng
Contributor Author

csrwng commented Jun 19, 2017

Thanks for the reviews, comments addressed

@bparees
Contributor

bparees commented Jun 20, 2017

lgtm but @jim-minter has stronger opinions so will let him sign off before merging.

openshift-bot added the needs-rebase label Jun 21, 2017

// holdHijackedConnection holds the HijackedResponse, redirects the inputStream to the connection, and redirects the response
// stream to stdout and stderr.
func holdHijackedConnection(inputStream io.Reader, outputStream, errorStream io.Writer, resp types.HijackedResponse) error {
	receiveStdout := make(chan error)
Contributor

must be buffered (make(chan error, 1)) otherwise the goroutine will never be able to exit if the channel reader has gone away

}()
}

sendStdin := make(chan error)
Contributor

also must be buffered

Contributor

(actually this is the key one)

Contributor Author

I'd rather not make the channel buffered if I don't have to. Looking at this function, I don't see how we wouldn't read the channel after starting the goroutine, and I want to wait for a result to be available.

Contributor

buffered != non-blocking read. As things stand, if the select reads a value from receiveStdout before reading a value from sendStdin, the function will return. However in this case the inputStream goroutine will block forever trying to write a value to sendStdin, because it is unbuffered and no-one will read from it once the function has returned.
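
The point can be reproduced in isolation. The sketch below is not the code from this PR: senderExits is a hypothetical helper that starts a goroutine which sends one result on a channel nobody reads, then checks whether that goroutine manages to finish. The 100 ms wait is an arbitrary bound chosen for the demonstration.

```go
package main

import (
	"fmt"
	"time"
)

// senderExits reports whether a goroutine that sends one result on a channel
// of the given capacity can finish when nobody ever receives from it.
func senderExits(capacity int) bool {
	result := make(chan error, capacity)
	exited := make(chan struct{})
	go func() {
		// With capacity 0 this send blocks forever because there is no
		// receiver; with capacity 1 the value is parked in the buffer.
		result <- nil
		close(exited)
	}()
	select {
	case <-exited:
		return true
	case <-time.After(100 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered sender exits:", senderExits(0))
	fmt.Println("buffered sender exits:", senderExits(1))
}
```

The unbuffered case leaves its goroutine blocked for the life of the process, which is the leak described above. A capacity of 1 does not change how the select in the caller behaves; it only lets the late sender finish.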

Contributor Author

ic, got it. Thx for the explanation

	return c.client.CopyToContainer(context.Background(), container, dest, src, options)
}

func (c *dockerClient) CopyFromContainer(container string, src string, dest io.Writer) error {
Contributor

jim-minter Jun 21, 2017

Suggest you return all the values from c.client.CopyFromContainer straight. Then you get rid of the io.Copy here, and the goroutine and pipe in newContainerDownloader.

}

type dockerClient struct {
	endpoint string
Contributor

currently appears to be unused - are you intending to use it in the future?

Contributor Author

Contributor

the link didn't work for me - I can see it's used in Endpoint(), but I can't see Endpoint() being used anywhere...

Contributor Author

line 202 of dockerhelper/helper.go

Contributor

Got it, thanks.


// The OSE version will have > 4 parts to the version string
// We'll only take the first 3
parts := strings.Split(versionStr, ".")
Contributor

parts := strings.SplitN(versionStr, ".", 3)

Contributor Author

That's not what I want. SplitN will give me a last segment that still includes the dot and everything after it. Here I want to remove the 4th segment.

Contributor

sorry :(

@jim-minter
Contributor

minor comments, otherwise go for it.

@csrwng
Contributor Author

csrwng commented Jun 21, 2017

@jim-minter comments addressed

@jim-minter
Contributor

@csrwng looks good to me, and also, I don't think I was sufficiently careful with the review and probably caused you unnecessary work with it. Apologies, I will take more care next time.

@csrwng
Contributor Author

csrwng commented Jun 21, 2017

thx for the review, no worries
[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to 488c111

@openshift-bot
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Contributor

Evaluated for origin test up to 488c111

@bparees
Contributor

bparees commented Jun 21, 2017

[merge][severity:blocker] (this fixes a p1)

@openshift-bot
Contributor

Evaluated for origin testextended up to 488c111

openshift-bot removed the needs-rebase label Jun 21, 2017

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended/678/) (Base Commit: 8c00852) (PR Branch Commit: 488c111) (Extended Tests: clusterup)

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2475/) (Base Commit: 8c00852) (PR Branch Commit: 488c111)

@@ -675,16 +671,17 @@ func (c *CommonStartConfig) CheckNsenterMounter(out io.Writer) error {
// CheckDockerVersion checks that the appropriate Docker version is installed based on whether we are using the nsenter mounter
// or shared volumes for OpenShift
func (c *CommonStartConfig) CheckDockerVersion(out io.Writer) error {
-	ver, _, err := c.DockerHelper().Version()
+	ver, isRHDocker, err := c.DockerHelper().APIVersion()
Contributor

You have no idea how much this disappoints me

@openshift-bot
Contributor

openshift-bot commented Jun 22, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/1085/) (Base Commit: 7657e99) (PR Branch Commit: 488c111) (Extended Tests: blocker) (Image: devenv-rhel7_6392)

6 participants