Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of watch notice errors from k8s #393

Merged
merged 1 commit into from
Aug 26, 2020

Conversation

agrare
Copy link
Member

@agrare agrare commented Aug 25, 2020

When a watch fails the status and code are in the notice.object not the top level resource.

Thanks to @jrafanie for testing this:

irb(main):047:0> watches = co.send(:kube_connection).watch_pods(:namespace => "gt-lj-manageiq-slow-fyre", :resource_version => 199900).to_enum
=> #<Enumerator: #<Kubeclient::Common::WatchStream:0x0000560cd2222718 @uri=#<URI::HTTPS https://172.30.0.1/api/v1/watch/namespaces/gt-lj-manageiq-slow-fyre/pods?resourceVersion=199900>, @http_client=nil, @http_options={:basic_auth_user=>nil, :basic_auth_password=>nil, :headers=>{:Authorization=>"Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJndC1sai1tYW5hZ2VpcS1zbG93LWZ5cmUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoibWFuYWdlaXEtb3JjaGVzdHJhdG9yLXRva2VuLXhuNndoIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6Im1hbmFnZWlxLW9yY2hlc3RyYXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjI3ZjViMGE5LWNjNDQtMTFlYS05ODcwLWQwOTQ2NjBkMzFmYiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpndC1sai1tYW5hZ2VpcS1zbG93LWZ5cmU6bWFuYWdlaXEtb3JjaGVzdHJhdG9yIn0.Xoli-YGesjFfstWKQXyYSbbU3gOpto1u2ScWgIb-1WjLf-kAz9UWS5NkGJNlXeEJSuDuVBcw05HJbY1sy5Qp0d1eAfKQEtLJp_zrX8m8ijzRU5DKehSW-oV6J5J7RozYV3G_1-RNTnre30mSJRneLySo-20iFYc4SXjbts18X02lgzO_1QyPFRXbnOgdtLewDpmoRwmKQTawRHDeCep1AHiRQljjQdKQLOa0wiknRo5jcS5I-xUdC_PAvVUXM5D4LUR4NlApfNw7fq3UmXO18NM1NMdorW_kfAa63K_ADzE8pd32nf1WemqTxn6RHvcFf0D6ZJb8NlGFC26KHPYLCA"}, :http_proxy_uri=>nil, :http_max_redirects=>10, :ssl=>{:ca_file=>"/run/secrets/kubernetes.io/serviceaccount/ca.crt", :cert=>nil, :cert_store=>nil, :key=>nil, :verify_mode=>1}, :socket_class=>nil, :ssl_socket_class=>nil}, @formatter=#<Proc:0x0000560cd2222920@/opt/manageiq/manageiq-gemset/gems/kubeclient-4.8.0/lib/kubeclient/common.rb:318 (lambda)>>:each>
irb(main):048:0> watches.first
=> #<Kubeclient::Resource type="ERROR", object={:kind=>"Status", :apiVersion=>"v1", :metadata=>{}, :status=>"Failure", :message=>"too old resource version: 199900 (27177196)", :reason=>"Gone", :code=>410}>
irb(main):049:0> watches.first.kind
=> nil
irb(main):050:0> watches.first.type
=> "ERROR"
irb(main):051:0> watches.first.object.kind
=> "Status"
irb(main):052:0> watches.first.object.code
=> 410
irb(main):053:0> watches.first.code
=> nil

When a watch fails the status and code are in the notice.object not the
top level resource.
@miq-bot
Copy link
Member

miq-bot commented Aug 25, 2020

Checked commit agrare@0b9c82a with ruby 2.6.3, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
1 file checked, 0 offenses detected
Everything looks fine. 🏆

Copy link
Member

@jrafanie jrafanie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cben
Copy link
Contributor

cben commented Aug 26, 2020

Correct, consistent with what I saw in ManageIQ/kubeclient#452.
Is it right to break for all errors, not just 410?

@cben
Copy link
Contributor

cben commented Aug 26, 2020

Yeah I think that makes sense on any unexpected error.

What other kind of error are possible?
Do errors always contain .code, .reason, .message?

Went diving in k8s type=ERROR code...
Uncached:
https://github.com/kubernetes/kubernetes/blob/f6eeab819c3f9ebbba973eb1ccffc69401d8930e/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L351-L358
Cached: https://github.com/kubernetes/kubernetes/blob/1fdd8fb213e0361e8f18b1dd152dddb4c88ad183/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L1110-L1124
Most paths emit object that is metav1.Status with the familiar fields (example https://github.com/kubernetes/kubernetes/blob/788a073c79fea816282b87ab2946181821df4531/staging/src/k8s.io/apimachinery/pkg/api/errors/errors.go#L218-L238).
Cached has one mysterious case runtime.Object: path that I dont know what it can contain.
Anyway, recursive-open-struct returns nil for missing fields. ✔️

Not sure what other errors are possible other than 500.

LGTM 💯

@cben cben added the inventory label Aug 26, 2020
@cben cben merged commit 6ef17c3 into ManageIQ:master Aug 26, 2020
@agrare agrare deleted the fix_410_gone_restart_handling branch August 26, 2020 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants