-
Notifications
You must be signed in to change notification settings - Fork 208
fpga: make admission webhook mode-less #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a904596
to
ec784d2
Compare
@kad ping. Could you please review the updates to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits, otherwise LGTM
@@ -20,3 +20,9 @@ spec: | |||
afuId: | |||
type: string | |||
pattern: '^[0-9a-f]{8,128}$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this might be overkill. we don't expect such long UUIDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limited to 32.
cmd/fpga_admissionwebhook/README.md
Outdated
is the first 31 characters of the region interface ID for Arria10 with DCP1.1 | ||
firmware. The next 32 characters (`d8424dc4a4a3c413f89e433683f9040b`) is an accelerator function ID. | ||
|
||
The same mapping, but with its mode field set to `orchestrated`, translates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "orchestrated" might be not the best choice of the terminology here for cluster admin UX, but I don't have right now better word. Also, it might be good idea to accept "pre-programmed" as alias to "programmed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm.. dunno. "programmed" can be read as "just in time programmed" aka orchestrated.
Maybe "static" and "dynamic"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it might be good idea to accept "pre-programmed" as alias to "programmed"
(This meant to say accept both pre-programmed and preprogrammed?)
Maybe "static" and "dynamic"
this would work, IMO, but it might be good to start using "programming mode" in documentation where we just say only "mode" today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked the docs, there's no place where just "mode" is mentioned. But in fpga_plugin's README.md af
and preprogrammed
are used interchangeably. The same with region
and orchestrated
. And "orchestration programmed" is an alias for orchestrated
.
Probably we could rename pregrogrammed
and af
to static
; orchestrated
and region
- to dynamic
; regiondevel
- to just devel
. Also I'd avoid accepting aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep using 'region' and 'af' modes in order not to break compatibility with the currently used plugin modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's correct now. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rojkov As for mode name I think we can continue using 'af' and 'region' as we got rid of webhook mode names 'preprogrammed' and 'orchestrated'. Although webhook names look better in my opinion, using plugin names would help us to avoid possible user confusions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'll stick to the plugin mode names everywhere. Will update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, +1
@@ -138,13 +120,31 @@ $ ./scripts/webhook-deploy.sh --ca-bundle-path /var/run/kubernetes/server-ca.crt | |||
|
|||
# Mappings | |||
|
|||
Requested FPGA resources are translated to AF resources. For example, | |||
`fpga.intel.com/arria10.dcp1.1-nlb0` is translated to `fpga.intel.com/af-d8424dc4a4a3c413f89e433683f9040b`. | |||
For the following mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to say a sentence or two, kind of "introduction", that mappings are essential part of the setup and give cluster administrator flexible instrument for managing FPGA bitstreams and access control for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! added
e387f4d
to
6b29065
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs rebase
cmd/fpga_admissionwebhook/README.md
Outdated
is the first 31 characters of the region interface ID for Arria10 with DCP1.1 | ||
firmware. The next 32 characters (`d8424dc4a4a3c413f89e433683f9040b`) is an accelerator function ID. | ||
|
||
The same mapping, but with its mode field set to `orchestrated`, translates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it might be good idea to accept "pre-programmed" as alias to "programmed"
(This meant to say accept both pre-programmed and preprogrammed?)
Maybe "static" and "dynamic"
this would work, IMO, but it might be good to start using "programming mode" in documentation where we just say only "mode" today.
6b29065
to
10054b0
Compare
Thanks! Rebased. |
325dea1
to
dbf5a35
Compare
Updated to use consistent mode names across the FPGA webhook and plugin. |
I'm going to give other reviewers one day for their comments/approvals. Then I'll merge this. |
this looks good to me but should we squash commits? |
I agree with @mythi, let's squash and lgtm. |
dbf5a35
to
6c2eacf
Compare
fpga: make AFU resource name 63 char long webhook: drop mode from README webhook: extend mappings description webhook: tighten CRD definitions webhook: drop mapping to non-existing afuId explicitly state mappings names can be in any format use consistent terminology across fpga webhook and plugin
@rojkov Plugin is failing in af mode due to incorrect socket file name:
However, it seems to work in region mode. |
Thank you! I'll take a look.
…On Thu, Apr 23, 2020 at 12:21 PM Ed Bartosh ***@***.***> wrote:
@rojkov <https://github.com/rojkov> Plugin is failing in af mode due to
incorrect socket file name:
> kubectl get pods
NAME READY STATUS RESTARTS AGE
intel-fpga-plugin-rk69j 0/1 CrashLoopBackOff 4 2m23s
> kubectl logs intel-fpga-plugin-rk69j
E0423 09:13:28.819373 1 manager.go:125] Failed to serve fpga.intel.com/69528db6eb31577a8c3668f9faa081ff7df405cbd7acf7222f144b0b93acd18: listen unix /var/lib/kubelet/device-plugins/fpga.intel.com-69528db6eb31577a8c3668f9faa081ff7df405cbd7acf7222f144b0b93acd18.sock: bind: invalid argument
Failed to listen to plugin socketgithub.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*server).setupAndServe
/intel-device-plugins-for-kubernetes/pkg/deviceplugin/server.go:211github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*server).Serve
/intel-device-plugins-for-kubernetes/pkg/deviceplugin/server.go:163github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).handleUpdate.func1
/intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:123
runtime.goexit
/usr/lib/golang/src/runtime/asm_amd64.s:1357
However, it seems to work in region mode.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#358 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAIEYOSQTNR47QPTAHPTVTROACANANCNFSM4MCGZACQ>
.
|
Too bad unix socket addresses can be 108 chars long in Linux. Should I strip the interface ID part even further from 31 to 16? Are we ok with that? |
@rojkov Do you think 16 chars would make unique enough id? Any other ways to solve this? |
The limit is not 115, but 108 (including a trailing zero I presume), so the practical limit, as I tested, is 107. I've just updated the PR. Could you please check it once more? |
pkg/fpga/devtypes.go
Outdated
return "", errors.Wrapf(err, "failed to decode %q and %q", interfaceID, afuID) | ||
} | ||
|
||
return fmt.Sprintf("%s.%s.%s", interfaceID[:3], afuID[:3], base64.RawURLEncoding.EncodeToString(bin)), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have 4 characters long prefixes and '-' delimiters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike dots '-' can be in the encoded section, so dots do a better job at separating the sections visually imho.
But ok, will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Feel free to use dots then. Just increase prefixes to 4 chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Done.
We can still use 4 chars long prefixes, right?
|
76c9c0e
to
cf2acb8
Compare
@rojkov Could it be that mappings are setup incorrectly? When I run plugin in af mode the node has these af resources:
However, the pod resource requests are mutated to region- resources:
Side note - we may want to add 'af-' prefix to af resource names to make the resource naming consistent. We can do that by reducing af id and interface id prefixes to 3 chars each. |
The I can add one more mapping Regarding the |
Yes, please do. I'd also suggest renaming arria10.dcp.1.2-nlb0 to arria10.dcp.1.2-nlb0-orchestrated to be more explicit. |
cf2acb8
to
bafc207
Compare
Ok, added. Also I've added |
@rojkov It's still failing. Looks like 4 symbols for afu and interface ids is too much:
|
@rojkov I'll try to reduce length of ids to 3. Hopefully this should be enough. |
@rojkov worked for me with this change:
Please, update the PR. P.S. I'll send e2e test update as a PR to your branch rojkov:webhook-modeless |
done: rojkov#5 |
2bc66d6
to
c15734f
Compare
Oops, my bad. Reduced to 3 symbols now and cherry-picked your commit. Could you please try again? |
type AcceleratorFunctionSpec struct { | ||
AfuID string `json:"afuId"` | ||
AfuID string `json:"afuId"` | ||
InterfaceID string `json:"interfaceId"` | ||
Mode string `json:"mode"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about the need for v2 API version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose in this case we'd need to maintain two versions of the webhook: "mode-ful" and "mode-less". Making the webhook modeless is already a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the webhook modeless is already a breaking change.
I was thinking v2 because of the breakage but did not think of the need to have both versions maintained (beyond what we have in the release branches).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. That makes sense if there are installed old mappings in a cluster. Will update the version on Monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the fpga.intel.com API group to v2. It increased the size of the patch, but all tests pass.
cmd/fpga_admissionwebhook/README.md
Outdated
The same mapping, but with its mode field set to `region`, translates | ||
`fpga.intel.com/arria10.dcp1.1-nlb0` to `fpga.intel.com/region-9926ab6d6c925a68aabca7d84c545738`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes the reader confused when we use the same resource name as the example for both modes. Perhaps add -orchestrated
suffix here as it's in the default CRDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I forgot to reflect the compressed resource name in README.
Fixed now and used a real mapping from mappings-collection.yaml
for the example.
c15734f
to
8d6afae
Compare
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
closes #301