Support for Hardware Accelerators #192

Closed
21 tasks
vishh opened this issue Feb 28, 2017 · 21 comments
Assignees
Labels
do-not-merge/docs lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status
Milestone

Comments

@vishh
Contributor

vishh commented Feb 28, 2017

Description

Kubernetes is becoming popular for managing workloads that consume accelerators, TensorFlow being one example. The agility that Kubernetes offers makes it easy to consume accelerators across a fleet of machines.
Kubernetes can provide an end-to-end workflow by separating the provisioning and configuration of accelerators from their consumption.

Progress Tracker

  • Alpha
    • Write and maintain draft quality doc
      • During development keep a doc up-to-date about the desired experience of the feature and how someone can try the feature in its current state. Think of it as the README of your new feature and a skeleton for the docs to be written before the Kubernetes release. Paste link to Google Doc: DOC-LINK
    • Design Approval
      • Design Proposal. This goes under design-proposals. Doing a proposal as a PR allows line-by-line commenting from community, and creates the basis for later design documentation. Paste link to merged design proposal here: PROPOSAL-NUMBER
      • Decide which repo this feature's code will be checked into. Not everything needs to land in the core kubernetes repo. REPO-NAME
      • Identify shepherd (your SIG lead and/or kubernetes-pm@googlegroups.com will be able to help you). My Shepherd is: replace.me@replaceme.com (and/or GH Handle)
        • A shepherd is an individual who will help acquaint you with the process of getting your feature into the repo, identify reviewers and provide feedback on the feature. They are not (necessarily) the code reviewer of the feature, or tech lead for the area.
        • The shepherd is not responsible for showing up to Kubernetes-PM meetings and/or communicating if the feature is on-track to make the release goals. That is still your responsibility.
      • Identify secondary/backup contact point. My Secondary Contact Point is: replace.me@replaceme.com (and/or GH Handle)
    • Write (code + tests + docs) then get them merged. ALL-PR-NUMBERS
      • Code needs to be disabled by default. Verified by code OWNERS
      • Minimal testing
      • Minimal docs
        • cc @kubernetes/docs on docs PR
        • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
        • New apis: Glossary Section Item in the docs repo: kubernetes/kubernetes.github.io
      • Update release notes
  • Beta
    • Testing is sufficient for beta
    • User docs with tutorials
      • Updated walkthrough / tutorial in the docs repo: kubernetes/kubernetes.github.io
      • cc @kubernetes/docs on docs PR
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Thorough API review
      • cc @kubernetes/api
  • Stable
    • docs/proposals/foo.md moved to docs/design/foo.md
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Soak, load testing
    • detailed user docs and examples
      • cc @kubernetes/docs
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT

cc @kubernetes/sig-node-feature-requests @kubernetes/sig-scheduling-feature-requests

@vishh
Contributor Author

vishh commented Feb 28, 2017

cc @aronchick for priority

@jeremyeder

jeremyeder commented Mar 1, 2017

s/accelerators/device assignment please? /cc @derekwaynecarr

@k82cn
Member

k82cn commented Mar 1, 2017

Regarding accelerators, does it mean some kind of device, e.g. GPU (but not limited to GPU)?

@cmluciano

/subscribe

@jeremyeder

@k82cn yes. Actually, per the SIG meeting yesterday, any PCI device (most tend to be accelerators, but I'd personally prefer more generic wording). Note that Intel has "accelerators" inside their CPUs (called CPU extensions). All of these things should become candidates for scheduler matchmaking.

@cmluciano

related kubernetes/community#414

@vishh
Contributor Author

vishh commented Mar 1, 2017

@jeremyeder

My understanding is that,

  1. There needs to be a way to discover, represent and consume Accelerators as a resource in Kubernetes
  2. As an optimization, node hardware topology needs to be taken into account while provisioning accelerators.
  • 1 does not depend on 2, and 2 can be solved independently of 1.
  • This feature is meant to focus on 1.
  • It can benefit from 2 if it is made available in parallel.
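
To make point 1 concrete, here is a minimal sketch of what "discover, represent and consume" could look like if an accelerator is modeled as a countable node resource. It is only an illustration under stated assumptions: the resource name `example.com/accelerator`, the `Node` class, and the `feasible_nodes` and `bind` helpers are all hypothetical and are not Kubernetes APIs.

```python
from dataclasses import dataclass, field

# Hypothetical resource name; the thread does not define one.
ACCEL = "example.com/accelerator"


@dataclass
class Node:
    name: str
    # Resource name -> total units discovered on the node.
    capacity: dict
    # Resource name -> units already handed out to workloads.
    allocated: dict = field(default_factory=dict)

    def free(self, resource: str) -> int:
        """Units of `resource` still available on this node."""
        return self.capacity.get(resource, 0) - self.allocated.get(resource, 0)


def feasible_nodes(nodes, requests):
    """Return names of nodes whose free capacity covers every request."""
    return [
        n.name
        for n in nodes
        if all(n.free(res) >= qty for res, qty in requests.items())
    ]


def bind(node, requests):
    """Consume the requested units on the chosen node."""
    for res, qty in requests.items():
        if node.free(res) < qty:
            raise ValueError(f"{node.name} lacks {qty} free unit(s) of {res}")
    for res, qty in requests.items():
        node.allocated[res] = node.allocated.get(res, 0) + qty


# Assumed inventory: only node-a has accelerators.
nodes = [
    Node("node-a", {"cpu": 8, ACCEL: 2}),
    Node("node-b", {"cpu": 8}),
]
```

With this inventory, a request for one accelerator is only feasible on `node-a`, and once both of its units are bound no node remains feasible. Topology (point 2) is deliberately absent from the sketch, which mirrors the claim that 1 can be solved without 2.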

@ravisantoshgudimetla
Contributor

Is the scope limited to accelerators, or does it also cover co-processors such as TPMs?

My understanding is that,

  1. There needs to be a way to discover, represent and consume Accelerators as a resource in Kubernetes

If hardware discovery is a functionality that we are targeting, shouldn't the scope be broadened to all types of devices (including accelerators)?

@vishh
Contributor Author

vishh commented Mar 1, 2017 via email

@philips
Contributor

philips commented Mar 2, 2017

Can we use the term "hardware accelerators"? I was really confused by this issue at first.

@vishh changed the title from "Support for Accelerators" to "Support for Hardware Accelerators" on Mar 2, 2017
@idvoretskyi added the sig/node label on Mar 2, 2017
@idvoretskyi added this to the next-milestone milestone on Mar 2, 2017
@liyubobj

liyubobj commented Mar 3, 2017

Good proposal! I think topology support for devices is a must. For example, NVIDIA GPUs on different PCI bridges cannot talk P2P.
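
As a rough illustration of why topology matters, the sketch below picks a group of devices that all sit behind the same PCI bridge. It takes the claim above at face value (devices on different bridges cannot communicate peer-to-peer). The device ids, bridge ids, and the `pick_p2p_group` helper are hypothetical and only serve the example.

```python
from collections import defaultdict


def pick_p2p_group(devices, count):
    """Return `count` device ids that share one PCI bridge, or None.

    `devices` maps a device id to the id of the PCI bridge it sits behind.
    Both kinds of id are made up for this example.
    """
    by_bridge = defaultdict(list)
    # Sort so the result does not depend on dict insertion order.
    for dev, bridge in sorted(devices.items()):
        by_bridge[bridge].append(dev)
    for bridge in sorted(by_bridge):
        if len(by_bridge[bridge]) >= count:
            return by_bridge[bridge][:count]
    return None


# Assumed layout: four GPUs split across two bridges.
devices = {
    "gpu0": "bridge0",
    "gpu1": "bridge1",
    "gpu2": "bridge1",
    "gpu3": "bridge0",
}
```

A request for two devices can be satisfied from a single bridge, while a request for three returns `None` because no bridge in the assumed layout hosts that many. A count-only allocator would happily hand out three devices that span bridges.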

@idvoretskyi
Member

ping @calebamiles to review

@calebamiles modified the milestones: 1.8, next-milestone on Jul 31, 2017
@calebamiles added the stage/alpha label on Aug 3, 2017
@vishh
Contributor Author

vishh commented Sep 12, 2017

One of the critical pieces of this problem, hardware device plugins (#368), landed in v1.8.
This feature is broad and requires more work around identifying and defining the matrix of devices, device plugins and workload compatibility. This aspect is expected to be handled outside of core Kubernetes, but the specifics are not yet defined. For that reason, I'm leaving this issue open and moving it to v1.9.

@idvoretskyi
Member

idvoretskyi commented Nov 13, 2017

@vishh is it still alpha for 1.9?

Also, can you update the feature template to follow the new format? https://github.com/kubernetes/features/blob/master/ISSUE_TEMPLATE.md

@rohitagarwal003
Member

It is still alpha for 1.9.

@zacharysarah
Contributor

@vishh 👋 Please indicate in the 1.9 feature tracking board whether this feature needs documentation. If yes, please open a PR and add a link to the tracking spreadsheet. Thanks in advance!

@zacharysarah
Contributor

@vishh Bump for docs ☝️

/cc @idvoretskyi

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Dec 20, 2017
Automatic merge from submit-queue (batch tested with PRs 56681, 57384). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Deprecate the alpha Accelerators feature gate.

Encourage people to use DevicePlugins instead.

/kind cleanup

Related to kubernetes/enhancements#192 and kubernetes/enhancements#368

**Release note**:
```release-note
The alpha Accelerators feature gate is deprecated and will be removed in v1.11. Please use device plugins instead. They can be enabled using the DevicePlugins feature gate.
```

/sig node
/sig scheduling
/area hw-accelerators
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Feb 27, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 29, 2018
@justaugustus
Member

@vishh
Any plans for this in 1.11?

If so, can you please ensure the feature is up-to-date with the appropriate:

  • Description
  • Milestone
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

cc @idvoretskyi

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

justaugustus pushed a commit to justaugustus/enhancements that referenced this issue Sep 3, 2018
ingvagabund pushed a commit to ingvagabund/enhancements that referenced this issue Apr 2, 2020
Update Kuryr information on SSC doc