akash-provider doesn't pass the GPU interface to the bid price script #216

Closed
andy108369 opened this issue Apr 23, 2024 · 2 comments · Fixed by akash-network/provider#234
Labels
awaiting-triage repo/provider Akash provider-services repo issues

Comments

Contributor

andy108369 commented Apr 23, 2024

Provider version: 0.5.13

  • provider node labels (interface & ram are present):
$ kubectl get nodes --show-labels
NAME                    STATUS   ROLES           AGE    VERSION   LABELS
control-01.hurricane2   Ready    control-plane   213d   v1.28.6   akash.network/capabilities.storage.class.beta3=1,akash.network=true,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-01.hurricane2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
worker-01.hurricane2    Ready    <none>          213d   v1.28.6   akash.network/capabilities.gpu.vendor.nvidia.model.t4.interface.pcie=1,akash.network/capabilities.gpu.vendor.nvidia.model.t4.ram.16Gi=1,akash.network/capabilities.gpu.vendor.nvidia.model.t4=1,akash.network/capabilities.storage.class.beta3=1,akash.network=true,allow-nvdp=true,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker-01.hurricane2,kubernetes.io/os=linux,nvidia.com/gpu.present=true
  • SDL
    [screenshot not preserved]

  • interface is present (in the order request)

I[2024-04-23|13:52:59.888] order detected                               module=bidengine-service cmp=provider order=order/akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003145/1/1
I[2024-04-23|13:52:59.889] group fetched                                module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003145/1/1
I[2024-04-23|13:52:59.889] requesting reservation                       module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003145/1/1
D[2024-04-23|13:52:59.890] reservation requested. order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003145/1/1, resources=[{"resource":{"id":1,"cpu":{"units":{"val":"1000"}},"memory":{"size":{"val":"4294967296"}},"storage":[{"name":"default","size":{"val":"10737418240"}}],"gpu":{"units":{"val":"1"},"attributes":[{"key":"vendor/nvidia/model/t4/interface/pcie","value":"true"}]},"endpoints":[{"kind":1,"sequence_number":0},{"sequence_number":0}]},"count":1,"price":{"denom":"uakt","amount":"1000000.000000000000000000"}}] module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2024-04-23|13:52:59.890] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=1
I[2024-04-23|13:52:59.890] Reservation fulfilled                        module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003145/1/1
  • bid info that the script receives (note that the interface is missing)
    [screenshot not preserved]
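
To confirm from the provider log alone that the interface attribute reached the order request, the `resources=` payload of the "reservation requested" line can be decoded and its GPU attribute keys listed. The following is a minimal sketch, not part of the provider: the helper name `gpu_attribute_keys` is hypothetical, and the sample line is a shortened copy of the log entry above (order id replaced by a placeholder; cpu, memory, storage, and endpoints omitted).

```python
import json


def gpu_attribute_keys(log_line: str) -> list[str]:
    """Return the GPU attribute keys found in a 'reservation requested' log line."""
    marker = "resources="
    start = log_line.index(marker) + len(marker)
    # raw_decode parses exactly one JSON value starting at `start` and
    # ignores the trailing key=value log fields that follow it.
    resources, _ = json.JSONDecoder().raw_decode(log_line, start)
    keys = []
    for entry in resources:
        gpu = entry["resource"].get("gpu", {})
        keys.extend(attr["key"] for attr in gpu.get("attributes", []))
    return keys


# Shortened sample of the log line above (assumption: same field layout).
line = (
    'D[2024-04-23|13:52:59.890] reservation requested. order=<order-id>, '
    'resources=[{"resource":{"id":1,"gpu":{"units":{"val":"1"},"attributes":'
    '[{"key":"vendor/nvidia/model/t4/interface/pcie","value":"true"}]}},"count":1}] '
    'module=provider-cluster cmp=provider'
)
print(gpu_attribute_keys(line))  # ['vendor/nvidia/model/t4/interface/pcie']
```

Running the same helper on the ram order further down should list `vendor/nvidia/model/t4/ram/16Gi` instead, which helps separate "attribute missing from the order" from "attribute dropped before the bid price script".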

Additional info

The script does see the ram attribute, though, as expected:

  • SDL:
    [screenshot not preserved]

[screenshot not preserved]

  • ram is present (in the order request)
I[2024-04-23|13:51:53.406] order detected                               module=bidengine-service cmp=provider order=order/akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003134/1/1
I[2024-04-23|13:51:53.407] group fetched                                module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003134/1/1
I[2024-04-23|13:51:53.407] requesting reservation                       module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003134/1/1
D[2024-04-23|13:51:53.407] reservation requested. order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003134/1/1, resources=[{"resource":{"id":1,"cpu":{"units":{"val":"1000"}},"memory":{"size":{"val":"4294967296"}},"storage":[{"name":"default","size":{"val":"10737418240"}}],"gpu":{"units":{"val":"1"},"attributes":[{"key":"vendor/nvidia/model/t4/ram/16Gi","value":"true"}]},"endpoints":[{"kind":1,"sequence_number":0},{"sequence_number":0}]},"count":1,"price":{"denom":"uakt","amount":"1000000.000000000000000000"}}] module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2024-04-23|13:51:53.407] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=1
I[2024-04-23|13:51:53.407] Reservation fulfilled                        module=bidengine-order cmp=provider order=akash1z6ql9vzhsumpvumj4zs8juv7l5u2zyr5yax2ys/16003134/1/1
andy108369 added the repo/provider and awaiting-triage labels Apr 23, 2024
@andy108369
Contributor Author

andy108369 added a commit to andy108369/provider that referenced this issue Apr 24, 2024
- Update parseGPU function to handle 'interface' attribute for GPUs
- Refactor attribute handling to avoid overwriting existing attributes
  (the previous approach reassigned gpuVendorAttributes for each
  attribute it parsed, so only the last parsed attribute was stored
  when a vendor had multiple attributes, such as RAM and interface)

fixes akash-network/support#216
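
The overwrite problem described in the commit message above can be illustrated with a small sketch. This is not the provider's Go code: the function names are hypothetical, the nested-dict layout is an assumption made for illustration, and the two attribute keys are combined into one request here even though the logs in this issue show them in separate orders.

```python
def parse_gpu_overwriting(keys):
    """Buggy pattern: the per-model dict is replaced for every parsed attribute."""
    vendors = {}
    for key in keys:
        # Expected shape: vendor/<vendor>/model/<model>/<attribute>/<value>
        _, vendor, _, model, attr, value = key.split("/")
        # Reassigning here drops any attribute stored by an earlier iteration.
        vendors.setdefault(vendor, {})[model] = {attr: value}
    return vendors


def parse_gpu_merging(keys):
    """Fixed pattern: reuse the existing per-model dict and add to it."""
    vendors = {}
    for key in keys:
        _, vendor, _, model, attr, value = key.split("/")
        vendors.setdefault(vendor, {}).setdefault(model, {})[attr] = value
    return vendors


keys = [
    "vendor/nvidia/model/t4/ram/16Gi",
    "vendor/nvidia/model/t4/interface/pcie",
]
print(parse_gpu_overwriting(keys))  # {'nvidia': {'t4': {'interface': 'pcie'}}}
print(parse_gpu_merging(keys))      # {'nvidia': {'t4': {'ram': '16Gi', 'interface': 'pcie'}}}
```

With the overwriting variant only the last attribute survives, which matches the symptom the commit message describes for vendors with more than one attribute.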
andy108369 added a commit to andy108369/provider that referenced this issue Apr 24, 2024
- Update parseGPU function to handle 'interface' attribute for GPUs

fixes akash-network/support#216
@andy108369
Contributor Author

PR to fix this akash-network/provider#234

andy108369 added a commit to andy108369/provider that referenced this issue Apr 25, 2024
- Update parseGPU function to handle 'interface' attribute for GPUs

fixes akash-network/support#216
troian pushed a commit to akash-network/provider that referenced this issue Apr 25, 2024