
CUDA extremely slow on example where CPU is fast #18

Open
juliusbierk opened this issue Feb 24, 2020 · 4 comments

Comments

@juliusbierk

First of all: cool library. I am trying to familiarize myself with it.

I tried to make just a simple example. This code creates an image with a black-to-white gradient and uses a loss function to darken it.
It runs fast on the CPU, but cannot even render the first frame on the GPU (an RTX 2080 Ti). It keeps the GPU at 100% utilization, but nothing happens. I can run other examples just fine on the GPU.

Is there some glaring misunderstanding on my part?

import taichi as ti

# ti.init(arch=ti.x86_64, debug=False)  # works
ti.init(arch=ti.cuda, debug=False)  # extremely slow

n = 320
pixels = ti.var(dt=ti.f32, shape=(n * 2, n), needs_grad=True)
loss = ti.var(dt=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def paint(t: ti.f32):
    for i, j in pixels:
        loss[None] += ti.sqr(pixels[i, j])

@ti.kernel
def init():
    for i, j in pixels:
        pixels[i, j] = i/500. + j/500.

@ti.kernel
def apply_grad():
    for i, j in pixels:
        pixels[i, j] -= learning_rate * pixels.grad[i, j]

gui = ti.GUI("Tester", (n * 2, n))
init()

learning_rate = 0.01

for i in range(1000000):
    print(i)
    with ti.Tape(loss):
        paint(i * 0.1)
    apply_grad()
    print(pixels.grad[5, 5])

    gui.set_image(pixels)
    gui.show()
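For what it's worth, the math here is simple enough to check by hand: the loss is the sum of squared pixel values, so the gradient that the tape should produce is just 2 * pixels[i, j], and apply_grad is plain gradient descent. A pure-Python sketch of the same update on a tiny stand-in grid (no Taichi needed; the small n is just for illustration):

```python
# Pure-Python sketch of the math the Taichi tape computes (no Taichi needed).
# loss = sum of pixels[i, j]**2, so d(loss)/d(pixels[i, j]) = 2 * pixels[i, j],
# and each apply_grad step scales every pixel by (1 - 2 * learning_rate).

n = 4  # tiny grid for illustration; the real example uses n = 320
learning_rate = 0.01

pixels = [[i / 500.0 + j / 500.0 for j in range(n)] for i in range(2 * n)]

for step in range(3):
    grad = [[2.0 * p for p in row] for row in pixels]        # d(loss)/d(pixel)
    pixels = [[p - learning_rate * g for p, g in zip(row, grow)]
              for row, grow in zip(pixels, grad)]            # gradient descent

# After 3 steps every pixel has been scaled by 0.98**3, i.e. darkened uniformly.
```

So the expected behavior is a uniform darkening of the gradient image; the question is only why the backward pass hangs on CUDA.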

Thank you in advance for your help.

@juliusbierk
Author

Btw., CUDA seems to get stuck in the __exit__ part of with ti.Tape, i.e. when calculating the gradients of paint().

@juliusbierk
Author

Aha, found the problem.

Apparently gradients do not support the "smart indexing" (the struct-for over pixels) used in the for loops.
Replacing paint with

@ti.kernel
def paint(t: ti.f32):
    for i in range(n * 2):
        for j in range(n):
            loss[None] += pixels[i, j] * pixels[i, j]

allows it to run on the gpu.
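(The rewrite only changes the iteration scheme, not the math. A quick pure-Python sanity check, on a tiny stand-in grid, that iterating over all (i, j) pairs and nesting two range loops accumulate the same loss:)

```python
# Plain-Python sanity check (no Taichi): iterating over all (i, j) pairs,
# as the struct-for does, and nesting two range loops give the same loss.
n = 4
pixels = {(i, j): i / 500.0 + j / 500.0 for i in range(n * 2) for j in range(n)}

loss_pairs = sum(p * p for p in pixels.values())              # "for i, j in pixels"
loss_ranges = sum(pixels[i, j] * pixels[i, j]
                  for i in range(n * 2) for j in range(n))    # nested range loops

assert abs(loss_pairs - loss_ranges) < 1e-12
```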

@robertour

This is strange; the example in the documentation suggests that smart indexing is the way to go:
https://taichi.readthedocs.io/en/stable/hello.html

Also, according to the documentation, the second version of paint should be slower, because only the outermost loop (in your case, for i in range(n * 2)) would be parallelized: https://taichi.readthedocs.io/en/stable/hello.html#parallel-for-loops

A small observation: you are using ti.var, while the example uses ti.field. I cannot find anything about ti.var in the documentation. What is ti.var?

I am trying to get my head around the examples, so I cannot help much more, but I hope this points you in the right direction.

@juliusbierk
Author

Hi @robertour, thanks for your reply. I opened this issue back in February... I'm sure many things have changed since then (e.g. ti.var no longer being used). Perhaps it also just works now.
