add an einsum_distloss implementation #2

Open · Spark001 wants to merge 3 commits into main
Conversation

Spark001

No description provided.

@Spark001 (Author)

Update: this provides a pure-Python (PyTorch) implementation of the distortion loss; no CUDA kernels are needed.

import torch

def einsum_distloss(w, m, interval):
    '''
    Einsum realization of distortion loss.
    There are B rays, each with N sampled points.
    w:        Float tensor in shape [B,N]. Volume rendering weight of each point.
    m:        Float tensor in shape [N].   Midpoint distance to camera of each point.
    interval: Scalar or float tensor in shape [B,N]. The query interval of each point.
    Note:
    The first term of the distortion loss can be expressed as `(w @ mm @ w.T).diagonal()`,
    which can be accelerated further with `torch.einsum('bq, qp, bp->b', w, mm, w)`.
    '''
    mm = (m.unsqueeze(-1) - m.unsqueeze(-2)).abs()  # [N,N] pairwise midpoint distances
    loss = torch.einsum('bq, qp, bp->b', w, mm, w)  # cross-sample term, diagonal only
    loss += (w*w*interval).sum(-1)/3.               # per-sample (intra-interval) term
    return loss.mean()
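
As a quick sanity check, the einsum form can be compared against the naive `(w @ mm @ w.T).diagonal()` expression mentioned in the docstring. A minimal sketch (shapes and values are illustrative, not from the benchmark below):

import torch

B, N = 4, 64
w = torch.softmax(torch.randn(B, N), dim=-1)           # toy volume rendering weights
m = (torch.arange(N, dtype=torch.float32) + 0.5) / N   # toy midpoints, shared by all rays
mm = (m.unsqueeze(-1) - m.unsqueeze(-2)).abs()         # [N,N]

naive = (w @ mm @ w.T).diagonal()                      # materializes a full [B,B] matrix
fast  = torch.einsum('bq, qp, bp->b', w, mm, w)        # computes only the diagonal
assert torch.allclose(naive, fast, atol=1e-6)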

@Spark001 (Author)

  • Peak GPU memory (MB)

    # of pts N             32    64   128   256   384   512  1024
    original_distloss     102   396  1560  6192   OOM   OOM   OOM
    eff_distloss_native    12    24    48    96   144   192   384
    eff_distloss           14    28    56   112   168   224   448
    flatten_eff_distloss   13    26    52   104   156   208   416
    einsum_distloss         9    18    36    72   109   145   292

  • Run time accumulated over 100 runs (sec)

    # of pts N             32    64   128   256   384   512  1024
    original_distloss     0.4   0.6   3.3  14.9   OOM   OOM   OOM
    eff_distloss_native   0.2   0.2   0.2   0.4   0.4   0.5   0.8
    eff_distloss          0.2   0.2   0.2   0.3   0.5   0.6   0.9
    flatten_eff_distloss  0.2   0.2   0.2   0.3   0.5   0.5   0.8
    einsum_distloss       0.1   0.1   0.1   0.2   0.3   0.4   0.7
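
For reference, numbers of this kind can be collected with a small harness around `torch.cuda.reset_peak_memory_stats` / `torch.cuda.max_memory_allocated` and synchronized timing. A hedged sketch (assumes a CUDA device; B, N, and the weight construction are illustrative, not the exact benchmark setup used for the tables above):

import time
import torch

def benchmark(fn, B=8192, N=128, iters=100, device='cuda'):
    # Toy inputs: B rays, N samples per ray, shared uniform intervals.
    w = torch.softmax(torch.randn(B, N, device=device), dim=-1).requires_grad_()
    m = (torch.arange(N, dtype=torch.float32, device=device) + 0.5) / N
    interval = 1.0 / N
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    t0 = time.time()
    for _ in range(iters):
        fn(w, m, interval).backward()   # include backward, as in training
        w.grad = None
    torch.cuda.synchronize(device)
    peak_mb = torch.cuda.max_memory_allocated(device) / 2**20
    print(f'{iters} runs: {time.time() - t0:.2f} s, peak {peak_mb:.0f} MB')

benchmark(einsum_distloss)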

@bchretien commented Jul 28, 2022

@Spark001: in your example, you are assuming that m is of size [N] and not [B, N], so identical for each ray. The allocation of mm is still O(N^2), and if you use [B, N] then you're back to the OOM of the original implementation, albeit with more samples. A workaround might be to have a custom CUDA kernel so that mm is never explicitly allocated but evaluated by each CUDA thread.
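
Short of a custom CUDA kernel, one possible middle ground for per-ray midpoints (m of shape [B,N]) is to chunk over rays and recompute each chunk's pairwise matrix during backward via gradient checkpointing, so only one [chunk,N,N] block is live at a time. A sketch, not part of this PR (function and parameter names are hypothetical; `use_reentrant=False` requires a reasonably recent PyTorch):

import torch
from torch.utils.checkpoint import checkpoint

def _pairwise_term(wi, mi):
    # [chunk,N,N] pairwise |m_q - m_p|; recomputed in backward under checkpointing,
    # so it is not kept alive for the whole batch.
    mm = (mi.unsqueeze(-1) - mi.unsqueeze(-2)).abs()
    return torch.einsum('bq, bqp, bp->b', wi, mm, wi)

def einsum_distloss_perray(w, m, interval, chunk=1024):
    # Hypothetical variant for per-ray midpoints m in shape [B,N].
    loss = torch.cat([
        checkpoint(_pairwise_term, w[i:i+chunk], m[i:i+chunk], use_reentrant=False)
        for i in range(0, w.shape[0], chunk)
    ])
    loss = loss + (w * w * interval).sum(-1) / 3.
    return loss.mean()

This trades extra forward compute for memory; a dedicated kernel, as suggested above, would avoid even the per-chunk allocation.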

@Spark001 (Author) commented Aug 1, 2022

@bchretien Yes, you are right.
In my implementation I assume the sampling is identical for all rays, since uniform sampling is generally used in explicit voxel-based NeRF methods, e.g., DVGO and Plenoxels.
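
Under that assumption, a hedged usage sketch (near/far bounds and sizes are illustrative) builds m and interval once from the shared uniform grid:

import torch

near, far, B, N = 2.0, 6.0, 4096, 128       # illustrative bounds and sizes
edges = torch.linspace(near, far, N + 1)    # shared bin edges for every ray
m = 0.5 * (edges[1:] + edges[:-1])          # [N] midpoints, identical across rays
interval = (far - near) / N                 # scalar uniform bin width

w = torch.softmax(torch.randn(B, N), dim=-1)  # stand-in rendering weights
loss = einsum_distloss(w, m, interval)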
