Hello,
I'm running into a problem during inference: using a larger number of input views leads to an unexpected error. With 1, 3, or 8 views (for both the provided assets_demo_cli and my own dataset in the same format), everything works as expected, but with 16 or 32 views I get the following error:
INFO:root:Loaded ViT-H-14 model config.
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/workspace/stable-virtual-camera/demo.py", line 407, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/stable-virtual-camera/demo.py", line 375, in main
for _ in video_path_generator:
File "/workspace/stable-virtual-camera/seva/eval.py", line 1481, in run_one_scene
) = chunk_input_and_test(
File "/workspace/stable-virtual-camera/seva/eval.py", line 693, in chunk_input_and_test
if len(chunk) == T - len(prefix_inds) or not candidate_input_inds:
TypeError: unsupported operand type(s) for -: 'list' and 'int'
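This seems to occur only when the number of views requires a second pass based on the context length T (as in the paper). Judging by the error message, T is a list rather than an int when `T - len(prefix_inds)` is evaluated.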
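To illustrate what the traceback seems to be reporting, here is a minimal standalone sketch of the failure mode. It does not use the actual `seva` code: `chunk_is_full` is a hypothetical helper that only mirrors the comparison on the failing line, and the values passed for `T` are made up for illustration.

```python
def chunk_is_full(chunk, T, prefix_inds):
    # Hypothetical stand-in for the comparison on the failing line:
    #   len(chunk) == T - len(prefix_inds)
    return len(chunk) == T - len(prefix_inds)


def try_compare(T):
    # Return the TypeError message if the comparison fails, else None.
    try:
        chunk_is_full([0, 1, 2], T, [0])
    except TypeError as exc:
        return str(exc)
    return None


# An int T works: 4 - 1 == 3 == len(chunk), no exception is raised.
print(try_compare(4))       # None

# A list T (made-up values) fails the same way as in the traceback.
print(try_compare([4, 8]))  # unsupported operand type(s) for -: 'list' and 'int'
```

If this is indeed the cause, the code path used for the second pass would need to reduce T to a single int before the subtraction. Which value is the right one is something I can't tell from the traceback alone.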
To reproduce the issue, download assets_demo_cli and run the following command (with --data_path set to your own path):
python demo.py --data_path assets_demo_cli --data_items dl3d140-165f5af8bfe32f70595a1c9393a6e442acf7af019998275144f605b89a306557 --num_inputs 16 --video_save_fps 10