[SHOW Visualization] Which part of code to refer #10
I tried to use the TalkSHOW code to visualize the data, but I get a bad result. Do you know the reason? My code is as follows (adapted from the TalkSHOW visualisation code):

```python
# Assumed imports: render_mesh_helper comes from TalkSHOW's visualisation
# utilities, and zelin_log (used in main() below) is just my own logger.
import os

import cv2
import numpy as np
import pyrender
import smplx
import torch
# from visualise.rendering import render_mesh_helper  # TalkSHOW's renderer (adjust the path if needed)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Fixed lower-body pose (33 values) that is not predicted and gets spliced back in.
lower_pose = torch.tensor(
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0747, -0.0158, -0.0152, -1.1826512813568115, 0.23866955935955048,
     0.15146760642528534, -1.2604516744613647, -0.3160211145877838,
     -0.1603458970785141, 1.1654603481292725, 0.0, 0.0, 1.2521806955337524, 0.041598282754421234, -0.06312154978513718,
     0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
# Not actually used in my part2full below.
lower_pose_stand = torch.tensor([
    8.9759e-04, 7.1074e-04, -5.9163e-06, 8.9759e-04, 7.1074e-04, -5.9163e-06,
    3.0747, -0.0158, -0.0152,
    -3.6665e-01, -8.8455e-03, 1.6113e-01, -3.6665e-01, -8.8455e-03, 1.6113e-01,
    -3.9716e-01, -4.0229e-02, -1.2637e-01,
    7.9163e-01, 6.8519e-02, -1.5091e-01, 7.9163e-01, 6.8519e-02, -1.5091e-01,
    7.8632e-01, -4.3810e-02, 1.4375e-02,
    -1.0675e-01, 1.2635e-01, 1.6711e-02, -1.0675e-01, 1.2635e-01, 1.6711e-02, ])


def part2full(input, stand=False):
    """Splice the fixed lower-body values back into the predicted upper-body parameters."""
    if stand:
        lp = torch.zeros_like(lower_pose)
        lp[6:9] = torch.tensor([3.0747, -0.0158, -0.0152])
        lp = lp.unsqueeze(dim=0).repeat(input.shape[0], 1).to(input.device)
    else:
        lp = lower_pose.unsqueeze(dim=0).repeat(input.shape[0], 1).to(input.device)
    input = torch.cat([input[:, :3],
                       lp[:, :15],
                       input[:, 3:6],
                       lp[:, 15:21],
                       input[:, 6:9],
                       lp[:, 21:27],
                       input[:, 9:12],
                       lp[:, 27:],
                       input[:, 12:]],
                      dim=1)
    return input
```
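As a quick sanity check on part2full (purely my own bookkeeping: the 232-channel input size is my assumption about the DiffSHEG output, derived from the slicing further down, not something I found documented):

```python
# part2full only inserts the 33 fixed lower-body values (15 + 6 + 6 + 6),
# so the per-frame dimension should grow by exactly 33.
dummy = torch.zeros(4, 232)            # 4 frames of a hypothetical 232-dim prediction
out = part2full(dummy, stand=True)
assert out.shape == (4, 232 + 33)      # 265 channels, matching the slicing in main() below
```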
```python
def main():
    # * create the smplx model
    zelin_log.info('init smplx model...')  # zelin_log is my own logger
    dtype = torch.float64
    smplx_path = './visualise/'
    model_params = dict(model_path=smplx_path,
                        model_type='smplx',
                        create_global_orient=True,
                        create_body_pose=True,
                        create_betas=True,
                        num_betas=300,
                        create_left_hand_pose=True,
                        create_right_hand_pose=True,
                        use_pca=False,
                        flat_hand_mean=False,
                        create_expression=True,
                        num_expression_coeffs=100,
                        num_pca_comps=12,
                        create_jaw_pose=True,
                        create_leye_pose=True,
                        create_reye_pose=True,
                        create_transl=False,
                        dtype=dtype,
                        )
    smplx_model = smplx.create(**model_params).to(device)

    # * load smplx params (this is the DiffSHEG output)
    pred_smplx = np.load('results/talkshow_88/test_custom_audio/talkshow_GesExpr_unify_addHubert_encodeHubert_mdlpIncludeX_condRes_LN_ClsFree/fixStart10/ckpt_e2599_ddim25_lastStepInterp/pid_1/Forrest_tts.npy')
    pred_smplx = torch.from_numpy(pred_smplx).float().to(device)[0][:100]
    pred_smplx = part2full(pred_smplx, stand=True)
    # * pred_smplx size: [n_frames, param_dim]

    import tqdm
    vertices = []
    betas = torch.zeros([1, 300], dtype=torch.float64).to(device)
    for frame_ind in tqdm.tqdm(range(pred_smplx.shape[0]), desc='infer mesh vertices per frame'):
        sample_output: SMPLOutput = smplx_model.forward(
            betas=betas,
            jaw_pose=pred_smplx[frame_ind][0:3].unsqueeze_(dim=0),
            leye_pose=pred_smplx[frame_ind][3:6].unsqueeze_(dim=0),
            reye_pose=pred_smplx[frame_ind][6:9].unsqueeze_(dim=0),
            global_orient=pred_smplx[frame_ind][9:12].unsqueeze_(dim=0),
            body_pose=pred_smplx[frame_ind][12:75].unsqueeze_(dim=0),
            left_hand_pose=pred_smplx[frame_ind][75:120].unsqueeze_(dim=0),
            right_hand_pose=pred_smplx[frame_ind][120:165].unsqueeze_(dim=0),
            expression=pred_smplx[frame_ind][165:265].unsqueeze_(dim=0),
            return_verts=True,
        )
        vertices.append(sample_output.vertices.detach().cpu().numpy().squeeze())
    vertices = np.asarray(vertices)
    print(vertices.shape)

    # * debug render
    exp_dir = 'exp/speech2smplx'
    os.makedirs(exp_dir, exist_ok=True)
    num_frames = vertices.shape[0]
    # the dataset convention is inverted, so flip y and z
    vertices = vertices.reshape(vertices.shape[0], -1, 3)
    vertices[:, :, 1] = -vertices[:, :, 1]
    vertices[:, :, 2] = -vertices[:, :, 2]
    width, height = 800, 1440
    viewport_height = 1440
    z_offset = 1.8
    video_fname = 'demo'
    os.makedirs(f'{exp_dir}/video_frames', exist_ok=True)
    writer = cv2.VideoWriter(f'{exp_dir}/{video_fname}.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, (width, height), True)
    center = np.mean(vertices[0], axis=0)
    render_helper = pyrender.OffscreenRenderer(viewport_width=800, viewport_height=viewport_height)

    class Struct(object):
        def __init__(self, **kwargs):
            for key, val in kwargs.items():
                setattr(self, key, val)

    path = os.path.join(os.getcwd(), 'visualise/smplx/SMPLX_NEUTRAL.npz')
    model_data = np.load(path, allow_pickle=True)
    data_struct = Struct(**model_data)

    for i_frame in tqdm.tqdm(range(num_frames), desc='render debug image'):
        frame_verts = vertices[i_frame]
        # TODO: save the per-frame vertices as npz
        imgi = render_mesh_helper((frame_verts, data_struct.f), center, camera='o', r=render_helper, y=0.7, z_offset=z_offset)
        imgi = imgi.astype(np.uint8)
        # save the frame as an image
        cv2.imwrite(f'{exp_dir}/video_frames/{i_frame:04d}.png', imgi)
        # and append it to the video
        writer.write(imgi)
    writer.release()


if __name__ == '__main__':
    main()
```
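For reference, this is the per-frame channel layout my slicing assumes after part2full. It is only what my code above implies; I have not confirmed it against the DiffSHEG or SHOW documentation, so if the real layout differs that would explain the bad result:

```python
# Layout assumed by the forward() call above (after part2full), 265 channels total.
layout_after_part2full = {
    'jaw_pose':        slice(0, 3),
    'leye_pose':       slice(3, 6),
    'reye_pose':       slice(6, 9),
    'global_orient':   slice(9, 12),
    'body_pose':       slice(12, 75),    # 21 body joints * 3
    'left_hand_pose':  slice(75, 120),   # 15 hand joints * 3
    'right_hand_pose': slice(120, 165),  # 15 hand joints * 3
    'expression':      slice(165, 265),  # 100 expression coefficients
}
assert sum(s.stop - s.start for s in layout_after_part2full.values()) == 265
```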
Hi James, you may want to pay attention to the code here: Line 146 in 3ebf305
Thank you for the reply! So what is the channel order in these output files? I have read them, but the order matters because I need to feed them into this function to get the mesh:

```python
pred_smplx = np.load('Forrest_tts.npy')
sample_output: SMPLOutput = smplx_model.forward(
    betas=betas,
    jaw_pose=pred_smplx[0][0:3].unsqueeze_(dim=0),
    leye_pose=pred_smplx[0][3:6].unsqueeze_(dim=0),
    reye_pose=pred_smplx[0][6:9].unsqueeze_(dim=0),
    global_orient=pred_smplx[0][9:12].unsqueeze_(dim=0),
    body_pose=pred_smplx[0][12:75].unsqueeze_(dim=0),
    left_hand_pose=pred_smplx[0][75:120].unsqueeze_(dim=0),
    right_hand_pose=pred_smplx[0][120:165].unsqueeze_(dim=0),
    expression=pred_smplx[0][165:265].unsqueeze_(dim=0),
    return_verts=True,
)
```
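In case it helps, this is how I am checking the shapes on my side. The .npz path below is just a placeholder for one of the SHOW inference outputs; I only print whatever keys the file contains, I am not assuming any particular naming:

```python
import numpy as np

# Flat DiffSHEG prediction
pred = np.load('Forrest_tts.npy')
print('DiffSHEG output shape:', pred.shape)

# One of the .npz files produced by the SHOW inference (placeholder path)
show = np.load('some_show_result.npz', allow_pickle=True)
for key in show.files:
    print(key, np.asarray(show[key]).shape)
# If the per-key sizes line up with the slices above
# (3 + 3 + 3 + 3 + 63 + 45 + 45 + 100 = 265), the ordering is at least
# dimensionally consistent.
```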
Hello, did you render the result correctly?
Hi @jameskuma, this is my code to visualize the SHOW results, which is modified from the visualization code in TalkSHOW. Remember to specify the face_path and gesture_path arguments.
Hello, I get similar results to @jameskuma. I tried to work out whether there is a mismatch between the DiffSHEG output parameters and the SHOW SMPLX model input, but everything seems okay. Has anyone found the right way to render SHOW results? @JeremyCJM I tried running your code, but I cannot figure out what face_path and gesture_path should be, since the DiffSHEG model only gives a single .npy output. I am also not quite sure why it creates a dataset and loader for the whole TalkSHOW dataset when inferring one output. Can you help me use your code for a single inference from the .npy output DiffSHEG gives? Any help in visualising would be appreciated!

Forrest_tts_diffsheg_show.mp4

Here is my code:
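Edit: for what it is worth, this is my current guess at how the single DiffSHEG .npy could be split into a face part and a gesture part, based only on the slicing used earlier in this thread (jaw at channels 0:3 and expression at 132:232 before part2full). I have not verified that this is what face_path and gesture_path actually expect, and the output file names are just placeholders:

```python
import numpy as np

raw = np.load('Forrest_tts.npy')[0]      # -> (n_frames, 232) in my case

# jaw (3) + expression (100) = 103 "face" channels
face = np.concatenate([raw[:, 0:3], raw[:, 132:232]], axis=1)
# eyes, global orientation, upper body and hands = 129 "gesture" channels
gesture = raw[:, 3:132]

np.save('Forrest_tts_face.npy', face)        # placeholder file names
np.save('Forrest_tts_gesture.npy', gesture)
print(face.shape, gesture.shape)
```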
Dear author,

Thank you for this awesome work!

I ran the inference part of this repo on the SHOW dataset, and I only get a bunch of .npz files. How can I visualize them with the visualization tool in TalkSHOW? I mean, which part of the code should I use to visualize the results?

Best regards