Reproducing results on visual checkpoint #23

Open
DWhettam opened this issue May 9, 2023 · 6 comments

Comments

@DWhettam commented May 9, 2023

Hi, I'm trying to reproduce the results in Table 7 of the paper (visual checkpoint only, manually specifying the sample rate), and I'm running into some issues. Following the suggestions in #13, I built a test script modelled on run_demo.py: it loops over the files in the test csv and copies run_demo.py's process exactly, but with a fixed sample rate. Currently my results are completely wrong (MAE 9.96). Do you have any suggestions for what I could be doing differently? Any help would really be appreciated!


import argparse

import numpy as np
import pandas as pd
import torch
from torchvision import models
from tqdm import tqdm

# read_video, convert_model and get_visual_count are this repo's own helpers,
# imported the same way run_demo.py imports them

parser = argparse.ArgumentParser()
parser.add_argument('--sample_rate', type=int, required=True)  # the fixed stride under test
args = parser.parse_args()

test_df = pd.read_csv('countix_test_examples_clean.csv')

np.random.seed(0)
torch.manual_seed(0)
torch.backends.cudnn.enabled = False  # 0.811

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ---------------- load the model trained on Countix-AV ----------------
model = models.video.r2plus1d_18(pretrained=True)
model.fc = torch.nn.Linear(512, 34 * 41)   # replacement heads, following run_demo.py
model.fc2 = torch.nn.Linear(512, 41)
model = convert_model(model)

if device.type == "cuda":
    model = torch.nn.DataParallel(model)

model = model.cuda()
checkpoint = torch.load('visual_checkpoint.pt')
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# the 34 candidate count values (2..35) passed to get_visual_count
tensor = torch.Tensor(np.arange(2, 36)).type(torch.FloatTensor).cuda().unsqueeze(0)

data_path = 'path/my_path'

outputs_list = []
groundtruth_lists = []

for idx, row in tqdm(test_df.iterrows(), total=test_df.shape[0]):
    video_id = row.iloc[0]
    gt = row.iloc[-1]                 # ground-truth repetition count
    countix_start = float(row.iloc[3])
    countix_end = float(row.iloc[4])
    video_path = f"{data_path}/{video_id}.mp4"

    video, fps = read_video(video_path)
    video = video.astype(np.float16) / 255.0
    # ImageNet mean/std normalisation
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float16).reshape((1, 1, 1, 3))
    std = np.array([0.229, 0.224, 0.225], dtype=np.float16).reshape((1, 1, 1, 3))
    video = (video - mean) / std

    # trim to the annotated repetition segment
    start = int(countix_start * fps)
    end = int(countix_end * fps)
    video = video[start:end]

    outputs = get_visual_count(video, args.sample_rate, model, tensor)
    outputs_list.append(outputs)
    groundtruth_lists.append(gt)

# off-by-one accuracy: prediction within +/-1 of the ground truth
obo = sum(1 if gt - 1 <= pred <= gt + 1 else 0
          for pred, gt in zip(outputs_list, groundtruth_lists)) / float(len(groundtruth_lists))

np_outputs = np.array(outputs_list)
np_labels = np.array(groundtruth_lists)
mae_err = np.mean(np.abs(np_labels - np_outputs))

print(f"OBO ACCURACY: {obo}")
print(f"MAE: {mae_err}")

@guoxigan

Specifying a fixed sample rate is not a good idea!

@DWhettam
Author

@guoxigan For sure - although I'm trying to replicate Table 7, which uses a fixed sample rate. I wanted to test just the visual model without having to train the full pipeline first. I have since got it working by writing a testing script from scratch, without the provided util functions, but I'd still be interested to know what the issue is with this version of the script.

@guoxigan

Your script looks correct, and I suspect you might be using a very small sampling rate. The smaller the sampling rate, the larger the mean absolute error (MAE).
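A quick way to see the trend is to sweep a few fixed strides and compare the error. A minimal sketch, assuming the evaluation loop from your script is wrapped in a hypothetical evaluate(sample_rate) function that returns (mae, obo):

# sweep a few fixed strides; evaluate() is a hypothetical wrapper around the
# loop in the original script, returning (mae, obo) for one stride setting
for sr in [1, 2, 4, 8, 16]:
    mae, obo = evaluate(sr)
    print(f"sample_rate={sr}: MAE={mae:.3f}, OBO={obo:.3f}")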

@DWhettam
Author

As I said, I'm trying to reproduce Table 7, so I used the sampling rates there. Do you have your testing code available online anywhere @guoxigan? I'd be interested to see it, thanks!

@guoxigan

I'm sorry, but after I failed to reproduce Table 7, I deleted the previous project files. I suggest you first assume that the stride selection model can predict the optimal stride with 100% accuracy. In other words, use the following code to calculate the stride from the ground-truth count: sample_rate = int(max((video.shape[0]/count + 2)/32, 1))
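Dropping that into your loop would look something like this sketch (oracle_sample_rate is just an illustrative name; video and gt are the variables from your script, and the clip length 32 matches the divisor in the formula):

def oracle_sample_rate(video, count, clip_len=32):
    # the stride a perfect stride-selector would pick, per the formula above:
    # one clip_len-frame clip then spans roughly one repetition period
    return int(max((video.shape[0] / count + 2) / clip_len, 1))

# inside the evaluation loop, replacing the fixed args.sample_rate:
#   sample_rate = oracle_sample_rate(video, gt)
#   outputs = get_visual_count(video, sample_rate, model, tensor)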

@DWhettam
Author

OK, I'll have a play with it, thanks!
