Good job, but need dataloader #10

Open
ZhanYang-nwpu opened this issue Mar 4, 2024 · 3 comments

Comments

@ZhanYang-nwpu

Thank you very much. Your work is of great help to our study and research.
Can you provide the WebUAV-3M dataloader.py?

@983632847
Owner

We recommend using webuav3m.py for training and webuav3mdataset.py for testing. You can easily plug these files into PyTracking, STARK, OSTrack, and other trackers.
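
As a rough, unverified sketch, the two files are typically registered the same way as other PyTracking/OSTrack datasets. Every import path, class name, and settings attribute below is an assumption based on the file names above, not the repo's actual layout:

# Hypothetical wiring of the WebUAV-3M loaders into a PyTracking/OSTrack-style
# codebase; all names here are assumptions, check against your own repo.
from lib.train.dataset.webuav3m import WebUAV3M                  # from webuav3m.py (training)
from lib.test.evaluation.webuav3mdataset import WebUAV3MDataset  # from webuav3mdataset.py (testing)

def webuav3m_train_entry(settings, image_loader):
    # One entry of the training dataset list, next to LaSOT/GOT-10k/COCO/TrackingNet.
    return WebUAV3M(settings.env.webuav3m_dir, image_loader=image_loader)

def webuav3m_test_sequences():
    # Sequence list consumed by the evaluation scripts.
    return WebUAV3MDataset().get_sequence_list()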

@MrtXue

MrtXue commented Aug 15, 2024

We recommend using webuav3m.py for training and webuav3mdataset.py for testing. You can easily plug these files into PyTracking, STARK, OSTrack, and other trackers.

I used the webuav3m.py file provided in All-in-One and configured OSTrack to train only on the WebUAV-3M dataset. When testing on UAV123, the AUC was abnormally low. I am not sure whether this is a dataloader problem or a dataset gap problem. Do you have any suggestions for me?

@983632847 reopened this Oct 23, 2024
@983632847
Owner

983632847 commented Oct 23, 2024

1. Use pretrained weights for initialization (either the OSTrack or the All-in-One checkpoint works).
vitb_256_mae_ce_32x4_ep300.yaml:
PRETRAIN_FILE: "pretrained_models/All_in_One_ep0300.pth.tar"
# or
PRETRAIN_FILE: "pretrained_models/OSTrack_ep0300.pth.tar"

lib/models/ostrack/ostrack.py:
if ('OSTrack' in cfg.MODEL.PRETRAIN_FILE or 'All_in_One' in cfg.MODEL.PRETRAIN_FILE) and training:
    checkpoint = torch.load(cfg.MODEL.PRETRAIN_FILE, map_location="cpu")
    missing_keys, unexpected_keys = model.load_state_dict(checkpoint["net"], strict=False)
    print('Load pretrained model from: ' + cfg.MODEL.PRETRAIN_FILE)
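
One small addition of my own (not in the original snippet): since strict=False silently skips mismatched keys, it is worth reporting them once to confirm the weights were actually loaded.

# Optional sanity check: with strict=False a wrong checkpoint still "loads",
# so report what was skipped or left at random initialization.
if missing_keys:
    print('Missing keys (left at random init): %d' % len(missing_keys))
if unexpected_keys:
    print('Unexpected checkpoint keys (ignored): %d' % len(unexpected_keys))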

2. For fine-tuning, try to stick to the four datasets LaSOT, COCO, TrackingNet, and GOT-10k, plus the more recent VastTrack. For datasets without sentence annotations, you can use class names or the text descriptions generated by DTVLT. With other datasets, probably because of domain gap or differences in annotation style, training on their own training split and testing on their own test split improves results clearly, but the gains do not transfer well to other benchmarks. I am still looking into the reason.
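
For reference, a hedged sketch of what the dataset section of an OSTrack-style experiment yaml typically looks like with this selection; the key names and dataset identifiers may differ in your local config, so treat them as assumptions to verify:

vitb_256_mae_ce_32x4_ep300.yaml (dataset section):
DATA:
  TRAIN:
    DATASETS_NAME: [LASOT, GOT10K_vot_train, COCO17, TRACKINGNET]
    DATASETS_RATIO: [1, 1, 1, 1]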

3. Freeze the backbone and fine-tune only the head and the language encoder.
train_script.py:

# wrap networks to distributed one
net.cuda()

# Freeze the backbone except the language and vision projection layers
for k, v in net.named_parameters():
    if 'backbone' in k:
        if ('language_proj' in k) or ('language_xz_proj' in k) or ('vision_x_proj' in k) or ('vision_z_proj' in k):
            v.requires_grad = True
        else:
            v.requires_grad = False
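
A quick check worth running right after this loop (my addition, not part of train_script.py): count what is still trainable, so you can confirm that only the head and the language/vision projection layers will be updated.

# Verify the freeze: list parameters that still require gradients and
# report how many weights will actually be updated.
trainable = [k for k, v in net.named_parameters() if v.requires_grad]
n_trainable = sum(v.numel() for v in net.parameters() if v.requires_grad)
n_total = sum(v.numel() for v in net.parameters())
print('Trainable tensors: %d, trainable params: %d / %d' % (len(trainable), n_trainable, n_total))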

4. The bounding box annotations in WebUAV-3M are of very high quality (compare them carefully with LaSOT, another high-quality dataset, if you have looked at it). So the cause more likely lies in learning from the sentence annotations: current VL trackers may simply not extract useful knowledge from them. For example, when we built the dataset we put concrete coordinates in the sentence descriptions, hoping a model could attend to the given region the way a person does (unfortunately, most current models probably cannot learn that ability yet). You may also look at Elysium, a recent MLLM work that can localize a target in a video given (normalized) coordinates.
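
To make the "(normalized) coordinates" idea concrete, here is a minimal illustration of my own (not the dataset's official tooling): converting a pixel-space box into corner coordinates normalized to [0, 1], the form such a sentence description or a model like Elysium would consume.

# Illustrative only: convert a pixel-space [x, y, w, h] box into corner
# coordinates normalized to [0, 1]. Rounding keeps the text description short.
def normalize_box(x, y, w, h, img_w, img_h):
    x1, y1 = x / img_w, y / img_h
    x2, y2 = (x + w) / img_w, (y + h) / img_h
    return [round(v, 2) for v in (x1, y1, x2, y2)]

print(normalize_box(320, 180, 64, 48, img_w=1280, img_h=720))  # -> [0.25, 0.25, 0.3, 0.32]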
