Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVAT attributes getting stripped by saving in datumaro format #191

Closed
lambertwx opened this issue Mar 29, 2021 · 3 comments · Fixed by #192
Closed

CVAT attributes getting stripped by saving in datumaro format #191

lambertwx opened this issue Mar 29, 2021 · 3 comments · Fixed by #192

Comments

@lambertwx
Copy link

Attributes are being stripped if you serialize to the datumaro json format and then export as CVAT format.

Here's how to replicate:

python -m datumaro import -i tests/assets/cvat_dataset/for_images/train.xml -f cvat --copy -o dmu_test_copy --overwrite
python -m datumaro export -p dmu_test_copy -f cvat -o cvat_from_dmcopy --overwrite

If all was working perfectly, you would expect cvat_from_dmcopy/train.xml to have the same label attributes that are present in tests/assets/cvat_dataset/for_images/train.xml. However, it does not - the attributes are missing. Interestingly, the attributes are present on the items in dmu_test_copy/dataset/annotations/train.json (though they aren't in the categories there). So they at least made it into the JSON.

Version of datumaro: v0.1.7

Also note that the problem only happens if you serialize to Datumaro format. If you skip that step, e.g.

python -m datumaro import -i tests/assets/cvat_dataset/for_images/train.xml -f cvat -o dmu_test --overwrite
python -m datumaro export -p dmu_test -f cvat -o cvat_from_dm --overwrite

the attributes are preserved correctly in cvat_from_dm/train.xml.

The reason I care about this is because I am using Datumaro merge to merge two CVAT datasets and want the attributes to persist when I export the merged dataset.
E.g, if I do:

python -m datumaro import -i tests/assets/cvat_dataset/for_images/train.xml -f cvat -o dmu_lwtest --overwrite
python -m datumaro import -i tests/assets/cvat_dataset/for_images/train.xml -f cvat -o dmu_lwtest2 --overwrite
python -m datumaro merge dmu_lwtest/ dmu_lwtest2/ -o dmu_merged --quorum 1 --overwrite
python -m datumaro export -p dmu_merged -f cvat -o cvat_merged --overwrite

the attributes will be missing from cvat_merged/train.xml. I don't see a way around this because as far as I know the inputs and outputs of merge have to be in datumaro format.

I am willing to help fix this problem, but first wanted to get feedback about whether this is desired behavior and what is the best place to fix it. E.g., is the problem that the attributes didn't make it into the categories field of the Datumaro json? Or is the problem in the CvatConverter that is invoked by the export?

@zhiltsov-max
Copy link
Contributor

zhiltsov-max commented Mar 30, 2021

Hi, thanks for the report. Please check if the patch helps with annotation attributes. The problem turned out to be both in Datumaro and CVAT formats. Note that we still can't save all the meta information (attribute properties - value range, type, mutability etc. - #144 ), so you might need to add this information manually.

I don't see a way around this because as far as I know the inputs and outputs of merge have to be in datumaro format.

Inputs are projects, so they can have sources in any formats. But the output is a project in the internal format.

Steps to experiment with the patch:

pip install git+https://github.com/openvinotoolkit/datumaro@zm/fix-191

or

git clone
cd datumaro
git checkout zm/fix-191
pip install -e .

@lambertwx
Copy link
Author

Thanks for the quick turnaround! I tested the patch. Everything seems to be working fine.

If you feel it would be helpful, I have created a second CVAT annotation file that can go in test/assets/cvat_dataset/for_images and can be used for testing merges with two slightly different sets of labels. I could create a separate MR for that.

@zhiltsov-max
Copy link
Contributor

Ok, thanks for testing the fix.

Probably, such test won't cover anything new, because we have tests for merges in https://github.com/openvinotoolkit/datumaro/blob/develop/tests/test_ops.py#L217. If you see it covers some uncovered functionality and use cases, you're welcome to submit a PR and contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants