
Fix importing VOC dataset with incorrect filename properties. Fix copying images after export. #122

Draft
wants to merge 5 commits into
base: dev


YaserAlOsh

Greetings,
While working with this package, I ran into two issues: one when importing a VOC dataset, and one when exporting it to YoloV5 with images.

When pylabel imports a VOC dataset, it uses the filename property of each .xml annotation file to determine the image name. That name is then used when exporting the dataset to other formats.
If the name is incorrect or empty, this process fails, and the export does not return the correct number of annotations. For example, if all .xml files have the same 'filename' property, we get only one file after export.

I fixed it by searching the images directory for an image with the same name as the annotation file.
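
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the function name `find_image_for_annotation` and the `IMAGE_EXTENSIONS` set are hypothetical, and it assumes the image shares its base name (stem) with the .xml file.

```python
from pathlib import Path
from typing import Optional

# Assumption: the set of extensions treated as images is illustrative only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_image_for_annotation(xml_path, images_dir) -> Optional[str]:
    """Return the file name of the image whose stem matches the XML file's stem.

    Hypothetical helper: returns None when no matching image is found, so the
    caller can fall back to the 'filename' property stored in the XML.
    """
    stem = Path(xml_path).stem
    # sorted() keeps the result deterministic if several images share a stem.
    for candidate in sorted(Path(images_dir).iterdir()):
        if (
            candidate.is_file()
            and candidate.stem == stem
            and candidate.suffix.lower() in IMAGE_EXTENSIONS
        ):
            return candidate.name
    return None
```

Because the returned name includes the extension, a value obtained this way could be used directly as the image file name when copying images on export.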

The other issue this fixes, in exporter.py, occurs when copying images during export to YoloV5. The code joins the annotation path with the images path, which produces an incorrect path for the images. I fixed it by commenting out the annotation path in the Path concatenation code.

I hope my changes will not break any functionality. Please let me know if there is a better way to solve my issue.

Thank you for making this package public.
Kind regards,
Yaser.

Remove the annotations path from the source image path that is used when copying images.
…OC if the path is specified.

This fixes the issue when the xml files do not have the name of the image in the filename property.
@alexheat
Contributor

alexheat commented Jul 16, 2023

Thank you @YaserAlOsh! There are a few issues with your pull request.

  1. It failed the validation tests for some reason. You can learn about the tests here https://github.com/pylabel-project/pylabel/blob/dev/tests/README.md
  2. Your fix for importing is very similar to another pull request, "yaml support chinese and handle voc format" #118. Since that one passed the validation tests, I have incorporated it into the latest version, v52. Can you give it a try?
  3. I didn't understand this commit 03b0903. If the annotations are in a different folder than the images, then the pathtoannotations is needed.
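
One way to reconcile point 3 with the commit might be to try both interpretations of the image folder. The sketch below is only an assumption about how that could look, not pylabel's exporter code: `resolve_source_image` is a hypothetical helper, and the parameter names are borrowed from this thread for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_source_image(path_to_annotations, img_folder, img_filename) -> Optional[Path]:
    """Hypothetical helper: locate the image to copy during export.

    Tries img_folder relative to the annotations directory first (the case
    where annotations and images live in different folders), then img_folder
    exactly as given (the case where it is already a usable path).
    """
    candidates = [
        Path(path_to_annotations, img_folder, img_filename),
        Path(img_folder, img_filename),
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # let the caller decide how to report a missing image
```

With this approach an image folder such as `../images` still resolves through the annotations path, while an image folder that already points at the images directory is used as-is.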

@YaserAlOsh
Author

YaserAlOsh commented Jul 31, 2023

Hello @alexheat,
I apologize for my late reply.

I have just tried the latest version from GitHub (by reinstalling). It does seem to import everything correctly, but the values in the 'img_filename' column do not contain the image extension.
When exporting with this code:
dataset.export.ExportToYoloV5(f"{export_name}/labels/",copy_images=True,use_splits=True),

I get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets\\dataset-main\\annotations\\datasets\\dataset-main\\images\\1232'

The path appears to be repeated, and the file name is also wrong (it has no extension).
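
The shape of that path can be reproduced with plain path joining. This is only a sketch of the suspected mechanism, not the actual exporter code; the variable names are illustrative, and the values are taken from the error message above.

```python
from pathlib import PureWindowsPath

# Illustrative values copied from the error message; the variable names are assumptions.
path_to_annotations = "datasets/dataset-main/annotations"
img_folder = "datasets/dataset-main/images"
img_filename = "1232"  # no extension, as in the 'img_filename' column

# Joining the annotations path in front of an image folder that is already
# relative to the working directory repeats the leading directories.
joined = PureWindowsPath(path_to_annotations, img_folder, img_filename)
print(joined)  # datasets\dataset-main\annotations\datasets\dataset-main\images\1232

# Without the annotations path, only the missing extension remains wrong.
direct = PureWindowsPath(img_folder, img_filename)
print(direct)  # datasets\dataset-main\images\1232
```

`PureWindowsPath` is used here only so that the output matches the backslash-separated path in the traceback on any operating system.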

As for your third concern about the commit 03b0903, I think I changed the code to account for this issue (duplicating the path), but I am not sure if I did something wrong that caused it in the first place.
