cvat_format/extractor type casting value to float, causing issues #395

Closed
maitrai-maka opened this issue Jul 31, 2021 · 5 comments · Fixed by #403

@maitrai-maka

We have a text-type attribute for a bounding box label in CVAT. The values for this attribute consist only of digits, and some of them start with zeros, e.g. 0012345, 0123456, 1234567.

We are using Datumaro to automatically run some checks on the annotations imported from CVAT. It looks like the CVAT extractor tries to typecast all imported numeric attribute values to float. This is a problem for us because it silently removes the leading zeros from the value, changing the user input.

Is there any reason behind typecasting the attribute value to float? Can this typecasting step be deleted? If required, a user can always do the typecasting later.

nmanovic added the BUG (Something isn't working) label on Aug 2, 2021
@nmanovic

nmanovic commented Aug 2, 2021

@kirill-sizov , could you please look at the issue?

@sizov-kirill

@kirill-sizov , could you please look at the issue?

@nmanovic Yes, sure

@zhiltsov-max
Contributor

zhiltsov-max commented Aug 2, 2021

Is there any reason behind typecasting the attribute value to float?

Datumaro does not yet have a strong attribute data model in which value ranges and data types could be described; that should be resolved by #144. The current implementation is our best effort at preserving attribute data types: logical values, numbers, and strings. We tried keeping everything as string values, but it didn't work very well.
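To illustrate the kind of best-effort casting described above, here is a minimal sketch. It is not Datumaro's actual code: the helper name `best_effort_cast` and the exact rules are assumptions made for illustration only.

```python
def best_effort_cast(value: str):
    """Hypothetical helper: guess a type for a raw attribute string."""
    lowered = value.strip().lower()
    # Logical values first.
    if lowered in ("true", "false"):
        return lowered == "true"
    # Then numbers: anything float() accepts becomes a float.
    try:
        return float(value)
    except ValueError:
        # Everything else stays a string.
        return value


print(best_effort_cast("true"))     # a bool
print(best_effort_cast("0203450"))  # a float; the leading zero is lost
print(best_effort_cast("abc123"))   # unchanged string
```

Without type information, a rule like this cannot tell a digit-only text value apart from a number, which is the behavior reported in this issue.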

Can this typecasting step be deleted? If required, a user can always do the typecasting later.

I think yes, especially in the case of the CVAT format. It carries attribute type information, which can be used even without big changes.

BTW, could you share any details about your dataset checks? Are you using some custom code for this, which maybe can be shared? Is it possible that Datumaro does it for you with merge, stats and validate commands or any other?

@maitrai-maka
Author

maitrai-maka commented Aug 2, 2021

could you share any details about your dataset checks?

We are doing simple checks on user input, such as string length (we know this attribute should have 7 digits). This check fails because the leading zeros are removed. All failed checks are exported in CSV format, which we use as a reference to make changes in CVAT.

CVAT has two attribute types for free-form user input: number and text. However, it looks like the importer currently casts both types to float without checking the input_type in the CVAT XML. So a text attribute whose value contains only digits is also cast to float.
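A type-aware import could look roughly like the sketch below, which casts only attributes declared as number. This is an assumption-laden illustration, not the Datumaro extractor: the XML fragment is simplified, the `count` attribute is invented for contrast, and `read_input_types` and `cast_attribute` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical label definition shaped like the one in this issue.
META_XML = """
<label>
  <name>box</name>
  <attributes>
    <attribute>
      <name>text_test</name>
      <input_type>text</input_type>
    </attribute>
    <attribute>
      <name>count</name>
      <input_type>number</input_type>
    </attribute>
  </attributes>
</label>
"""


def read_input_types(label_xml: str) -> dict:
    """Map each attribute name to its declared input_type."""
    root = ET.fromstring(label_xml)
    return {
        attr.findtext("name"): attr.findtext("input_type")
        for attr in root.iter("attribute")
    }


def cast_attribute(name: str, raw: str, input_types: dict):
    """Cast only attributes declared as 'number'; keep all others as text."""
    if input_types.get(name) == "number":
        return float(raw)
    return raw


input_types = read_input_types(META_XML)
print(cast_attribute("text_test", "0203450", input_types))  # stays a string
print(cast_attribute("count", "0203450", input_types))      # becomes a float
```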

Is it possible that Datumaro does it for you with merge, stats and validate commands or any other?

We have not used stats or validate because we could not find examples for these commands. I think it would not matter if we did, since these commands run after the dataset has already been extracted from the CVAT XML, by which time the data has already been typecast.

Are you using some custom code for this, which maybe can be shared?

We are iterating through each dataset_element->label->attribute to carry out checks.

for item in dataset:
    for box in item.annotations:
        # Forced to cast the attribute value back to a string
        # in order to check its length.
        if len(str(box.attributes['text_test'])) != 7:
            print("Error: Length of user input not 7.")

The above snippet will print an error if the attribute is defined as:

<attribute>
   <name>text_test</name>
   <mutable>False</mutable>
   <input_type>text</input_type>
   <default_value></default_value>
   <values></values>
</attribute>

and the value is set to 0203450 by <attribute name="text">0203450</attribute>. Datumaro converts this 0203450 to 203450.0.
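The effect on the length check can be reproduced in plain Python, without Datumaro. The snippet below only mimics the float cast described above. The `zfill` step is a stopgap assumed for illustration, and it works only because the expected width of 7 digits is known in advance.

```python
raw = "0203450"        # what the annotator typed in CVAT
imported = float(raw)  # the cast described in this issue

as_text = str(imported)
print(as_text, len(as_text))  # 203450.0 8, so a length-7 check fails

# Stopgap: rebuild the original text when the fixed width is known.
restored = str(int(imported)).zfill(7)
print(restored == raw)  # True
```

For values of unknown length the original text cannot be recovered this way, since the number of stripped zeros is lost.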

@nmanovic

nmanovic commented Aug 3, 2021

@zhiltsov-max , Datumaro should not convert text to a number.
