Validate incorrectly throws precision mismatch error for Table_Delimited #681

Closed
benjhirsch opened this issue Aug 16, 2023 · 10 comments · Fixed by #721 or #815

benjhirsch commented Aug 16, 2023

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When performing content validation on a delimited table, Validate throws a field_value_format_precision_mismatch error if any value in a field has less precision than dictated by field_format.

🕵️ Expected behavior

Per StdRef 4B.1.2, field_format defines the maximum precision, not the only precision allowed:

For character tables, <field_format> is used to describe the maximum length and alignment of
the data. <field_format> also gives an indication of the maximum precision of real numbers, but
does not require all values to have this precision.

Validate correctly does not throw an error when the same table is described by a Table_Character object.

StdRef 4C.2, which describes delimited tables, states:

Values for attribute <field_format> are set as described in Section 4B.1.2.

...so the same behavior should apply.
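To make the distinction concrete, here is a minimal Python sketch of the two readings of field_format precision. It treats the precision in a printf-style format such as %5.2f either as a maximum (the StdRef 4B.1.2 reading) or as an exact requirement (the behavior reported in this issue). The helper names and the simplified format parsing are hypothetical. This is not Validate's actual code.

```python
import re
from typing import Optional

# Simplified, hypothetical parser for printf-style field_format values such as
# "%5.2f" or "%-8.3e"; only flag/width/precision/conversion are handled.
FORMAT_RE = re.compile(r"%[-+]?(\d+)?(?:\.(\d+))?[fFeE]")


def format_precision(field_format: str) -> Optional[int]:
    """Return the precision from field_format, or None if it has none."""
    match = FORMAT_RE.fullmatch(field_format)
    if match is None or match.group(2) is None:
        return None
    return int(match.group(2))


def decimal_places(value: str) -> int:
    """Count digits to the right of the decimal point (exponent ignored)."""
    mantissa = re.split(r"[eE]", value.strip())[0]
    return len(mantissa.split(".", 1)[1]) if "." in mantissa else 0


def precision_ok(value: str, field_format: str, exact: bool = False) -> bool:
    """exact=False: precision is a maximum (StdRef 4B.1.2 reading).
    exact=True: precision must match exactly (behavior reported here)."""
    expected = format_precision(field_format)
    if expected is None:
        return True  # no precision in the format, nothing to check
    got = decimal_places(value)
    return got == expected if exact else got <= expected


print(precision_ok("2.2", "%5.2f"))              # True
print(precision_ok("2.2", "%5.2f", exact=True))  # False
```

Under the "maximum" reading, 2.2 passes a %5.2f field while a value like 2.123 would still be flagged, so the check keeps its purpose without forcing false precision.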

📜 To Reproduce

Run Validate on the included labels. Check the validation report.

🖥 Environment Info

  • Operating System: Windows 11 Pro

📚 Version of Software Used

Validate Version 3.2.0

🩺 Test Data / Additional context

Included in the .zip file are:

Table_Delimited label
Table_Character label
CSV table that produces the error for delimited but not character
CSV table with whitespace removed that exhibits the same behavior (just in case)
Validation reports showing the error

ff_test.zip

🦄 Related requirements

No response

⚙️ Engineering Details

No response

I&T

TestRail Test ID: T8681195

@al-niessner
Contributor

@jordanpadams

Reviewed this one and it is tricky. The table has the entry ,2.2 ,. It works with a character table because the field is 4 characters, but it fails with a delimited table because 2.2 is 3 and not 4 characters. Had the table been ,2.20, it would have passed both.

The tricky bit is whether we treat a space as a 0 or as an error. The character table is evidently treating the space as a 0, while the delimited table is not. From a PDS product definition point of view, which table is correct?

@benjhirsch
Author

Validate will still throw the error without the space (that's why I included the *_no_space.csv file), so there's some fundamental difference in the way the two table types are being read. While adding a 0 would solve the problem, I think that implies false precision. So there isn't currently a way for a delimited table with variable precision (and field_format, which SBN requires) to be valid.

@al-niessner
Contributor

@benjhirsch

Yes, I was being rather whitespace-agnostic, in the sense that a blank and a space are the same. For character tables, a space has to be provided, while delimiters are optional.

I do not know the PDS rules or concepts, nor do I pretend to. Given just what is in this discussion, if the precision is defined to be 2 digits after the decimal point but only one is provided, then either the missing digit has to be presumed 0 to make 2 digits, or it is an error because it is not knowable what to use for the missing digit. If we can presume missing digits to be 0, then the delimited table check is in error. If we cannot presume them to be 0, then the character table check is in error. It all comes back to what PDS means by precision, which is why I handed it off to @jordanpadams.
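A minimal sketch of the "presume the missing digit is 0" option, using Python's decimal module purely as an illustration (not how Validate works). Padding 2.2 to 2.20 leaves the numeric value unchanged but changes how many digits are written after the decimal point, which is the false precision concern raised above. The helper name written_places is hypothetical.

```python
from decimal import Decimal


def written_places(text: str) -> int:
    """Digits written after the decimal point; Decimal keeps trailing zeros."""
    exponent = Decimal(text).as_tuple().exponent
    if not isinstance(exponent, int):  # NaN / Infinity have no digit count
        raise ValueError(f"not a finite number: {text!r}")
    return max(0, -exponent)


original = "2.2"
padded = original + "0"  # the "presume the missing digit is 0" option

print(Decimal(original) == Decimal(padded))              # True: same value
print(written_places(original), written_places(padded))  # 1 2
```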

@jordanpadams
Member

@benjhirsch I need to dust off some old brain cells, but I am pretty sure delimited tables cannot have whitespace as padding for a float/integer/etc., because the value in a delimited table includes everything within the delimiters. So in reality, the ff_del test with the whitespace padding should probably throw a datatype mismatch error, not a precision error.
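As a rough illustration of this point, the sketch below splits a hypothetical delimited record with Python's csv module and checks the raw field against a simplified real-number pattern. The pattern is an assumption for illustration, not the normative PDS4 ASCII_Real definition. Because the reader keeps everything between the delimiters, the field is "2.2 " (with the trailing space), which fails a strict numeric check even though a lenient parser such as float() would accept it.

```python
import csv
import io
import re

# Simplified real-number pattern, assumed for illustration only (NOT the
# normative PDS4 ASCII_Real definition). No surrounding whitespace allowed.
STRICT_REAL = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def is_strict_real(field: str) -> bool:
    return STRICT_REAL.fullmatch(field) is not None


# Hypothetical record containing the ",2.2 ," entry discussed above.
record = "1,2.2 ,3\n"
fields = next(csv.reader(io.StringIO(record)))

print(repr(fields[1]))            # '2.2 '; the trailing space is kept
print(is_strict_real(fields[1]))  # False: the value includes the padding
print(float(fields[1]))           # 2.2; float() strips surrounding whitespace
```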

@jordanpadams jordanpadams added B14.1 and removed B14.0 labels Sep 12, 2023
@jordanpadams
Member

@al-niessner for this ticket, I think we want to throw a datatype mismatch (since the whitespace-padded 2.2 is not a number) instead of a precision mismatch, if that is possible.

@miguelp1986

@jordanpadams I'm still seeing the same error being thrown.

validate341 --skip-context-validation -t ff_test/ff_del.xml ff_test/ff_char.xml

PDS Validate Tool Report

Configuration:
   Version                       3.4.1
   Date                          2024-01-22T03:12:15Z

Parameters:
   Targets                       [file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_del.xml, file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_char.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /usr/local/validate-3.4.1/resources/registered_context_products.json



Product Level Validation Results

  FAIL: file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_del.xml
    Begin Content Validation: file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_test.csv
      ERROR  [error.table.field_value_format_precision_mismatch]   data object 1, record 2, field 2: The number of digits to the right of the decimal point in the value '2.2' does not equal the precision set in the defined field format (expected 2, got 1).
    End Content Validation: file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_test.csv
        1 product validation(s) completed

  FAIL: file:/Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_char.xml
      ERROR  [error.label.file_areas_duplicated_reference]   This file area references ff_test.csv that is already used by label urn:nasa:pds:bundle:collection:ff_del_test in file /Users/MPena/Documents/PDS/validate_test_files/681/ff_test/ff_del.xml
        2 product validation(s) completed

Summary:

  2 error(s)
  0 warning(s)

  Product Validation Summary:
    0          product(s) passed
    2          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped

  Message Types:
    1            error.label.file_areas_duplicated_reference
    1            error.table.field_value_format_precision_mismatch

End of Report
Completed execution in 3096 ms

@al-niessner
Contributor

@jordanpadams

Ah, I see. #815 resolves the mysteries from #681 and #722. Thanks. Can we close this as to be fixed by #815?

@jordanpadams
Member

@al-niessner it will be closed by #815. Although, while I was testing it, it actually looks like we are not checking precision at all for character tables, so I am trying to figure that out now.

@al-niessner
Contributor

@jordanpadams

Yeah, I saw later that #815 was a pull request and not an issue.

Not sure why you think character tables are not being checked at all, but if you point me at it, I may be able to help.

@jordanpadams
Member

@al-niessner it is actually yet another very weird oddity in the standard. I will add you to a thread I just started with @jshughes and Co.

4 participants