Feedback: DATA Act Schema v0.7 #126

Open
kaitlin opened this issue Dec 31, 2015 · 6 comments

@kaitlin

kaitlin commented Dec 31, 2015

This is the place to leave feedback on v0.7 of the DATA Act Schema. You can read more about the schema here: http://fedspendingtransparency.github.io/data-exchange-standard/

Federal Spending Collaboration home page: http://fedspendingtransparency.github.io/

@ghost

ghost commented Jan 19, 2016

The gist of our ten observations on the DATA Act Information Model Schema v0.7 is as follows:

  1. Concepts that are "Hypercube [cube]" are incorrectly given "xbrl:item" as their substitution group instead of "xbrldt:hypercubeItem".
  2. Concepts that are "Axis [Rollup/ Dimension]" are incorrectly given "xbrl:item" as their substitution group instead of "xbrldt:dimensionItem".
  3. The relationship between "Table [cube/ hypercubes]" and its "Axis [Rollup/ Dimension]" is defined by the incorrect arc-role "http://xbrl.org/int/dim/arcrole/domain-member" instead of "http://xbrl.org/int/dim/arcrole/all".
  4. The relationship between "Axis [Rollup/ Dimension]" and its "Domain" is defined by the incorrect arc-role "http://xbrl.org/int/dim/arcrole/domain-member" instead of "http://xbrl.org/int/dim/arcrole/hypercube-dimension".
  5. The relationship between "Domain" and its "Domain-member [Member]" is defined by the incorrect arc-role "http://xbrl.org/int/dim/arcrole/domain-member" instead of "http://xbrl.org/int/dim/arcrole/dimension-domain".
  6. The relationship between "Table (cube/ hypercubes)" and its "Axis [Rollup/ Dimension]" should be defined using the @xbrldt:targetRole attribute.
  7. For every hypercube concept, "xbrldt:contextElement" must be provided with a value of "segment" or "scenario".
  8. A member [child member] cannot be repeated within the same role. The concept "{Treasury Account Symbol Entry ID} [Member]" appears twice in the role "TAS-ProgramActiviy-ObjectClass Package" and twice in the role "TAS-ProgramActiviy-ObjectClass-Award Package".
  9. Two concepts in the taxonomy schema have the same documentation (duplication of the same concept): "treas_ByAwardandModificationEntryIDDimension" and "treas_ByAwardandModificationEntryIDRollupDimension".
  10. Some concept names and their standard labels contain spelling errors, for example:
    a) treas_ByDirect-ReimburseableFundingSourceDimension
    b) treas_SeaTranportationExpected
    c) treas_BalancesbyTAS-ProgramActiviy-ObjectClass-AwardCube
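
Observations 1 and 2 can be checked mechanically. The following is a minimal sketch, assuming that hypercube and dimension concepts can be recognised by a name suffix ("Cube", "Dimension") and that the schema uses the conventional `xbrli` and `xbrldt` prefixes. The sample schema, the element names in it, and the function name are invented for illustration and are not taken from the v0.7 files.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed naming convention: name suffix -> expected substitution group.
EXPECTED = {
    "Cube": "xbrldt:hypercubeItem",
    "Dimension": "xbrldt:dimensionItem",
}

# Invented sample schema (not the real DATA Act taxonomy).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:xbrldt="http://xbrl.org/2005/xbrldt">
  <xs:element name="ExampleBalancesCube" substitutionGroup="xbrli:item"/>
  <xs:element name="ExampleAwardDimension" substitutionGroup="xbrldt:dimensionItem"/>
  <xs:element name="ExampleAmount" substitutionGroup="xbrli:item"/>
</xs:schema>"""


def find_substitution_group_issues(xsd_text):
    """Return (name, actual, expected) for each element whose
    substitutionGroup does not match the one implied by its name suffix."""
    root = ET.fromstring(xsd_text)
    issues = []
    for element in root.iter(f"{{{XSD_NS}}}element"):
        name = element.get("name", "")
        actual = element.get("substitutionGroup")
        for suffix, expected in EXPECTED.items():
            if name.endswith(suffix) and actual != expected:
                issues.append((name, actual, expected))
    return issues


for name, actual, expected in find_substitution_group_issues(SAMPLE_XSD):
    print(f"{name}: found {actual}, expected {expected}")
```

The comparison is on the prefixed attribute string, so a schema that binds different prefixes to the same namespaces would need the prefixes resolved first.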

For further details or clarifications, feel free to contact vishram.modak@irisindia.net

@DATAActPMO

Thank you for your feedback, Vishram. We will investigate and address your comments in the next release of the schema, as appropriate.

@HerschelC

Not all of the fields found on https://openbeta.usaspending.gov/developers are listed in the data dictionary. For example, descriptionofcontractrequirement. I'd like to understand the precise definition and, more importantly, the data lineage of this element. In fact, let me expand this comment to suggest that a data lineage be provided for all elements within the data dictionary. This is a common best practice in data governance. I have similar questions with respect to firm8aflag and hubzoneflag. In looking at the data dictionary I do see Program_8A_Participant and HubZoneFirm. The data dictionary should either be expanded to include these additional terms as a crosswalk, or the terms should be standardized so that a user can easily search the data dictionary.

Going back to providing a solid data definition and lineage: in a completed data dictionary I would hope to see where and how the 8a flag is set, for example. Is it set based upon a lookup between FPDS and SAM using the contract signed date? Does it confirm that the participant was in the 8a program when the contract was signed, or when the option was awarded (since that is what is in SAM)? Is it a field that a human entered at some point because they did the checking manually? Or is the field simply there to indicate that the agency will receive 8a credit for the award? (A company may have graduated from the 8a program and no longer be a participant, but be on a long-term vehicle that allows agencies to take that 8a credit for another 5 years past the end of its program participation, like 8a STARS II or another vehicle awarded while the company was an 8a.) We need some specificity in the definitions. This is just one example. Please don't just say "it must be what is in FPDS-NG"; that doesn't answer the question or meet the basic requirements of a data dictionary. We must have lineage, a mapping from information producer to information consumer, in order to have some trust in the data.

@kaitlin

kaitlin commented Mar 7, 2016

@HerschelC I think some of the fields were renamed to be more descriptive. descriptionofcontractrequirement is now Award Description to be more generic across procurement and assistance data.

With respect to your 8a example, I agree that the meaning and definition should be clear and part of the data dictionary, but I think that's different than documenting the process for how agencies populate an element. Different agencies might have different ways of verifying whether a business is 8a. Maybe it's in SAM, maybe their 8a status is too recent to be in SAM so they verify manually with SBA. Or are you saying that this should be multiple values (instead of a boolean) to better understand the context?

@HerschelC

I think that the Boolean value is fine for the data element. What I'm talking about is more from the perspective of a data dictionary and the management of data quality and enabling the information consumer to determine whether the data is fit for their purpose.

I think that it is vital to know the lineage of data from where it is produced, however it is produced, to where it is consumed. This is to give the information consumer some degree of confidence as to the quality of the data that they are about to use to support their fact-based decision. For example, if I know that an element is manually entered in the award system or parsed from a free form text field entered by a grantee, I'll rate my confidence level in the quality of that element lower than if that element was a lookup from SAM.

The 8a example you gave pains me to read. All that extra effort when there is a system of record for that information. In that vein, from a data governance perspective, if I saw this as an answer to how data is created, I'd likely focus process improvement efforts on that process. If it were a critical element, perhaps the executives using that data didn't know that's how it was created; or if it is a low-value element, why are people spending those extra cycles to populate it when an automated process is available? [Speaking generally, there are likely good reasons for things to be so. But the potential for process improvement is the point.]

But, using that example, the data producer, Agency X, would have a business rule something to the effect of "8a flag is populated through an API integration with SAM or a manual verification process by the CO". We need to know how the information is being produced in order to understand the quality of the data - and to optimize the enterprise information management process that should be put around the data to govern it. This shouldn't be a big ask - hopefully this is being done as part of the data inventory and assessment process to populate and manage the data. Let's capture that good knowledge rather than lose it to be recreated the next time.

I'm not proposing that the particular capture method of each element be reported in the system - simply that the methods (business rules for producing the data) be documented. Perhaps, in the case of very critical elements, a flag to indicate lookup vs manual entry would make sense. Take for example the credit card industry where credit card transactions have a flag to indicate the card number was read from the card or keyed in. I don't think we're there yet - but having a complete data dictionary would enable a deeper analysis of data quality.
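
A minimal sketch of what such a lookup-vs-manual flag could look like, by analogy with the card-read vs keyed-in flag. The class names, the two categories, and the sample records are all hypothetical; nothing here reflects an actual field in the schema.

```python
from dataclasses import dataclass
from enum import Enum


class CaptureMethod(Enum):
    """How a value was produced (hypothetical categories)."""
    LOOKUP = "lookup"   # pulled from a system of record such as SAM
    MANUAL = "manual"   # keyed in or verified by a person


@dataclass(frozen=True)
class FlaggedValue:
    element: str
    value: bool
    capture_method: CaptureMethod


def manual_share(records):
    """Return the fraction of records whose value was entered manually."""
    if not records:
        return 0.0
    manual = sum(1 for r in records if r.capture_method is CaptureMethod.MANUAL)
    return manual / len(records)


# Invented sample data for illustration only.
sample = [
    FlaggedValue("firm8aflag", True, CaptureMethod.LOOKUP),
    FlaggedValue("firm8aflag", False, CaptureMethod.LOOKUP),
    FlaggedValue("firm8aflag", True, CaptureMethod.MANUAL),
    FlaggedValue("firm8aflag", True, CaptureMethod.LOOKUP),
]

print(f"Manually entered: {manual_share(sample):.0%}")
```

With a flag like this, an information consumer could weigh manually entered values differently from looked-up ones when judging whether the data is fit for purpose.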

Every step along the information lifecycle is a potential point for introducing a data defect. Every point along the way should be documented - Agency X, this is how information is produced and this is how it is sent to System A; System A, this is how we receive/validate/change/correct the element and/or use it to derive another element which is sent to System B; System B, this is how we change the element - so that the consumers at the end of the information lifecycle know who has touched the data, how, and why. This is also valuable information for the data producer who will ultimately be called on to answer questions about the quality of the data that they produced - avoiding the finger pointing between systems and organizations in the information lifecycle. This is fairly standard practice in data governance.
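
The chain described above could be recorded as an ordered list of steps per element. This is a sketch under stated assumptions: the `LineageStep` structure, the action labels, and the rules shown for System A and System B are hypothetical placeholders, and only the Agency X rule is taken from the business rule quoted earlier in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    system: str   # who touched the element
    action: str   # how: produce, validate, derive, publish, ...
    rule: str     # why / by what business rule


def render_lineage(element, steps):
    """Format the lineage of one element as numbered, human-readable lines."""
    lines = [f"Lineage for {element}:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"  {number}. {step.system} [{step.action}] {step.rule}")
    return "\n".join(lines)


# Illustrative steps; the System A and System B rules are invented.
steps = [
    LineageStep("Agency X award system", "produce",
                "populated through an API integration with SAM "
                "or a manual verification process by the CO"),
    LineageStep("System A", "validate", "received and validated, value unchanged"),
    LineageStep("System B", "publish", "passed through to consumers unchanged"),
]

print(render_lineage("8a flag", steps))
```

Keeping the record at the business level (system, action, rule) avoids exposing system or element level technical detail.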

Also, I should note, that this is intended in business context, not IT. I understand that there are security concerns about posting system/element level data. Not the case here. It's logical - this element comes from the Agency X award system and is created upon award by a lookup to SAM (or the summer intern will type it in).

@HerschelC

Oh - on the renaming, I totally understand. Very tactically, when I'm on the openbeta site trying to look at the fields there, I ctrl-C the element and then flip over to the data dictionary page to search for it. I'm just asking to make my life easier by including the crosswalk on the data dictionary page. (Completely support the better names btw)
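
A minimal sketch of such a crosswalk as a simple lookup table. The first pairing comes from the rename mentioned earlier in this thread; the other two pairings are assumptions based on the similar-looking dictionary terms and are not confirmed. The function name is hypothetical.

```python
# Openbeta field name -> data dictionary term.
CROSSWALK = {
    "descriptionofcontractrequirement": "Award Description",
    "firm8aflag": "Program_8A_Participant",  # assumed pairing, not confirmed
    "hubzoneflag": "HubZoneFirm",            # assumed pairing, not confirmed
}


def dictionary_term(api_field):
    """Return the data dictionary term for a pasted field name, or None."""
    return CROSSWALK.get(api_field.strip().lower())


print(dictionary_term("descriptionofcontractrequirement"))
```

Normalising whitespace and case makes the lookup tolerant of a field name pasted straight from the developer page.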

mmeintelwade pushed a commit that referenced this issue Dec 22, 2017
December 22, 2017 DAIMS v1.2 Release
3 participants