Feedback: DATA Act Schema v0.7 #126
Comments
The gist of our ten observations on the DATA Act Information Model Schema v0.7 is as follows:
For further details or clarifications, please feel free to contact vishram.modak@irisindia.net.
Thank you for your feedback, Vishram. We will investigate and address your comments in the next release of the schema, as appropriate.
Not all of the fields found on https://openbeta.usaspending.gov/developers are listed in the data dictionary; descriptionofcontractrequirement is one example. I'd like to understand the precise definition and, more importantly, the data lineage of this element. In fact, let me expand this comment to suggest that data lineage be provided for all elements within the data dictionary. This is a common best practice in data governance. I have similar questions with respect to firm8aflag and hubzoneflag. Looking at the data dictionary, I do see Program_8A_Participant and HubZoneFirm. The data dictionary should either be expanded to include these additional terms as a crosswalk, or the terms should be standardized so a user can easily search the data dictionary. Going back to providing a solid data definition and lineage: in a completed data dictionary I would hope to see, for example, where and how the 8a flag is set. Is it set by a lookup between FPDS and SAM based on the contract signed date? Does it confirm that the participant was in the 8a program when the contract was signed or the option was awarded (since that is what is in SAM)? Is it a field that a human entered at some point after checking manually? Or does the field simply indicate that the agency will receive 8a credit for the award? (A company may have graduated from the 8a program and no longer be a participant, yet be on a long-term vehicle, such as 8a STARS II or another vehicle awarded while the company was an 8a, that allows agencies to take that 8a credit for another 5 years past the end of its program participation date.) We need some specificity in the definitions. This is just one example. Please don't just say "it must be what is in FPDS-NG"; that doesn't answer the question or meet the basic requirements of a data dictionary. We must have lineage - mapping from information producer to information consumer - in order to have some trust in the data.
@HerschelC I think some of the fields were renamed to be more descriptive. With respect to your 8a example, I agree that the meaning and definition should be clear and part of the data dictionary, but I think that's different from documenting the process for how agencies populate an element. Different agencies might have different ways of verifying whether a business is 8a. Maybe it's in SAM; maybe their 8a status is too recent to be in SAM, so they verify manually with SBA. Or are you saying that this should be multiple values (instead of a boolean) to better understand the context?
I think that the Boolean value is fine for the data element. What I'm talking about is more from the perspective of a data dictionary, the management of data quality, and enabling the information consumer to determine whether the data is fit for their purpose. I think that it is vital to know the lineage of data from where it is produced, however it is produced, to where it is consumed. This gives the information consumer some degree of confidence in the quality of the data that they are about to use to support their fact-based decision. For example, if I know that an element is manually entered in the award system or parsed from a free-form text field entered by a grantee, I'll rate my confidence in the quality of that element lower than if that element were a lookup from SAM. The 8a example you gave pains me to read. All that extra effort when there is a system of record for that information. In that vein, from a data governance perspective, if I saw this as an answer to how data is created, I'd likely focus process improvement efforts on that process. If it were a critical element, perhaps the executives using that data didn't know that's how it was created; or, if it is a low-value element, why are people spending those extra cycles to populate it when an automated process is available? [Speaking generally, there are likely good reasons for things to be so. But the concept of process improvement is the point.] Using that example, the data producer, Agency X, would have a business rule something to the effect of "8a flag is populated through an API integration with SAM or a manual verification process by the CO". We need to know how the information is being produced in order to understand the quality of the data - and to optimize the enterprise information management process that should be put around the data to govern it.
This shouldn't be a big ask - hopefully this is being done as part of the data inventory and assessment process to populate and manage the data. Let's capture that good knowledge rather than lose it and have to recreate it the next time. I'm not proposing that the particular capture method of each element be reported in the system - simply that the methods (business rules for producing the data) be documented. Perhaps, in the case of very critical elements, a flag to indicate lookup vs. manual entry would make sense. Take, for example, the credit card industry, where credit card transactions have a flag to indicate whether the card number was read from the card or keyed in. I don't think we're there yet - but having a complete data dictionary would enable a deeper analysis of data quality. Every step along the information lifecycle is a potential point for the introduction of a data defect. Every point along the way should be documented - Agency X, this is how information is produced and this is how it is sent to System A; System A, this is how we receive/validate/change/correct the element and/or use it to derive another element which is sent to System B; System B, this is how we change the element - so that the consumers at the end of the information lifecycle know who has touched the data, how, and why. This is also valuable information for the data producer, who will ultimately be called on to answer questions about the quality of the data that they produced - avoiding the finger pointing between systems and organizations in the information lifecycle. This is fairly standard practice in data governance. Also, I should note that this is intended in a business context, not IT. I understand that there are security concerns about posting system/element-level data. Not the case here. It's logical - this element comes from the Agency X award system and is created upon award by a lookup to SAM (or the summer intern will type it in).
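To make the idea of documented lineage concrete, here is a minimal sketch of what a dictionary entry with a step-by-step lineage might look like. It is only an illustration: the class names `LineageStep` and `DictionaryEntry`, the "lookup"/"manual" method values, and every business rule shown are hypothetical, and none of them come from the DAIMS schema or describe how any agency actually populates the 8a flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageStep:
    system: str   # organization or system that touched the element
    action: str   # e.g. "produced", "validated", "published"
    method: str   # "lookup" or "manual" (assumed vocabulary)
    rule: str     # business rule, stated in plain business language


@dataclass
class DictionaryEntry:
    name: str
    lineage: List[LineageStep] = field(default_factory=list)

    def has_manual_step(self) -> bool:
        """True if any step along the lifecycle relies on manual entry."""
        return any(step.method == "manual" for step in self.lineage)

    def trail(self) -> str:
        """One-line summary of who touched the element, in order."""
        return " -> ".join(
            f"{s.system} ({s.action}, {s.method})" for s in self.lineage
        )


# Hypothetical lineage for the 8a flag; none of these rules are confirmed.
entry = DictionaryEntry(
    name="Program_8A_Participant",
    lineage=[
        LineageStep("Agency X", "produced", "manual",
                    "CO verifies 8a status manually when no SAM lookup is available"),
        LineageStep("System A", "validated", "lookup",
                    "Value is checked against SAM on receipt"),
        LineageStep("System B", "published", "lookup",
                    "Value is passed through unchanged"),
    ],
)

print(entry.trail())
print(entry.has_manual_step())
```

A consumer could use something like `has_manual_step()` to lower their confidence in an element, in the same spirit as the keyed-in vs. card-read flag mentioned above.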
Oh - on the renaming, I totally understand. Very tactically, when I'm on the openbeta site trying to look at the fields there, I ctrl-C the element and then flip over to the data dictionary page to search for it. I'm just asking to make my life easier by including the crosswalk on the data dictionary page. (Completely support the better names, btw.)
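As a rough sketch of the crosswalk being asked for, the lookup below maps a field name copied from the openbeta site to a data dictionary term. The two pairings are assumptions inferred from the names mentioned earlier in this thread, not confirmed mappings, and the `CROSSWALK` table and `dictionary_term` function are hypothetical names.

```python
from typing import Optional

# Assumed pairings between openbeta field names and data dictionary terms.
CROSSWALK = {
    "firm8aflag": "Program_8A_Participant",
    "hubzoneflag": "HubZoneFirm",
}


def dictionary_term(site_field: str) -> Optional[str]:
    """Return the data dictionary term for a site field, or None if unmapped."""
    return CROSSWALK.get(site_field.strip().lower())


print(dictionary_term("firm8aflag"))
print(dictionary_term("descriptionofcontractrequirement"))
```

A field with no entry, such as descriptionofcontractrequirement here, returns `None`, which is exactly the kind of gap a published crosswalk would make visible.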
December 22, 2017 DAIMS v1.2 Release
This is the place to leave feedback on v0.7 of the DATA Act Schema. You can read more about the schema here: http://fedspendingtransparency.github.io/data-exchange-standard/
Federal Spending Collaboration home page: http://fedspendingtransparency.github.io/