Hashing service
David Megginson edited this page Feb 13, 2020
The HXL Proxy's hashing service at `/api/hash` generates an MD5 hash of either an entire HXL dataset or only its header and hashtag rows. The service takes the following parameters:
| Parameter | Required? | Description |
|---|---|---|
| `url` | yes | URL of the HXL dataset to hash |
| `headers_only` | no | If provided (value `on`), hash only the last header row and the hashtag row |
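As a minimal sketch of calling the service from Python, the snippet below builds the request URL from the two parameters in the table and fetches the JSON report using only the standard library. The proxy host `https://proxy.hxlstandard.org` is an assumption (point `proxy` at whichever HXL Proxy deployment you use), and `build_hash_url` and `fetch_hash_report` are hypothetical helper names, not part of the HXL Proxy itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public HXL Proxy host; substitute your own deployment.
DEFAULT_PROXY = "https://proxy.hxlstandard.org"


def build_hash_url(dataset_url: str, headers_only: bool = False,
                   proxy: str = DEFAULT_PROXY) -> str:
    """Build an /api/hash request URL (hypothetical helper)."""
    params = {"url": dataset_url}
    if headers_only:
        # Per the parameter table, the flag is sent with the value "on".
        params["headers_only"] = "on"
    # urlencode percent-escapes the dataset URL so it survives as one value.
    return proxy.rstrip("/") + "/api/hash?" + urllib.parse.urlencode(params)


def fetch_hash_report(dataset_url: str, headers_only: bool = False,
                      proxy: str = DEFAULT_PROXY) -> dict:
    """Request the report and decode the JSON body (needs network access)."""
    request_url = build_hash_url(dataset_url, headers_only, proxy)
    with urllib.request.urlopen(request_url, timeout=60) as response:
        return json.load(response)
```

For example, `build_hash_url("https://example.org/data.csv", headers_only=True)` yields `https://proxy.hxlstandard.org/api/hash?url=https%3A%2F%2Fexample.org%2Fdata.csv&headers_only=on`.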
The output is a JSON report with the 32-character hex-encoded MD5 digest, along with supporting metadata:
```json
{
  "hash": "6da2a59520de5c48549a7572b289c528",
  "url": "https://docs.google.com/spreadsheets/d/1ytPD-f4a8CbNKTfMS3EqZOpBo9LWCk_NDKxJCgmpXA8/edit#gid=1101521524",
  "date": "2018-11-20T16:22:22.836026",
  "headers_only": true,
  "headers": [
    "Registro",
    "Sector/Cluster",
    "Organizaci\u00f3n",
    "Hombres",
    "Mujeres",
    "Pa\u00eds",
    "ISO",
    "Dato"
  ],
  "hashtags": [
    "#meta+id",
    "#sector+name+es",
    "#org+name+es",
    "#targeted+m",
    "#targeted+f",
    "#country+name+es",
    "#country+code",
    "#date"
  ]
}
```
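A client will usually parse this report and pull out the digest. The sketch below, using hypothetical helper names, checks that `hash` looks like a 32-character hex MD5 digest (assuming lowercase hex, as in the sample) and pairs each text header with the hashtag in the same column position.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Assumption: digests are lowercase hex, as in the sample report.
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def parse_hash_report(text: str) -> Dict[str, Any]:
    """Parse an /api/hash JSON response and sanity-check the digest."""
    # json.loads decodes escapes such as \u00f3 into the original characters.
    report = json.loads(text)
    digest = report.get("hash")
    if not isinstance(digest, str) or not _MD5_HEX.fullmatch(digest):
        raise ValueError(f"not a 32-character hex MD5 digest: {digest!r}")
    return report


def column_pairs(report: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each text header with the hashtag at the same column position."""
    # zip() stops at the shorter list if the two ever differ in length.
    return list(zip(report.get("headers", []), report.get("hashtags", [])))
```

Applied to the sample report, `column_pairs` would pair `"Hombres"` with `"#targeted+m"` and `"Organización"` with `"#org+name+es"`.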
With `headers_only` specified, the MD5 hash tells you whether two datasets are essentially of the same type (e.g. HXL-hashtagged API output of the same humanitarian dataset for different countries or time periods).
With `headers_only` unspecified, the MD5 hash tells you whether a dataset has changed in any meaningful way since you last hashed it.
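One way to act on these two uses is sketched below, assuming you keep earlier reports in memory: `same_dataset_type` compares two headers-only reports, and `ChangeTracker` remembers the last full-dataset hash for each URL. Both names are hypothetical, and a real client would probably persist the stored hashes rather than hold them in a dictionary.

```python
from typing import Dict


def same_dataset_type(a: dict, b: dict) -> bool:
    """True if two headers-only reports have the same header/hashtag hash."""
    if not (a.get("headers_only") and b.get("headers_only")):
        raise ValueError("both reports must be requested with headers_only")
    return a["hash"] == b["hash"]


class ChangeTracker:
    """Remembers the last full-dataset hash seen for each URL (in memory)."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def has_changed(self, report: dict) -> bool:
        if report.get("headers_only"):
            raise ValueError("use a full-dataset report (headers_only unset)")
        url, digest = report["url"], report["hash"]
        previous = self._last.get(url)
        self._last[url] = digest
        # A URL seen for the first time has no previous hash, so it counts as changed.
        return previous != digest
```

Keeping the two checks separate reflects the fact that a headers-only digest and a full-dataset digest answer different questions and should never be compared with each other.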
- Order of columns and HXL attributes is significant for hashing (the same columns in a different order will produce a different MD5 digest).
- Differences in whitespace are not significant.
- The hashes are generated over a UTF-8 encoding of the data.
- All text headers are hashed first, then all hashtags (breadth-first).
- Null values are treated as empty strings.
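To illustrate the rules above, here is a local MD5 computation that follows them: UTF-8 encoding, whitespace normalization, nulls as empty strings, all headers before all hashtags, and column and attribute order preserved. It is not the HXL Proxy's implementation. The exact whitespace handling, the delimiters between values (`0x1F` and `0x1E` here), and the way data rows are appended for a full-dataset hash are all assumptions, so these digests will not generally match what `/api/hash` returns. `sketch_hash` is a hypothetical name.

```python
import hashlib
from typing import Iterable, Optional, Sequence

Cell = Optional[str]


def _normalize(value: Cell) -> str:
    # Null values are treated as empty strings.
    if value is None:
        return ""
    # Assumption: "whitespace is not significant" is modelled as trimming
    # and collapsing internal runs of whitespace to a single space.
    return " ".join(value.split())


def _feed(md5, values: Iterable[Cell], end: bytes) -> None:
    for value in values:
        # Hash over UTF-8; the 0x1F byte is a hypothetical delimiter so that
        # ["ab", "c"] and ["a", "bc"] produce different digests.
        md5.update(_normalize(value).encode("utf-8") + b"\x1f")
    md5.update(end)


def sketch_hash(headers: Sequence[Cell], hashtags: Sequence[Cell],
                rows: Iterable[Sequence[Cell]] = ()) -> str:
    """Illustrative digest only; NOT guaranteed to match /api/hash output."""
    md5 = hashlib.md5()
    # Breadth-first: every text header, then every hashtag, in column order.
    # Hashtag attributes are not sorted, so their order stays significant.
    _feed(md5, headers, b"\x1e")
    _feed(md5, hashtags, b"\x1e")
    # Assumption: a full-dataset hash continues with the data rows in order.
    for row in rows:
        _feed(md5, row, b"\x1e")
    return md5.hexdigest()
```

With this sketch, reordering columns, or reordering attributes within a hashtag such as `#sector+name+es` versus `#sector+es+name`, changes the digest, while extra leading or trailing spaces in a header do not.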
Learn more about the HXL standard at http://hxlstandard.org