-
Notifications
You must be signed in to change notification settings - Fork 4
Use cases
David Megginson edited this page May 10, 2023
·
15 revisions
This page introduces sample use cases for different filters and other features of the HXL Proxy. Coders may choose to use JSON recipes instead of the interactive forms.
Goal | Suggestions |
---|---|
Convert individual records to aggregated totals. | Use the Count rows filter to aggregate by #adm1 (etc.). |
Remove personal phone numbers from a dataset. | Use the Cut columns filter to remove any columns tagged #contact+phone |
Redact small numbers of affected people that could identify individuals. | Use the Replace data filter with a regular expression to replace small numbers in columns tagged #affected (optionally after normalising numbers using the Clean data filter first). |
Remove data from sensitive sectors (e.g. gender-based violence) before sharing publicly. | Use the Select rows filter to remove rows based on the value of columns tagged #sector. |
Remove records from an organisation that would prefer not to have its data shared. | Use the Select rows filter to remove rows based on the value of columns tagged #org. |
Goal | Suggestions |
---|---|
Extract all activities for a specific organisation. | Use the Select rows filter to select rows based on values in columns tagged #org. |
Find the total number of organisations active in each district. | Use the Deduplicate rows filter to select all unique combinations of #adm1 and #org, then use the Count rows filter to count the number of rows for each #adm1. |
Sum the total number of people in need in each country. | Use the Count rows filter with #country as the count tag, and #inneed as the aggregation tag. |
Show only entries from 2014 or later. | Use the Select rows filter with #date>=2014 as the [[row query |
Group data by cluster. | Use the Sort rows filter with #sector+cluster as the sort key. |
Remove columns that a recipient does not want to see. | Use the Cut columns filter to remove the columns by hashtag, e.g. #meta+id,#description+internal |
Goal | Suggestions |
---|---|
Combine input data from multiple reporting organisations. | Create a master template, then use the Append datasets filter to pull in each organisation's dataset from a location on the web (such as HDX or a Dropbox folder). |
Give partner organisations feedback on their datasets. | Use the Validation page together with a custom [[HXL schema |
Goal | Suggestions |
---|---|
Remove duplicate rows | Use the Deduplicate rows filter to keep only the first copy of each row. If you have a combination of columns that form a unique identifier (e.g. #meta+id or #org+code), use those as tags to look at. |
Normalise values for sorting and charting | Use the Clean data filter to standardise the representation of dates and numbers, and to remove extra whitespace. |
Find outlier values (potential typos) | Use the Count rows filter to display the number of times each value appears in a column, then optionally use the Sort rows filter to bring the rarest occurrences to the top. |
Fix an error or variant that keeps appearing in input data | Start a replacement map (perhaps as a Google Sheet or spreadsheet in a shared Dropbox folder), then use the Replace data (mapping table) filter to fix values automatically. |
Fix spelling variations in district names | Use the Merge columns filter to pull standardised spellings from another dataset, with #adm1+code as the merge key, and #adm1+name as the column to copy. Select the "Use existing columns" and "Overwrite existing values" options to replace the current spellings in your source dataset. |
Goal | Suggestions |
---|---|
Compare data from different countries, where the country is not specified | For each input dataset, use the Add column filter to add #country+name and #country+code as fixed values to every row before [[merging |
Pull population data for districts from a different dataset | Ensure that the sources have a shared key, such as a P-code in the #adm1+code column, then use the Merge columns filter to pull data from other datasets into your source dataset. |
Check that the organisation identifiers in a dataset are correct | Use a HXL schema and the Validation page to check the #org+code column against values in a separate, master dataset listing organisation names and codes. |
Learn more about the HXL standard at http://hxlstandard.org