Skip to content

Use cases

David Megginson edited this page May 10, 2023 · 15 revisions

This page introduces sample use cases for different filters and other features of the HXL Proxy. Coders may choose to use JSON recipes instead of the interactive forms.

Privacy and protection

Goal Suggestions
Convert individual records to aggregated totals. Use the Count rows filter to aggregate by #adm1 (etc.).
Remove personal phone numbers from a dataset. Use the Cut columns filter to remove any columns tagged #contact+phone
Redact small numbers of affected people that could identify individuals. Use the Replace data filter with a regular expression to replace small numbers in columns tagged #affected (optionally after normalising numbers using the Clean data filter first).
Remove data from sensitive sectors (e.g. gender-based violence) before sharing publicly. Use the Select rows filter to remove rows based on the value of columns tagged #sector.
Remove records from an organisation that would prefer not to have its data shared. Use the Select rows filter to remove rows based on the value of columns tagged #org.

Reporting and visualisation

Goal Suggestions
Extract all activities for a specific organisation. Use the Select rows filter to select rows based on values in columns tagged #org.
Find the total number of organisations active in each district. Use the Deduplicate rows filter to select all unique combinations of #adm1 and #org, then use the Count rows filter to count the number of rows for each #adm1.
Sum the total number of people in need in each country. Use the Count rows filter with #country as the count tag, and #inneed as the aggregation tag.
Show only entries from 2014 or later. Use the Select rows filter with #date>=2014 as the [[row query
Group data by cluster. Use the Sort rows filter with #sector+cluster as the sort key.
Remove columns that a recipient does not want to see. Use the Cut columns filter to remove the columns by hashtag, e.g. #meta+id,#description+internal

Collaboration

Goal Suggestions
Combine input data from multiple reporting organisations. Create a master template, then use the Append datasets filter to pull in each organisation's dataset from a location on the web (such as HDX or a Dropbox folder).
Give partner organisations feedback on their datasets. Use the Validation page together with a custom [[HXL schema

Data cleaning

Goal Suggestions
Remove duplicate rows Use the Deduplicate rows filter to keep only the first copy of each row. If you have a combination of columns that form a unique identifier (e.g. #meta+id or #org+code), use those as tags to look at.
Normalise values for sorting and charting Use the Clean data filter to standardise the representation of dates and numbers, and to remove extra whitespace.
Find outlier values (potential typos) Use the Count rows filter to display the number of times each value appears in a column, then optionally use the Sort rows filter to bring the rarest occurrences to the top.
Fix an error or variant that keeps appearing in input data Start a replacement map (perhaps as a Google Sheet or spreadsheet in a shared Dropbox folder), then use the Replace data (mapping table) filter to fix values automatically.
Fix spelling variations in district names Use the Merge columns filter to pull standardised spellings from another dataset, with #adm1+code as the merge key, and #adm1+name as the column to copy. Select the "Use existing columns" and "Overwrite existing values" options to replace the current spellings in your source dataset.

Interoperability

Goal Suggestions
Compare data from different countries, where the country is not specified For each input dataset, use the Add column filter to add #country+name and #country+code as fixed values to every row before [[merging
Pull population data for districts from a different dataset Ensure that the sources have a shared key, such as a P-code in the #adm1+code column, then use the Merge columns filter to pull data from other datasets into your source dataset.
Check that the organisation identifiers in a dataset are correct Use a HXL schema and the Validation page to check the #org+code column against values in a separate, master dataset listing organisation names and codes.
Clone this wiki locally