Skip to content

Commit

Permalink
Merge pull request #914 from PolicyEngine/fix/spi-fixes
Browse files Browse the repository at this point in the history
SPI validation fixes
  • Loading branch information
nikhilwoodruff authored Aug 8, 2024
2 parents 4281007 + c9c723d commit 8b7cbcd
Show file tree
Hide file tree
Showing 5 changed files with 878 additions and 1,805 deletions.
6 changes: 6 additions & 0 deletions changelog_entry.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
- bump: minor
changes:
fixed:
- Correct Scottish income tax rates for 2020/21
added:
- Docs page about SPI 2020/21 validation
1 change: 1 addition & 0 deletions docs/book/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ parts:
- file: programs/gov/dcms/bbc/tv-licence
- file: programs/gov/revenue_scotland/land-and-buildings-transaction-tax
- file: programs/gov/wra/land-transaction-tax
- file: model/spi-validation
- caption: Using the model
chapters:
- file: usage/getting-started
80 changes: 80 additions & 0 deletions docs/book/model/spi-validation.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Validation against the Survey of Personal Incomes\n",
"\n",
"To ensure accuracy and precision, PolicyEngine's free and open-source simulation and microsimulation model is periodically validated against existing external datasets. This process involves taking external datasets meant to represent a particular slice of British society and loading their inputs, such as total individual employment income, into our data model. We then compare our calculated outputs with theirs in an effort to find discrepancies between the two. In this post, we'll describe our efforts to validate our model against the newly released Survey of Personal Incomes (SPI) 2020/21 dataset.\n",
"\n",
"## The UK Survey of Personal Incomes\n",
"\n",
"The Survey of Personal Incomes (SPI) is a comprehensive dataset produced by HM Revenue and Customs (HMRC). It provides detailed information on individuals' income, tax, and other financial data based on a sample of HMRC records. The SPI is among the most comprehensive UK individual-level financial datasets, and is used by HMRC, HM Treasury, and individual Members of Parliament, among others.[^1]\n",
"\n",
"## Validation results\n",
"\n",
"We chose to validate PolicyEngine against the SPI's total individual income, using the SPI's input variables to compare PolicyEngine UK's calculated total income tax against the SPI's reported amount. \n",
"\n",
"To do so, we first remove the SPI's \"composite records,\" which are used to represent smaller population groups as a collective. This group of records is only meant to be employed within society-wide modelling, and its outputs cannot be correctly computed using its inputs. This group of records contains 1,770 observations, representing around 0.2% of total records.\n",
"\n",
"We then treat relevant variables from the SPI 2020/21 dataset[^2] as input variables into the PolicyEngine UK model. Using this data, we run household-level simulations for each input and categorize the outputs logarithmically based on how close PolicyEngine's calculated 2020/21 total income tax is to that of the SPI. We treat a result within £10 as a match due to differences in rounding. Our final match statistics are displayed in the table below."
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"source": [
"| Total error (£) | Percentage of all records |\n",
"|-----------------|---------------------------|\n",
"| <=10 | 92.7% |\n",
"| <100 | 93.5% |\n",
"| <1,000 | 98.3% |\n",
"| >=1,000 | 1.7% |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For 92.7% of individuals included within the SPI, PolicyEngine UK's calculated total income tax is within £10 of the SPI's calculated amount. We view this as a robust demonstration of PolicyEngine's accuracy and precision.\n",
"\n",
"## Analysing model discrepancies\n",
"\n",
"For the remaining records where our calculations do not closely match the SPI, we believe the differences are attributable to two main factors:\n",
"\n",
"1. Differences in input information: The SPI sometimes uses additional input data that is not available to PolicyEngine, leading to differences in calculations.\n",
"2. Structural differences: At times, the SPI model's structure differs from that of PolicyEngine in a way that complicates further model validation.\n",
"\n",
"Four relevant error clusters appeared among the records where the SPI and PolicyEngine calculate differing outputs:\n",
"\n",
"1. Cases where an individual appears to transfer a portion of their Personal Allowance to their spouse as part of the Marriage Allowance. This is because the SPI removes the transferred amount from the individual's Personal Allowance, but does not indicate that the subsequent lower amount is driven by the Marriage Allowance. This may represent as much as 37,998 records (4.8% of total).\n",
"1. Cases where an individual receives less than £100,000 in annual income and is non-resident for taxation purposes. In this situation, the individual receives £0 Personal Allowance, even though our model would project a full £12,500 allowance. The SPI uses a variable that is not publicly available to calculate this status. This represents 2,054 records (0.3% of total).\n",
"1. Cases where both the SPI and PolicyEngine calculate the same amount of taxable savings income for an individual receiving any form of savings income classified by the SPI as \"Other Investment Income\" (e.g., interest on securities), but the SPI finds that they owe around £1,000 more in income tax than we do. This represents 1,276 records (0.2% of total).\n",
"1. Cases where an individual near the Personal Allowance taper value who claims Gift Aid receives a full Personal Allowance of £12,500 within the SPI model. The SPI model's Gift Aid variable signifies the amount an individual deducts due to Gift Aid donations, while PolicyEngine's represents an individual's total Gift Aid-eligible donations. We then calculate the indivdual's deduction amount from this value. This error represents 738 records (0.1% of total).\n",
"\n",
"## Conclusion\n",
"PolicyEngine's UK model shows a high level of accordance with the SPI's 2020/21 data, with almost 93% of records with the same inputs returning the same total income tax calculation. While some discrepancies exist, they are limited in scope and often attributable to either testability limitations within the SPI's structure or to inputs that the SPI has not made publicly available. This validation exercise builds upon our previous validations, such as our [Autumn 2023 model calibration](https://policyengine.org/uk/research/uk-calibration-2023-q4). We welcome questions and comments on our findings or methodology - please feel free to [get in touch](mailto:contact@policyengine.org)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[^1]: https://data.europa.eu/data/datasets/hmrc-survey-of-personal-incomes-spi?locale=en\n",
"[^2]: https://www.gov.uk/government/statistics/personal-incomes-statistics-for-the-tax-year-2020-to-2021"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ brackets:
values:
2018-06-01: 12_150
2020-03-04:
value: 12_444
value: 12_658
reference:
- title: "Capture in full: Tax rates 2020/21; Spring Budget 2020"
href: https://taxscape.deloitte.com/taxtables/deloitte-uk-tax-rates-2020-21--updated.pdf
Expand All @@ -80,6 +80,7 @@ brackets:
unit: /1
values:
2018-06-01: 0.4
2020-03-04: 0.41
threshold:
description: The threshold at which an individual pays the Scottish higher rate
metadata:
Expand Down
Loading

0 comments on commit 8b7cbcd

Please sign in to comment.