Reporting for multiple Australian ETF Annual Tax Statements.
- Read the data from the Australian ETF Annual Tax Statements and reformat it as a split transaction to load into a cashbook (eg GnuCash).
- If the pdf format is not directly supported the intermediate csv file can be reworked using the example file as a template before loading the csv file.
Most Windows users will download etf2cb.exe
and the *.csv
files
then skip to the next section.
Python software can optionally be installed in a virtual environment to eliminate system conflicts as described here. eg in the desired folder for Windows:
python -m venv ./venv/etf2cb
.\venv\etf2cb\scripts\activate
cd .\venv\etf2cb
Use deactivate
to return to the normal system.
pip install git+https://github.com/flywire/ETF-Annual-Statement.git@main
This package uses tabula-py under the hood, which itself is a wrapper for tabula-java.
Windows & Linux users will need a copy of Java installed. You can download Java here.
Start a command prompt.
- run
etf2cb
if you downloaded the Windows executable - Alternatively, run under python, for example:
python etf2cb.py
usage: etf2cb [-h] filename [area]
Extract ETF Annual Statement transaction component splits
positional arguments:
filename Provide filename to extract
area Table area reference required for pdf file
options:
-h, --help show this help message and exit
Filenames with .csv extension are processed directly without pdf extract
Many things can fail with this automated process so users should validate the output manually. Firstly, check the total of all deposit amounts is zero.
Wildcards for a folder:
for %f in (tax\v*.pdf) do etf2cb %f vanguard
Concatenate files:
echo Entity,Date,Description,Account,Deposit > tax\all_splits.csv & type *split.csv >> tax\all_splits.csv
Amounts will be extracted from the pdf using the area reference unless run on
a .csv
file.
tabula-area.csv
must be specified to extract data from pdftax-acc.csv
must be configured for each label
The tabula-area.csv
file in the distribution is user-configurable,
For example,
change the second line to read the address entity instead of the HIN.
Tax account configuration is required in a user-configurable tax-acc.csv
file containing the following fields:
Label
- first part uses tax codes, second part uses strings in pdf labelsDescription
- details are optionalType
-CR
orDB
accountAccount
- users cashbook chart of account code
Look at the sample in the Tax
folder.
etf2cb tax\VAS-annual-tax-statement-2018 vanguard
Entity,Date,Description,Account,Deposit
X0123456789,30/06/2018,VAS,Income:Distribution:13U,-3606.42
X0123456789,30/06/2018,VAS,Income:Distribution:13C,-22870.71
X0123456789,30/06/2018,VAS,Income:Distribution:13Q,7069.35
X0123456789,30/06/2018,VAS,Income:Distribution:18H:18A,-1568.43
X0123456789,30/06/2018,VAS,Income:Distribution:18H:GCTGrossUp,-1567.05
X0123456789,30/06/2018,VAS,Income:Distribution:20E/20M,-463.54
X0123456789,30/06/2018,VAS,Income:Distribution:20O,15.64
X0123456789,30/06/2018,VAS,Asset:Shares:CostBase,-62.21
X0123456789,30/06/2018,VAS,Income:Distribution,23053.38
X0123456789,30/06/2018,VAS,Income:Distribution:Rounding,-0.01
There are many ways to accumulate the splits for all ETFs by entity and year.
A simple way is to concatenate all csv files then open it in a spreadsheet and do a pivot table. For example, Pivot Table Layout: Filters - [ETF] Description, Row Fields - Account, Data Fields - Sum Deposit.
In GnuCash use:
- File, Import, Import Transactions from csv,
- Next, Select ETF-Annual-Statement_split.csv, Next,
- Leading lines to skip: 1, Date format: d-m-y, Select Multi-split
- Select Date, Description, Account, Deposit, Next
- Map Account ID to Account Name, Next, Next, Close
This project is licensed under the terms of the GNU GPLv3 license.