$ git clone https://github.com/ufbmi/onefl-deduper.git
$ cd onefl-deduper
Execute the following scripts on the SQL Server instance:
schema/000/upgrade.sql
schema/000/data.sql
INSERT INTO dbo.partner
(partner_code, partner_description, partner_added_at)
VALUES
('SOURCE_1', 'source 1', GETDATE()),
('SOURCE_2', 'source 2', GETDATE())
$ mkvirtualenv deduper -p `which python3`
$ pip install -r requirements-to-freeze.txt
$ cp some/long/path/file.csv data/partner_hashes.csv
$ cp config/example/settings_linker.py.example config/settings_linker.py
$ cp config/example/logs.cfg.example config/logs.cfg
Before running the linker for each source, update the following parameters
in the config/settings_linker.py file as needed:
IN_DELIMITER - the field delimiter used in the input file (comma or tab)
IN_FILE - the name of your input file (containing the PHI elements)
OUT_FILE - the name of the output file that will be sent to the University of Florida
DB_HOST - the name of the SQL server
DB_NAME - the name of the SQL database
DB_USER - the database service account name
DB_PASS - the database service account password
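As a rough illustration, an edited config/settings_linker.py might look like the sketch below. The parameter names come from the list above; every value is a placeholder assumption rather than a default shipped with the project, and the real file may contain additional settings.

```python
# config/settings_linker.py (illustrative sketch; all values are assumptions)

# Field delimiter of the input file: "\t" for tab, "," for comma
IN_DELIMITER = "\t"

# Input file containing the PHI elements (hypothetical file name)
IN_FILE = "input_file.csv"

# Output file that will be sent to the University of Florida (hypothetical name)
OUT_FILE = "linkage_output.csv"

# SQL Server connection details (placeholders; replace with your own)
DB_HOST = "sqlserver.example.org"
DB_NAME = "deduper_db"
DB_USER = "deduper_service"
DB_PASS = "change-me"
```

Because this file holds a service account password, it is worth keeping it out of version control.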
Replace PARTNERNAME in the following command with the actual partner code:
$ PYTHONPATH=. python run/linker.py -i data -o data -p=PARTNERNAME --ask
When the run completes, you should see new rows inserted in the database
and an output file as configured with the OUT_FILE option.
This output file should contain four columns:
PATID UUID hash_1 hash_2
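A quick sanity check of the column layout can be scripted. The sketch below is not part of the project; it assumes the output file starts with a header row carrying exactly the four names listed above and that fields are tab-separated. Both are assumptions, so adjust the delimiter argument to match your IN_DELIMITER/OUT_FILE setup. The helper names are hypothetical.

```python
import csv
import io

# Column names taken from the description above
EXPECTED_COLUMNS = ["PATID", "UUID", "hash_1", "hash_2"]


def read_header(handle, delimiter="\t"):
    """Return the first row of a delimited file as a list of column names."""
    reader = csv.reader(handle, delimiter=delimiter)
    return next(reader, [])


def has_expected_columns(handle, delimiter="\t"):
    """Return True if the header row matches the four expected columns."""
    return read_header(handle, delimiter) == EXPECTED_COLUMNS


# In-memory stand-in for a real output file (the data row is made up)
sample = io.StringIO("PATID\tUUID\thash_1\thash_2\n1\tabc\th1\th2\n")
print(has_expected_columns(sample))
```

For a real run, open the file configured as OUT_FILE with `open(path, newline="")` and pass the handle to `has_expected_columns`.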
To check how many rows have been inserted in the database, you can run the following query:
SELECT COUNT(*) FROM LINKAGE WHERE partner_code = 'PARTNERNAME'
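The same count can be obtained from a script with a parameterized query, which avoids pasting the partner code into the SQL text. The sketch below is an illustration only: `count_linkage_rows` is a hypothetical helper, and the demo uses an in-memory SQLite table as a stand-in for the real SQL Server database. It assumes a DB-API connection whose driver uses `?` placeholders (sqlite3 does; a SQL Server driver would have to be installed and configured separately).

```python
import sqlite3


def count_linkage_rows(connection, partner_code):
    """Count rows in LINKAGE for one partner using a parameterized query."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT COUNT(*) FROM LINKAGE WHERE partner_code = ?",
        (partner_code,),
    )
    return cursor.fetchone()[0]


# Stand-in demo: a minimal table with only the column the query needs
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE LINKAGE (partner_code TEXT)")
demo.executemany(
    "INSERT INTO LINKAGE (partner_code) VALUES (?)",
    [("SOURCE_1",), ("SOURCE_1",), ("SOURCE_2",)],
)
print(count_linkage_rows(demo, "SOURCE_1"))  # prints 2
```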