HOWTO: Batch Job Testing
This is meant to help a developer or QA expert test the new code. Batch job import is not a standalone feature; rather, it is a specific dependency of the Data Capture Module (DCM). Dataverse users are not currently expected to access this module directly; instead, the DCM rsync implementation will rely on it.
- Build a war file from the 3353-batch-job-import branch and deploy it to Glassfish.
-
Run the integration tests:
mvn test -Dtest=FileRecordJobIT -DfailIfNoTests=false
-
Note: This test can only be run on the same server that dataverse.test.baseurl points to, since test files are generated directly in the Dataverse data directory.
-
Note: If tests fail sporadically because the job status polling times out, you may see output like this:
JOB JSON {"id":3,"name":"FileSystemImportJob","status":"COMPLETED","exitStatus":"COMPLETED","createTime":1478797187875,"endTime":1478797188104,"lastUpdateTime":1478797188104,"startTime":1478797187877,"properties":{"userId":"b186b948","datasetId":"doi:10.5072/FK2/R3QAAE","mode":"MERGE"},"steps":[{"id":5,"name":"import-files","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797188006,"startTime":1478797187909,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":2,"rollback_count":0,"filter_count":0,"read_count":2},"persistentUserData":null},{"id":6,"name":"import-checksums","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797188085,"startTime":1478797188015,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":2,"rollback_count":0,"filter_count":0,"read_count":2},"persistentUserData":null}]}
API: /api/datasets/:persistentId?persistentId=doi:10.5072/FK2/R3QAAE
JOB API: /api/import/datasets/files/10.5072/FK2/R3QAAE
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB STATUS: STARTED
JOB JSON: {"id":4,"name":"FileSystemImportJob","status":"STARTED","exitStatus":null,"createTime":1478797197987,"endTime":null,"lastUpdateTime":1478797197990,"startTime":1478797197990,"properties":{"userId":"b186b948","datasetId":"doi:10.5072/FK2/R3QAAE","mode":"MERGE"},"steps":[{"id":7,"name":"import-files","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797204859,"startTime":1478797198804,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":1,"rollback_count":0,"filter_count":2,"read_count":3},"persistentUserData":null},{"id":8,"name":"import-checksums","status":"STARTED","exitStatus":null,"endTime":null,"startTime":1478797206430,"metrics":{"write_skip_count":0,"commit_count":0,"process_skip_count":0,"read_skip_count":0,"write_count":0,"rollback_count":0,"filter_count":0,"read_count":0},"persistentUserData":null}]}
Try increasing the maximum polling retries and/or polling wait in FileRecordJobIT.properties:
polling.retries=10
polling.wait=1000
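The polling behaviour these two properties control can be sketched as follows. This is a minimal illustration, not the FileRecordJobIT code: it assumes polling.retries is the maximum number of status checks and polling.wait is the pause between checks in milliseconds (the ten JOB STATUS: STARTED lines in the log above are consistent with retries=10), and that job statuses use the JSR-352 BatchStatus names such as STARTED and COMPLETED. The fetch_status callable is a hypothetical stand-in for a request to the job status API.

```python
import time
from typing import Callable, Optional

# Assumption: terminal states follow the JSR-352 BatchStatus names.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED", "ABANDONED"}


def wait_for_job(
    fetch_status: Callable[[], str], retries: int = 10, wait_ms: int = 1000
) -> Optional[str]:
    """Poll until the job reaches a terminal status or the retries run out.

    Returns the last status seen. A job that is still STARTED when the loop
    ends corresponds to the polling timeout described above.
    """
    status = None
    for attempt in range(retries):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            break
        # Sleep between checks, but not after the final one.
        if attempt < retries - 1:
            time.sleep(wait_ms / 1000.0)
    return status
```

Under these assumptions the loop sleeps at most (retries - 1) × wait milliseconds, about 9 seconds with the values above, so a slow import-checksums step like the one in the second JOB JSON can outlast it; raising either property widens that window.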
-
Create a new dataverse and dataset.
-
Using the new dataset's identifier (e.g., MVKMO8), manually create a directory under /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2. Dataverse does not create this directory until files are added.
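A minimal sketch of this step in Python, assuming the data directory layout implied by the example paths in this guide (files root, then the DOI authority, shoulder, and dataset identifier as nested folders). FILES_ROOT, dataset_dir, and create_dataset_dir are hypothetical names; adjust the root for your installation.

```python
from pathlib import Path

# Assumption: default Glassfish 4 layout used throughout this guide.
FILES_ROOT = Path("/usr/local/glassfish4/glassfish/domains/domain1/files")


def dataset_dir(persistent_id: str, files_root: Path = FILES_ROOT) -> Path:
    """Map e.g. 'doi:10.5072/FK2/MVKMO8' to <files_root>/10.5072/FK2/MVKMO8."""
    identifier = persistent_id
    if identifier.startswith("doi:"):
        identifier = identifier[len("doi:"):]
    # Each '/'-separated part (authority, shoulder, dataset id) becomes a folder.
    return files_root.joinpath(*identifier.split("/"))


def create_dataset_dir(persistent_id: str, files_root: Path = FILES_ROOT) -> Path:
    """Create the dataset folder that Dataverse has not created yet."""
    path = dataset_dir(persistent_id, files_root)
    # parents=True also creates 10.5072/FK2 if missing; exist_ok=True makes reruns safe.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, create_dataset_dir("doi:10.5072/FK2/MVKMO8") would create /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8 when run by a user with write access to the files directory.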
-
Move your test dataset files into the Dataverse data directory, inside the new dataset folder (e.g., /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8).
-
Create a text file named files.sha that contains the SHA-1 checksum of each file (for testing purposes you can use fake SHA-1 values), and copy it to the dataset folder. Example:
25af68407fab5dc6284a816cdd124b538cf9e68a kdc_apr11_d3_1_017.img
364a062c4141691772e14bc80b4b1f89ed778a52 kdc_apr11_d3_1_002.img
e1de238f8a12076bb92bbb94342b8ed66282b854 kdc_apr11_d3_1_018.img
71f7203ef8d514a5236767c666979e44e072af37 kdc_apr11_d3_1_008.img
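If you prefer real checksums to fake ones, a short script can generate files.sha. This sketch assumes the format shown in the example (one "<sha1> <filename>" pair per line, separated by a single space) and that every regular file in the dataset folder should be listed; sha1_of and write_files_sha are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_files_sha(dataset_dir: Path, manifest_name: str = "files.sha") -> Path:
    """Write '<sha1> <filename>' lines for every regular file in dataset_dir."""
    manifest = dataset_dir / manifest_name
    # Skip the manifest itself so reruns do not checksum a previous files.sha.
    lines = [
        f"{sha1_of(p)} {p.name}"
        for p in sorted(dataset_dir.iterdir())
        if p.is_file() and p.name != manifest_name
    ]
    manifest.write_text("".join(line + "\n" for line in lines))
    return manifest
```

Because the script writes files.sha directly into the folder you pass in, running it on the dataset folder makes the separate copy step unnecessary.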
-
Execute an import job using the API (replace my-key with your API token and MVKMO8 with your dataset identifier):
curl -X POST --header 'Accept: application/json' --header 'X-Dataverse-key: my-key' 'http://localhost:8080/api/batch/jobs/import/datasets/files/10.5072/FK2/MVKMO8?mode=MERGE'
-
You should see a response like this:
{ "status": "OK", "data": { "executionId": 208, "message": "FileSystemImportJob in progress" } }
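The same request can be built with Python's standard library, and the executionId extracted from the response to form the status URL used in the next step. This is a sketch assuming a local server at http://localhost:8080 and the response shape shown above; build_import_request, execution_id_from_response, and status_url are hypothetical helper names, and nothing is sent until you call urlopen yourself.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: local Glassfish as in the curl example


def build_import_request(
    dataset_path: str, api_key: str, mode: str = "MERGE", base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the same POST request as the curl command above (not sent here)."""
    url = f"{base_url}/api/batch/jobs/import/datasets/files/{dataset_path}?mode={mode}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Accept": "application/json", "X-Dataverse-key": api_key},
    )


def execution_id_from_response(body: str) -> int:
    """Extract data.executionId from a response like the one shown above."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise RuntimeError(f"Import job was not accepted: {body}")
    return int(payload["data"]["executionId"])


def status_url(execution_id: int, base_url: str = BASE_URL) -> str:
    """Status URL for a job execution, following the pattern in this guide."""
    return f"{base_url}/api/admin/batch/jobs/{execution_id}"


# Usage against a running server (not executed here):
# req = build_import_request("10.5072/FK2/MVKMO8", "my-key")
# with urllib.request.urlopen(req) as resp:
#     job_id = execution_id_from_response(resp.read().decode("utf-8"))
# print(status_url(job_id))
```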
-
You can poll the job status at http://localhost:8080/api/admin/batch/jobs/208, replacing 208 with the executionId returned in the response.
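To script the polling, fetch the job document and print one line per step. This sketch assumes the status URL returns JSON shaped like the JOB JSON entries in the integration-test log above (id, name, status, and a steps array with metrics); fetch_job and summarize_job are hypothetical helpers, not part of Dataverse.

```python
import json
import urllib.request


def fetch_job(url: str) -> dict:
    """GET the job JSON from a status URL such as .../api/admin/batch/jobs/208."""
    # Requires a running server; the response shape is an assumption (see above).
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_job(job: dict) -> str:
    """Return one line for the job and one indented line per step."""
    lines = [f"{job['name']} #{job['id']}: {job['status']}"]
    for step in job.get("steps", []):
        metrics = step.get("metrics") or {}
        lines.append(
            f"  {step['name']}: {step['status']} "
            f"(read={metrics.get('read_count')}, write={metrics.get('write_count')})"
        )
    return "\n".join(lines)


# Usage against a running server (not executed here):
# print(summarize_job(fetch_job("http://localhost:8080/api/admin/batch/jobs/208")))
```

Applied to the second JOB JSON in the log, this would show import-files as COMPLETED and import-checksums still STARTED, which makes it easy to see which step is holding the job open.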
-
Check your Dataverse user notifications for the job status, and review the dataset landing page for the results.
Copyright © 2016, The President and Fellows of Harvard College