HOWTO: Batch Job Testing


Overview

This page is meant to assist a developer or QA expert in testing the new code. Batch job import is not a standalone feature; rather, it is a dependency of the Data Capture Module (DCM). Dataverse users are not currently expected to access this module directly; instead, the DCM rsync implementation will rely on it.

Steps

1. War file

2. Automated tests

  • Run the integration tests: mvn test -Dtest=FileRecordJobIT -DfailIfNoTests=false

  • Note: this test can only be run on the same server as dataverse.test.baseurl, since test files are generated directly in the Dataverse data directory (a sample invocation is sketched at the end of this step).

  • Note: if tests sporadically fail because the job status polling has timed out, the test output will look something like this:

    JOB JSON {"id":3,"name":"FileSystemImportJob","status":"COMPLETED","exitStatus":"COMPLETED","createTime":1478797187875,"endTime":1478797188104,"lastUpdateTime":1478797188104,"startTime":1478797187877,"properties":{"userId":"b186b948","datasetId":"doi:10.5072/FK2/R3QAAE","mode":"MERGE"},"steps":[{"id":5,"name":"import-files","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797188006,"startTime":1478797187909,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":2,"rollback_count":0,"filter_count":0,"read_count":2},"persistentUserData":null},{"id":6,"name":"import-checksums","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797188085,"startTime":1478797188015,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":2,"rollback_count":0,"filter_count":0,"read_count":2},"persistentUserData":null}]}
    API: /api/datasets/:persistentId?persistentId=doi:10.5072/FK2/R3QAAE
    JOB API: /api/import/datasets/files/10.5072/FK2/R3QAAE
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB STATUS: STARTED
    JOB JSON: {"id":4,"name":"FileSystemImportJob","status":"STARTED","exitStatus":null,"createTime":1478797197987,"endTime":null,"lastUpdateTime":1478797197990,"startTime":1478797197990,"properties":{"userId":"b186b948","datasetId":"doi:10.5072/FK2/R3QAAE","mode":"MERGE"},"steps":[{"id":7,"name":"import-files","status":"COMPLETED","exitStatus":"COMPLETED","endTime":1478797204859,"startTime":1478797198804,"metrics":{"write_skip_count":0,"commit_count":1,"process_skip_count":0,"read_skip_count":0,"write_count":1,"rollback_count":0,"filter_count":2,"read_count":3},"persistentUserData":null},{"id":8,"name":"import-checksums","status":"STARTED","exitStatus":null,"endTime":null,"startTime":1478797206430,"metrics":{"write_skip_count":0,"commit_count":0,"process_skip_count":0,"read_skip_count":0,"write_count":0,"rollback_count":0,"filter_count":0,"read_count":0},"persistentUserData":null}]}

    Try increasing the maximum polling retries and/or polling wait in FileRecordJobIT.properties:

    polling.retries=10
    polling.wait=1000
    
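  • For example, a single run against a local server might look like the following. This is a sketch: the base URL is an assumption and should match your deployment, and the dataverse.test.baseurl property must be visible to the test JVM.

    # run only the batch import integration test against a local Dataverse
    mvn test -Dtest=FileRecordJobIT -DfailIfNoTests=false -Ddataverse.test.baseurl=http://localhost:8080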

3. Manual tests

  • Create a new dataverse and dataset.

  • Use the new dataset id (e.g., MVKMO8) to manually create a directory under /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2 since it isn't created by Dataverse until files are added.

  • Move your test dataset files into the Dataverse data directory, within the new dataset folder (e.g., /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8); a shell sketch covering this and the previous step follows.
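
  • For example (a sketch covering this step and the previous one, assuming the default Glassfish path above, the hypothetical dataset id MVKMO8, and test files staged in a hypothetical /tmp/testdata directory; substitute your own values):

    # create the dataset folder by hand, then copy the test files into it
    mkdir -p /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8
    cp /tmp/testdata/*.img /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8/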

  • Create a text file named files.sha that contains the SHA-1 checksum for each file (for testing purposes you can use fake SHA values), and copy it to the dataset folder. Example:

    25af68407fab5dc6284a816cdd124b538cf9e68a  kdc_apr11_d3_1_017.img
    364a062c4141691772e14bc80b4b1f89ed778a52  kdc_apr11_d3_1_002.img
    e1de238f8a12076bb92bbb94342b8ed66282b854  kdc_apr11_d3_1_018.img
    71f7203ef8d514a5236767c666979e44e072af37  kdc_apr11_d3_1_008.img
    
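  • If you want real checksums instead of fake values, a sketch of generating files.sha (assuming sha1sum is available and the test files match *.img as in the example above):

    # generate a SHA-1 manifest for the test files in the dataset folder
    cd /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/MVKMO8
    sha1sum *.img > files.sha
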
  • Execute an import job using the API:

    curl -X POST --header 'Accept: application/json' --header 'X-Dataverse-key: my-key' 'http://localhost:8080/api/batch/jobs/import/datasets/files/10.5072/FK2/MVKMO8?mode=MERGE'
    
  • You should see a response like this:

    {
      "status": "OK",
      "data": {
        "executionId": 208,
        "message": "FileSystemImportJob in progress"
      }
    }

  • You can poll the job status with this URL: http://localhost:8080/api/admin/batch/jobs/208
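
  • For example, a minimal polling sketch (assuming execution id 208 from the response above and that jq is installed; the JSON path is an assumption, since the exact response envelope of the admin endpoint may differ):

    # poll the job status every second until it leaves the STARTED state
    while true; do
      status=$(curl -s 'http://localhost:8080/api/admin/batch/jobs/208' | jq -r '.data.status // .status')
      echo "JOB STATUS: $status"
      [ "$status" != "STARTED" ] && break
      sleep 1
    done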

  • Check your Dataverse user notifications for status and review the dataset landing page for the results.