This section describes how to run Hume as a standalone system.

System Preparation

For this mode, you will need to prepare a folder of CDR files (a richer article representation from DART), an ontology metadata file, and a config file.

You will also need to create a new working folder. This document uses runtime as an example.

CDR Files

The full CDR schema is documented by DART. Hume reads the fields below, each marked as required or optional:

{
  "document_id": "required,str: unique identifier of article",
  "source_uri": "optional,str: how to fetch the document",
  "extracted_metadata": {
    "Author": "optional,str: author of article",
    "CreationDate": "optional,str: date in %Y-%m-%d format. Helps time resolution if provided",
    "Pages": "optional,int: helps genre determination if provided"
  },
  "extracted_text": "required,str: original text of article",
  "content_type": "optional,str: MIME type string, used for genre determination"
}

Name each file [document_id].json, where [document_id] is replaced with that document's actual document_id.

Copy all JSON files into your runtime/corpus directory.
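
As a rough illustration of the naming and layout rules above, the following Python sketch writes one CDR file into a corpus directory. The helper name write_cdr and its parameters are hypothetical; only the JSON field names come from the schema above.

```python
import json
from pathlib import Path


def write_cdr(corpus_dir, document_id, text, creation_date=None, author=None):
    """Write a minimal CDR file named <document_id>.json (hypothetical helper)."""
    cdr = {
        "document_id": document_id,  # required
        "extracted_text": text,      # required
        "extracted_metadata": {},    # optional fields are filled in below
    }
    if author is not None:
        cdr["extracted_metadata"]["Author"] = author
    if creation_date is not None:
        # creation_date is a datetime.date; CDR expects the %Y-%m-%d format
        cdr["extracted_metadata"]["CreationDate"] = creation_date.strftime("%Y-%m-%d")

    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    out_path = corpus / f"{document_id}.json"
    out_path.write_text(json.dumps(cdr, indent=2), encoding="utf-8")
    return out_path
```

Calling write_cdr("runtime/corpus", "doc-001", "Some article text.") would, under these assumptions, create runtime/corpus/doc-001.json.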

Ontology Metadata File

The ontology metadata file should be formatted according to the following schema: https://github.com/WorldModelers/Ontologies/blob/master/CompositionalOntology_metadata.yml

Docker Config File

The config file is a JSON file that must be visible at /extra/config.json inside the container at runtime. An example follows:

{
  "hume.domain": "WM",
  "hume.num_of_vcpus": "required,int: number of CPU cores available to Hume. Must be at least 2. Count physical cores only, not SMT cores.",
  "hume.tmp_dir": "required,str: a bind point for sharing data between your local system and the Docker instance",
  "hume.manual.cdr_dir": "required,str: path, as seen from inside Docker, to the directory of documents you want to process. If you followed the steps above, this is /extra/corpus",
  "hume.manual.keep_pipeline_data": "optional,bool, defaults to false: when enabled, intermediate files are kept between runs so that a later run does not start from the beginning. If the previous run finished and you are starting a new one, set this to false so the intermediate files are removed and everything runs from scratch.",
  "hume.external_ontology_path": "optional,str: when provided, Hume uses this ontology metadata file instead of the pre-shipped one. The path must be accessible from inside Docker",
  "hume.external_ontology_version": "optional,str, but mandatory when hume.external_ontology_path is set: ID of the external ontology",
  "hume.use_regrounding_cache": "optional,bool: when enabled, Hume saves intermediate processing results to a directory so that, if the same document is seen again later, it can skip certain processing steps and run only the regrounding pipeline.",
  "hume.regrounding_cache_path": "str, required when hume.use_regrounding_cache is true: a directory that hosts the regrounding cache, ideally on persistent storage that can be shared between Docker instances. Do not reuse a grounding cache from OIAD."
}

Copy the config file to runtime/config.json.
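
The dependencies between config keys described above can be checked before starting a run. The sketch below is one possible check, not part of Hume: the function name check_hume_config is hypothetical, and it assumes hume.num_of_vcpus is stored as a JSON integer.

```python
# Keys the example above marks as required.
REQUIRED_KEYS = ("hume.num_of_vcpus", "hume.tmp_dir", "hume.manual.cdr_dir")


def check_hume_config(config):
    """Return a list of problems found in a Hume config dict (hypothetical helper)."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]

    vcpus = config.get("hume.num_of_vcpus")
    # Assumption: the value is a JSON integer, not a string.
    if vcpus is not None and (not isinstance(vcpus, int) or vcpus < 2):
        problems.append("hume.num_of_vcpus must be an integer of at least 2")

    # An external ontology path must come with a version ID.
    if "hume.external_ontology_path" in config and "hume.external_ontology_version" not in config:
        problems.append("hume.external_ontology_version is required with hume.external_ontology_path")

    # The regrounding cache needs a directory when it is enabled.
    if config.get("hume.use_regrounding_cache") and "hume.regrounding_cache_path" not in config:
        problems.append("hume.regrounding_cache_path is required when hume.use_regrounding_cache is true")

    return problems
```

An empty list means none of these particular rules were violated; it does not guarantee that Hume will accept the file.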

To Run the Hume System

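Run Hume with the following command, replacing runtime with the absolute path to your working folder (Docker treats a bare name in -v as a named volume rather than a host directory):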

docker run -it -v runtime:/extra docker.io/wmbbn/hume:R2022_03_21 /usr/local/envs/py3-jni/bin/python3 /wm_rootfs/git/Hume/src/python/dart_integration/manual_processing.py
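
If you launch the container from a script, the same command can be assembled with the working folder resolved to an absolute path, which Docker needs in order to bind-mount a host directory. This is only a sketch: build_hume_command is a hypothetical helper, while the image tag and the in-container paths are copied from the command above.

```python
from pathlib import Path

HUME_IMAGE = "docker.io/wmbbn/hume:R2022_03_21"
HUME_ENTRYPOINT = [
    "/usr/local/envs/py3-jni/bin/python3",
    "/wm_rootfs/git/Hume/src/python/dart_integration/manual_processing.py",
]


def build_hume_command(runtime_dir):
    """Return the docker argv with runtime_dir bind-mounted at /extra (hypothetical helper)."""
    # Resolve to an absolute path so Docker bind-mounts the folder
    # instead of creating a named volume.
    host_path = Path(runtime_dir).resolve()
    return ["docker", "run", "-it", "-v", f"{host_path}:/extra", HUME_IMAGE, *HUME_ENTRYPOINT]
```

The resulting list can be passed to subprocess.run(..., check=True) from a terminal session; the -it flags expect an interactive TTY.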

After the run finishes, the results will be accessible at [hume.tmp_dir]/results/[TIMESTAMP]/results
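
To pick up the output of the most recent run from a script, one option is to scan the results layout described above. The sketch below assumes you pass the host-side location of hume.tmp_dir; because the TIMESTAMP format is not specified here, it compares directory modification times instead of parsing the names. The function name latest_results_dir is hypothetical.

```python
from pathlib import Path


def latest_results_dir(tmp_dir):
    """Return <tmp_dir>/results/<TIMESTAMP>/results for the newest run, or None."""
    runs_root = Path(tmp_dir) / "results"
    if not runs_root.is_dir():
        return None
    # Keep only run folders that actually contain a results subdirectory.
    candidates = [run / "results" for run in runs_root.iterdir() if (run / "results").is_dir()]
    if not candidates:
        return None
    # Newest run folder by modification time (an assumption, see the note above).
    return max(candidates, key=lambda path: path.parent.stat().st_mtime)
```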