To set up local development, you must have Docker installed. To run the service outside Docker, you also need Python 3.9.6 or greater.
Create a folder called /data, a folder called /bulk inside it, and inside /bulk a folder for each username you want the service to work with:
data
    -bulk
        -username
        -username
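The folder layout above could be created with a short script. This is a minimal sketch: the function name `make_staging_dirs` and the usernames are illustrative only, and the demo writes into a temporary directory rather than /data.

```python
import tempfile
from pathlib import Path


def make_staging_dirs(root, usernames):
    """Create <root>/bulk/<username> for each username and return the paths."""
    created = []
    for name in usernames:
        user_dir = Path(root) / "bulk" / name
        # parents=True also creates <root> and <root>/bulk when they are missing.
        user_dir.mkdir(parents=True, exist_ok=True)
        created.append(user_dir)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; point root at your real data folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_staging_dirs(Path(tmp) / "data", ["alice", "bob"]):
            print(path)
```

Calling the function again with the same arguments is harmless, because existing folders are left untouched.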
To run locally, install the Python 3 dependencies listed in requirements.txt.
To run locally, run /deployment/bin/entrypoint.sh
To run inside Docker, run /run_in_docker.sh
- To test, use ./run_tests.sh
- Requires Python 3.9.6 or higher
- On macOS, requires libmagic to be installed:
brew install libmagic
Configurations for the Visual Studio Code Python debugger are included. They mirror entrypoint.sh and the testing configuration, so you can run the service and the tests locally in the debugger and set breakpoints. If you open the project in VSCode, the debugger should be ready to use.
When releasing a new version:
- Update the release notes
- Update the version in staging_service/app.py.VERSION
To run locally you will need all of these utilities on your system: tar, unzip, zip, gzip, bzip2, md5sum, head, tail, wc
In the Docker container all of these should be available.
All paths should be specified treating the user's home directory as root.
URL : ci.kbase.us/services/staging_service/test-service
local URL : localhost:3000/test-service
Method : GET
Code : 200 OK
Content example
This is just a test. This is only a test.
URL : ci.kbase.us/services/staging_service/test-auth
local URL : localhost:3000/test-auth
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
I'm authenticated as <username>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
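A request for the test-auth endpoint could be assembled as follows. This is a sketch under assumptions: the `http://` scheme is assumed for the local URL, the token is sent as the raw value of the Authorization header, and `build_auth_check` is a hypothetical helper name. The request is only built here, not sent.

```python
import urllib.request

BASE_URL = "http://localhost:3000"  # local URL from above; the http scheme is assumed


def build_auth_check(token, base_url=BASE_URL):
    """Build (but do not send) a GET request for the test-auth endpoint."""
    request = urllib.request.Request(f"{base_url}/test-auth", method="GET")
    request.add_header("Authorization", token)
    return request


if __name__ == "__main__":
    req = build_auth_check("my-token")  # placeholder token
    print(req.get_method(), req.full_url)
    # To call a running service, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 401 responses described above, so a real client would wrap the call in a try/except.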
URL : ci.kbase.us/services/staging_service/file-lifetime
local URL : localhost:3000/file-lifetime
Method : GET
Code : 200 OK
Content example
The number of days a file is held in the staging service before being deleted. The deletion is not handled by the server itself; it is expected to be performed by a cron job that shares the environment variable read here.
90
By default, hidden dotfiles are not shown.
URL : ci.kbase.us/services/staging_service/list/{path to directory}
URL : ci.kbase.us/services/staging_service/list/{path to directory}?showHidden={True/False}
local URL : localhost:3000/list/{path to directory}
local URL : localhost:3000/list/{path to directory}?showHidden={True/False}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
[
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
},
{
"name": "testfile",
"path": "nixonpjoshua/testfile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
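The two URL forms of the list endpoint can be produced by one small helper. This is an illustrative sketch: `list_url` is a hypothetical name, the `http://` scheme is assumed, and percent-encoding the path with `quote` is a client-side choice rather than something the service documents.

```python
from urllib.parse import quote, urlencode


def list_url(path, show_hidden=False, base_url="http://localhost:3000"):
    """Return the list URL for a directory, relative to the user's home."""
    # quote() keeps "/" as-is and percent-encodes spaces and other unsafe characters.
    url = f"{base_url}/list/{quote(path)}"
    if show_hidden:
        url += "?" + urlencode({"showHidden": "True"})
    return url


if __name__ == "__main__":
    print(list_url("testFolder"))
    print(list_url("testFolder", show_hidden=True))
```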
URL : ci.kbase.us/services/staging_service/download/{path to file}
local URL : localhost:3000/download/{path to file}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content : <file content>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content :
<username>/<incorrect path> is a directory not a file
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
By default, hidden dotfiles are not shown.
URL : ci.kbase.us/services/staging_service/search/{search query}
URL : ci.kbase.us/services/staging_service/search/{search query}?showHidden={True/False}
local URL : localhost:3000/search/{search query}
local URL : localhost:3000/search/{search query}?showHidden={True/False}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
[
{
"name": "testfile",
"path": "nixonpjoshua/testfile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
},
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
},
{
"name": "testinnerFile",
"path": "nixonpjoshua/testFolder/testinnerFile",
"mtime": 1510949575000,
"size": 0,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
URL : ci.kbase.us/services/staging_service/metadata/{path to file or folder}
local URL : localhost:3000/metadata/{path to file or folder}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
}
{
"md5": "73cf08ad9d78d3fc826f0f265139de33",
"lineCount": "13",
"head": "there is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom",
"tail": "there is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom",
"name": "testFile",
"path": "nixonpjoshua/testFile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
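The `mtime` values in the examples look like milliseconds since the Unix epoch. Under that assumption (it is not stated explicitly here), a client could convert them to a readable timestamp like this; `mtime_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def mtime_to_datetime(mtime_ms):
    """Convert an mtime in milliseconds since the epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(mtime_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Entry copied from the metadata example above.
    entry = {"name": "testFolder", "mtime": 1510949575000, "isFolder": True}
    print(entry["name"], mtime_to_datetime(entry["mtime"]).isoformat())
```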
URL : ci.kbase.us/services/staging_service/upload
local URL : localhost:3000/upload
Method : POST
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
destPath: {path file should end up in}
second element in request body should be multipart file data
uploads: {multipart file}
Files starting with whitespace or a '.' are not allowed
Code : 200 OK
Content example
[
{
"name": "fasciculatum_supercontig.fasta",
"path": "nixonpjoshua/fasciculatum_supercontig.fasta",
"mtime": 1510950061000,
"size": 31536508,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
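A client can check the file name rule from the body constraints before uploading. The sketch below is a client-side approximation only: `is_allowed_upload_name` is a hypothetical helper, and rejecting an empty name is an added assumption.

```python
def is_allowed_upload_name(filename):
    """Return False for names that start with whitespace or a '.'."""
    if not filename:
        # Assumption: an empty name is never a valid upload target.
        return False
    first = filename[0]
    return not (first.isspace() or first == ".")


if __name__ == "__main__":
    for name in ["reads.fastq", ".hidden", " padded.txt"]:
        print(repr(name), is_allowed_upload_name(name))
```

The server remains the authority on what is accepted; this check only avoids sending a request that is known to fail.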
URL : ci.kbase.us/services/staging_service/define-upa/{path to imported file}
local URL : localhost:3000/define-upa/{path to imported file}
Method : POST
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
UPA: {the actual UPA of imported file}
Code : 200 OK
Content example
successfully update UPA <UPA> for file <Path>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
must provide UPA field in body
URL : ci.kbase.us/services/staging_service/delete/{path to file or folder}
local URL : localhost:3000/delete/{path to file or folder}
Method : DELETE
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
successfully deleted UPA <Path>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content
could not delete <Path>
Code : 403 Forbidden
Content
cannot delete home directory
cannot delete protected file
URL : ci.kbase.us/services/staging_service/mv/{path to file or folder}
local URL : localhost:3000/mv/{path to file or folder}
Method : PATCH
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
newPath : {the new location/name for file or folder}
Code : 200 OK
Content example
successfully moved <path> to <newPath>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
must provide newPath field in body
Code : 403 Forbidden
Content
cannot rename home or move directory
cannot rename or move protected file
Code: 409 Conflict
Content
<newPath> allready exists
supported archive formats are:
.zip, .ZIP, .tar.gz, .tgz, .tar.bz, .tar.bz2, .tar, .gz, .bz2, .bzip2
URL : ci.kbase.us/services/staging_service/decompress/{path to archive}
local URL : localhost:3000/decompress/{path to archive}
Method : PATCH
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
successfully decompressed <path to archive>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
cannot decompress a <file extension> file
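Before calling decompress, a client can test a path against the supported extensions listed above. This sketch assumes the match is case-sensitive, which the separate `.zip` and `.ZIP` entries suggest but do not prove; `is_supported_archive` is a hypothetical helper name.

```python
# Extensions copied from the supported archive formats list.
SUPPORTED_ARCHIVE_EXTENSIONS = (
    ".zip", ".ZIP", ".tar.gz", ".tgz", ".tar.bz",
    ".tar.bz2", ".tar", ".gz", ".bz2", ".bzip2",
)


def is_supported_archive(path):
    """Return True if the path ends with one of the listed archive extensions."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return path.endswith(SUPPORTED_ARCHIVE_EXTENSIONS)


if __name__ == "__main__":
    for path in ["reads.tar.gz", "genome.fasta", "data.ZIP"]:
        print(path, is_supported_archive(path))
```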
After you authenticate at this endpoint, AUTH is queried to get your filepath and Globus ID file for linking to Globus.
URL : ci.kbase.us/services/staging_service/add-acl
local URL : localhost:3000/add-acl
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"success": true,
"principal": "KBase-Example-59436z4-z0b6-z49f-zc5c-zbd455f97c39",
"path": "/username/",
"permissions": "rw"
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Condition : If issue with Globus API or ACL Already Exists
Code : 500 Internal Server Error
Content
{
'success': False,
'error_type': 'TransferAPIError',
'error': "Can't create ACL rule; it already exists",
'error_code': 'Exists', 'shared_directory_basename': '/username/'
}
After you authenticate at this endpoint, AUTH is queried to get your filepath and Globus ID file for linking to Globus.
URL : ci.kbase.us/services/staging_service/remove-acl
local URL : localhost:3000/remove-acl
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"message": "{\n \"DATA_TYPE\": \"result\",\n \"code\": \"Deleted\",
"message\": \"Access rule 'KBASE-examplex766ada0-x8aa-x1e8-xc7b-xa1d4c5c824a' deleted successfully\",
"request_id\": \"x2KFzfop05\",\n \"resource\": \"/endpoint/KBaseExample2a-5e5b-11e6-8309-22000b97daec/access/KBaseExample-ada0-d8aa-11e8-8c7b-0a1d4c5c824a\"}",
"Success": true
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Condition : If issue with Globus API or ACL Already Exists
Code : 500 Internal Server Error
Content
{
'success': False,
'error_type': 'TransferAPIError',
'error': "Can't create ACL rule; it already exists",
'error_code': 'Exists', 'shared_directory_basename': '/username/'
}
This endpoint parses one or more bulk specification files in the staging area into a data structure (close to) ready for insertion into the Narrative bulk import or analysis cell.
It can parse .tsv, .csv, and Excel (.xls and .xlsx) files. Templates for the currently supported data types are available in the templates directory of this repo. See the README.md file for instructions on template usage.
See the import specification ADR document for design details.
URL : ci.kbase.us/services/staging_service/bulk_specification
local URL : localhost:3000/bulk_specification
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
GET bulk_specification/?files=file1.<ext>[,file2.<ext>,...]
<ext> is one of csv, tsv, xls, or xlsx.
Response:
{
"types": {
<type 1>: [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
{<spec.json ID 1>: <value for ID, row 2>, <spec.json ID 2>: <value for ID, row 2>, ...},
...
],
<type 2>: [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
...
],
...
},
"files": {
<type 1>: {"file": "<username>/file1.<ext>", "tab": "tabname"},
<type 2>: {"file": "<username>/file2.<ext>", "tab": null},
...
}
}
- <type N> is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an app. It is determined by the first header line from the templates.
- <spec.json ID N> is the ID of an input parameter from a KB-SDK app's spec.json file. These are determined by the second header line from the templates and will differ by the data type.
- <value for ID, row N> is the user-provided value for the input for a given spec.json ID and import or analysis instance, where an import/analysis instance is effectively a row in the data file. Each data file row is provided in order for each type. Each row is provided in a mapping of spec.json ID to the data for the row. Lines > 3 in the templates are user-provided data, and each line corresponds to a single import or analysis.
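The `files` query parameter is a comma-separated list, so the request URL for the parse endpoint can be assembled as in the sketch below. The helper name `bulk_specification_url` and the `http://` scheme are assumptions, and leaving commas and slashes unescaped in the query is a readability choice.

```python
from urllib.parse import urlencode


def bulk_specification_url(files, base_url="http://localhost:3000"):
    """Return the GET URL that asks the service to parse the given files."""
    if not files:
        raise ValueError("at least one file is required")
    # safe=",/" keeps the separators readable instead of %2C and %2F.
    query = urlencode({"files": ",".join(files)}, safe=",/")
    return f"{base_url}/bulk_specification/?{query}"


if __name__ == "__main__":
    print(bulk_specification_url(["file1.csv", "subdir/file2.xlsx"]))
```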
Error responses are of the general form:
{
"errors": [
{"type": <error code string>,
... other fields depending on the error code ...
},
...
]
}
Existing error codes are currently:
- cannot_find_file if an input file cannot be found
- cannot_parse_file if an input file cannot be parsed
- incorrect_column_count if the column count is not as expected
    - For Excel files, this may mean there is a non-empty cell outside the bounds of the data area
- multiple_specifications_for_data_type if more than one tab or file per data type is submitted
- no_files_provided if no files were provided
- unexpected_error if some other error occurs
The HTTP code returned will be, in order of precedence:
- 400 if any error other than cannot_find_file or unexpected_error occurs
- 404 if at least one error is cannot_find_file but there are no 400-type errors
- 500 if all errors are unexpected_error
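The precedence rules above can be expressed as a small function. This is a sketch of the rule as described, not the service's actual code; `http_status_for` is a hypothetical name.

```python
def http_status_for(error_types):
    """Map a non-empty list of error type strings to the HTTP status code."""
    if not error_types:
        raise ValueError("expected at least one error type")
    not_400 = ("cannot_find_file", "unexpected_error")
    # Any error outside the two special cases wins and yields 400.
    if any(error not in not_400 for error in error_types):
        return 400
    # No 400-type errors remain, so a missing file yields 404.
    if "cannot_find_file" in error_types:
        return 404
    # Only unexpected_error entries are left.
    return 500


if __name__ == "__main__":
    print(http_status_for(["cannot_find_file", "cannot_parse_file"]))
    print(http_status_for(["cannot_find_file", "unexpected_error"]))
    print(http_status_for(["unexpected_error"]))
```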
The per error type data structures are:
{
"type": "cannot_find_file",
"file": <filepath>
}
{
"type": "cannot_parse_file",
"file": <filepath>,
"tab": <spreadsheet tab if applicable, else null>,
"message": <message regarding the parse error>
}
{
"type": "incorrect_column_count",
"file": <filepath>,
"tab": <spreadsheet tab if applicable, else null>,
"message": <message regarding the error>
}
{
"type": "multiple_specifications_for_data_type",
"file_1": <filepath for first file>,
"tab_1": <spreadsheet tab from first file if applicable, else null>,
"file_2": <filepath for second file>,
"tab_2": <spreadsheet tab for second file if applicable, else null>,
"message": <message regarding the multiple specification error>
}
{
"type": "no_files_provided"
}
{
"type": "unexpected_error",
"file": <filepath if applicable to a single file>,
"message": <message regarding the error>
}
This endpoint is the reverse of the parse bulk specifications endpoint - it takes a similar data structure to that which the parse endpoint returns and writes bulk specification templates.
URL : ci.kbase.us/services/staging_service/write_bulk_specification
local URL : localhost:3000/write_bulk_specification
Method : POST
Headers :
Authorization: <Valid Auth token>
Content-Type: Application/JSON
Code : 200 OK
Content example
POST write_bulk_specification/
{
"output_directory": <staging area directory in which to write output files>,
"output_file_type": <one of "CSV", "TSV", or "EXCEL">,
"types": {
<type 1>: {
"order_and_display": [
[<spec.json ID 1>, <display.yml name 1>],
[<spec.json ID 2>, <display.yml name 2>],
...
],
"data": [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
{<spec.json ID 1>: <value for ID, row 2>, <spec.json ID 2>: <value for ID, row 2>, ...}
...
]
},
<type 2>: {
"order_and_display": [
[<spec.json ID 1>, <display.yml name 1>],
...
],
"data": [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
...
]
},
...
}
}
- output_directory specifies where the output files should be written in the user's staging area.
- output_file_type specifies the format of the output files.
- <type N> is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an app. It is included in the first header line in the templates.
- order_and_display determines the ordering of the columns in the written templates, as well as mapping the spec.json ID of the parameter to the human readable name of the parameter in the display.yml file.
- <spec.json ID N> is the ID of an input parameter from a KB-SDK app's spec.json file. These are written to the second header line from the import templates and will differ by the data type.
- data contains any data to be written to the file as example data, and is analogous to the data structure returned from the parse endpoint. To specify that no data should be written to the template, provide an empty list.
- <value for ID, row N> is the value for the input for a given spec.json ID and import or analysis instance, where an import/analysis instance is effectively a row in the data file. Each data file row is provided in order for each type. Each row is provided in a mapping of spec.json ID to the data for the row. Lines > 3 in the templates are user-provided data, and each line corresponds to a single import or analysis.
Response:
{
"output_file_type": <one of "CSV", "TSV", or "EXCEL">,
"files": {
<type 1>: <staging service path to file containing data for type 1>,
...
<type N>: <staging service path to file containing data for type N>,
}
}
- output_file_type has the same definition as above.
- files contains a mapping of each provided data type to the output template file for that type. In the case of Excel, all the file paths will be the same since the data types are all written to different tabs in the same file.
Method specific errors have the form:
{"error": <error message>}
The error code in this case will be a 4XX error.
The AioHTTP server may also return built in errors that are not in JSON format - an example of this is overly large (> 1MB) request bodies.
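A request body for write_bulk_specification could be assembled as in the sketch below. The helper `build_write_request` is hypothetical, and the parameter ID `genome_name`, its display name, and the `bulk_templates` directory in the demo are made-up placeholders, not values taken from a real spec.json.

```python
import json

ALLOWED_FILE_TYPES = ("CSV", "TSV", "EXCEL")


def build_write_request(output_directory, output_file_type, types):
    """Return the JSON request body as a string.

    `types` maps a data type ID to a (columns, rows) pair. columns is a list
    of (spec.json ID, display name) pairs; rows is a list of dicts keyed by
    spec.json ID.
    """
    if output_file_type not in ALLOWED_FILE_TYPES:
        raise ValueError(f"output_file_type must be one of {ALLOWED_FILE_TYPES}")
    body = {
        "output_directory": output_directory,
        "output_file_type": output_file_type,
        "types": {},
    }
    for type_id, (columns, rows) in types.items():
        body["types"][type_id] = {
            "order_and_display": [list(pair) for pair in columns],
            "data": list(rows),  # an empty list means no example data
        }
    return json.dumps(body)


if __name__ == "__main__":
    # Placeholder parameter ID and display name, for illustration only.
    columns = [("genome_name", "Genome Object Name")]
    print(build_write_request("bulk_templates", "CSV", {"genbank_genome": (columns, [])}))
```

Send the resulting string with the Content-Type header set as listed for this endpoint, and keep the body under the size limit mentioned above.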
This endpoint returns:
- a mapping between a list of files and predicted importer apps, and
- a file information list that includes the input file names split between the file prefix and the file suffix, if any, that was used to determine the file -> importer mapping, and a list of file types based on the file suffix. If a file has a suffix that does not match any mapping (e.g. .sys), the suffix will be null, the prefix the entire file name, and the file type list empty.
For example,
- if we pass in nothing we get a response with no mappings
- if we pass in a list of files, such as ["file1.fasta", "file2.fq", "None"], we would get back a response that maps to Fasta Importers and FastQ Importers, with a weight of 0 to 1 which represents the probability that this is the correct importer for you.
- for files for which there is no predicted app, the return is a null value
- this endpoint is used to power the dropdowns for the staging service window in the Narrative
URL : ci.kbase.us/services/staging_service/importer_mappings
local URL : localhost:3000/importer_mappings
Method : POST
Headers : Not Required
Code : 200 OK
Content example
data = {"file_list": ["file1.txt", "file2.zip", "file3.gff3.gz"]}
async with AppClient(config, username) as cli:
    resp = await cli.post(
        "importer_mappings/", data=data
    )
Response:
{
"mappings": [
null,
[{
"id": "decompress",
"title": "decompress/unpack",
"app_weight": 1
}],
[{
"app_weight": 1,
"id": "gff_genome",
"title": "GFF/FASTA Genome"
},
{
"app_weight": 1,
"id": "gff_metagenome",
"title": "GFF/FASTA MetaGenome"
}]
],
"fileinfo": [
{"prefix": "file1.txt", "suffix": null, "file_ext_type": []},
{"prefix": "file2", "suffix": "zip", "file_ext_type": ["CompressedFileFormatArchive"]},
{"prefix": "file3", "suffix": "gff3.gz", "file_ext_type": ["GFF"]}
]
}
Code : 400 Bad Request
Content
must provide file_list field
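Once a response has been received, each input file can be paired with its predicted importer IDs. The sketch assumes the `mappings` list is in the same order as the submitted `file_list`, as the example response suggests; `pair_files_with_apps` is a hypothetical helper name.

```python
def pair_files_with_apps(file_list, response):
    """Return {file name: [importer app IDs]}, with [] where no app was predicted."""
    paired = {}
    for name, apps in zip(file_list, response["mappings"]):
        # A null (None) entry means there is no predicted app for that file.
        paired[name] = [app["id"] for app in apps] if apps else []
    return paired


if __name__ == "__main__":
    files = ["file1.txt", "file2.zip", "file3.gff3.gz"]
    # Trimmed copy of the example response above.
    response = {
        "mappings": [
            None,
            [{"id": "decompress", "title": "decompress/unpack", "app_weight": 1}],
            [
                {"id": "gff_genome", "title": "GFF/FASTA Genome", "app_weight": 1},
                {"id": "gff_metagenome", "title": "GFF/FASTA MetaGenome", "app_weight": 1},
            ],
        ]
    }
    print(pair_files_with_apps(files, response))
```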
This endpoint returns information about the file types associated with data types and the file extensions for those file types. It is primarily of use for creating UI elements describing which file extensions may be selected when performing bulk file selections.
URL : ci.kbase.us/services/staging_service/importer_filetypes
local URL : localhost:3000/importer_filetypes
Method : GET
Headers : Not Required
Code : 200 OK
Content example
GET importer_filetypes/
Response:
{
"datatype_to_filetype": {
<type 1>: [<file type 1>, ... <file type N>],
...
<type M>: [<file type 1>, ... <file type N>],
},
"filetype_to_extensions": {
<file type 1>: [<extension 1>, ..., <extension N>],
...
<file type M>: [<extension 1>, ..., <extension N>],
}
}
- <type N> is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an import app. It is included in the first header line in the templates.
- <file type N> is a file type like FASTA or GENBANK. The supported file types are listed below.
- <extension N> is a file extension like *.fa or *.gbk.
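For a UI that needs the selectable extensions for one data type, the two mappings in the response can be combined. This is a sketch only: `extensions_for_datatype` is a hypothetical helper, and the sample response in the demo is invented to show the shape, not copied from the service.

```python
def extensions_for_datatype(datatype, response):
    """Return the de-duplicated extensions for a data type, in first-seen order."""
    extensions = []
    for filetype in response["datatype_to_filetype"].get(datatype, []):
        for ext in response["filetype_to_extensions"].get(filetype, []):
            if ext not in extensions:
                extensions.append(ext)
    return extensions


if __name__ == "__main__":
    # Invented sample data with the documented structure.
    sample = {
        "datatype_to_filetype": {"gff_genome": ["FASTA", "GFF"]},
        "filetype_to_extensions": {"FASTA": ["fa", "fasta"], "GFF": ["gff", "gff3"]},
    }
    print(extensions_for_datatype("gff_genome", sample))
```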
These are the currently supported upload app type IDs:
fastq_reads_interleaved
fastq_reads_noninterleaved
sra_reads
genbank_genome
gff_genome
gff_metagenome
expression_matrix
media
fba_model
assembly
phenotype_set
sample_set
metabolic_annotation
metabolic_annotation_bulk
escher_map
decompress
Note that decompress is only returned when no other file type can be detected from the file extension.
These are the currently supported file type IDs. These are primarily useful for apps that take two different file types, like GFF/FASTA genomes.
FASTA
FASTQ
SRA
GFF
GENBANK
SBML
JSON
TSV
CSV
EXCEL
CompressedFileFormatArchive