
Commit

docs: add unaligned nucleotide sequence to preprocessing
JonasKellerer committed Jan 25, 2024
1 parent adda1dc commit 0534523
Showing 3 changed files with 28 additions and 214 deletions.
@@ -17,8 +17,8 @@ The SILO preprocessing accepts input data in two formats:

- `NDJSON`: a single [NDJSON](https://ndjson.org/) file containing all the data,
- `TSV/FASTA`: a directory containing
-  - a TSV file with the metadata
-  - FASTA files with the sequences
+    - a TSV file with the metadata
+    - FASTA files with the sequences

The preprocessing configuration file determines which format should be used.

@@ -34,18 +34,19 @@ You only need to specify `ndjsonInputFilename` or `pangoLineageDefinitionFilename`
if you wish to use the corresponding features.
:::

-| Key | Input Format | Default | Default in Docker Image |
-| -------------------------------- | ------------ | -------------------------------- | ------------------------ |
-| `inputDirectory` | both | `./` (current working directory) | `/preprocessing/input/` |
-| `outputDirectory` | both | `./output/` | `/preprocessing/output/` |
-| `intermediateResultsDirectory` | both | `./temp/` | `/preprocessing/temp/` |
-| `preprocessingDatabaseLocation` | both | (absent) | |
-| `ndjsonInputFilename` | `NDJSON` | (absent) | |
-| `metadataFilename` | `TSV/FASTA` | `metadata.tsv` | |
-| `pangoLineageDefinitionFilename` | both | (absent) | |
-| `referenceGenomeFilename` | both | `reference_genomes.json` | |
-| `nucleotideSequencePrefix` | `TSV/FASTA` | `nuc_` | |
-| `genePrefix` | `TSV/FASTA` | `gene_` | |
+| Key | Input Format | Default | Default in Docker Image |
+| ----------------------------------- | ------------ | -------------------------------- | ------------------------ |
+| `inputDirectory` | both | `./` (current working directory) | `/preprocessing/input/` |
+| `outputDirectory` | both | `./output/` | `/preprocessing/output/` |
+| `intermediateResultsDirectory` | both | `./temp/` | `/preprocessing/temp/` |
+| `preprocessingDatabaseLocation` | both | (absent) | |
+| `ndjsonInputFilename` | `NDJSON` | (absent) | |
+| `metadataFilename` | `TSV/FASTA` | `metadata.tsv` | |
+| `pangoLineageDefinitionFilename` | both | (absent) | |
+| `referenceGenomeFilename` | both | `reference_genomes.json` | |
+| `nucleotideSequencePrefix` | `TSV/FASTA` | `nuc_` | |
+| `genePrefix` | `TSV/FASTA` | `gene_` | |
+| `unalignedNucleotideSequencePrefix` | `TSV/FASTA` | `unaligned_` | |

:::note
All filenames are relative to the `inputDirectory`.
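Note on the new key: the table only gives the default prefix `unaligned_`. The sketch below shows one plausible way such a prefix could be turned into a FASTA filename. It assumes a `<prefix><sequenceName>.fasta` naming pattern, which matches the name of the test data file this commit deletes (`unaligned_testSecondSequence.fasta`) but is not confirmed by the diff. `PreprocessingPrefixes`, `DEFAULT_PREFIXES`, and `unalignedFastaFilename` are hypothetical names used for illustration only, not SILO's implementation.

```typescript
// Hypothetical sketch; these names do not come from the SILO code base.
interface PreprocessingPrefixes {
    nucleotideSequencePrefix: string;
    genePrefix: string;
    unalignedNucleotideSequencePrefix: string;
}

// Default values copied from the configuration table in this diff.
const DEFAULT_PREFIXES: PreprocessingPrefixes = {
    nucleotideSequencePrefix: 'nuc_',
    genePrefix: 'gene_',
    unalignedNucleotideSequencePrefix: 'unaligned_',
};

// Assumption: an unaligned sequence file is named `<prefix><sequenceName>.fasta`.
function unalignedFastaFilename(
    sequenceName: string,
    overrides: Partial<PreprocessingPrefixes> = {},
): string {
    // Values given in the config take precedence over the defaults.
    const prefixes = { ...DEFAULT_PREFIXES, ...overrides };
    return `${prefixes.unalignedNucleotideSequencePrefix}${sequenceName}.fasta`;
}

console.log(unalignedFastaFilename('testSecondSequence'));
```

Under the table's note, the resulting filename would then be resolved relative to `inputDirectory`.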
13 changes: 13 additions & 0 deletions siloLapisTests/test/unalignedNucleotideSequence.spec.ts
@@ -103,6 +103,19 @@ describe('The /unalignedNucleotideSequence endpoint', () => {
         expect(primaryKeys[0]).to.equal('>key_3259931');
     });
 
+    it('should return the short sequence', async () => {
+        const result = await lapisSingleSegmentedSequenceController.postUnalignedNucleotideSequence({
+            nucleotideSequenceRequest: { primaryKey: 'key_1749899' },
+        });
+
+        const { primaryKeys, sequences } = sequenceData(result);
+
+        expect(primaryKeys).to.have.length(1);
+        expect(sequences).to.have.length(1);
+        expect(primaryKeys[0]).to.equal('>key_1749899');
+        expect(sequences[0]).to.equal('some_very_short_string');
+    });
+
     it('should return the lapis data version in the response', async () => {
         const result = await fetch(basePath + '/sample/unalignedNucleotideSequences');

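Note on the new test: it relies on a `sequenceData` helper that this hunk does not show. Judging from the assertions, header lines (including the leading `>`) end up in `primaryKeys` and the sequence text in `sequences`. A minimal sketch of such a split follows, assuming the endpoint returns plain FASTA text. `splitFasta` is a hypothetical stand-in, not the repository's helper.

```typescript
// Hypothetical stand-in for the test suite's `sequenceData` helper.
// Assumption: the input is plain FASTA text.
function splitFasta(fastaText: string): { primaryKeys: string[]; sequences: string[] } {
    const primaryKeys: string[] = [];
    const sequences: string[] = [];

    for (const rawLine of fastaText.split('\n')) {
        const line = rawLine.trim();
        if (line.length === 0) {
            continue; // ignore blank lines, e.g. a trailing newline
        }
        if (line.startsWith('>')) {
            // A header line starts a new record; keep the '>' as the test expects.
            primaryKeys.push(line);
            sequences.push('');
        } else if (sequences.length > 0) {
            // Sequence text may span several lines; join them per record.
            sequences[sequences.length - 1] += line;
        }
    }

    return { primaryKeys, sequences };
}

const parsed = splitFasta('>key_1749899\nsome_very_short_string\n');
console.log(parsed.primaryKeys, parsed.sequences);
```

Keeping the `>` in the key mirrors the assertion `expect(primaryKeys[0]).to.equal('>key_1749899')` in the test above.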
200 changes: 0 additions & 200 deletions siloLapisTests/testData/unaligned_testSecondSequence.fasta

This file was deleted.
