-
Why the command-line?
- Big data files
- open partial files
- piping
- Unix/Linux practices
- permissions/file sharing
- tools, do one thing well
- terminal - networking, speed, compute
- open source community
- Available tools
- Common tools used at NVSL
- Fast - low overhead, no GUI
- Fast - scripting
- File system access
- Wildcard'ing, regular expressions
- Shortcuts
- Automate repetition
- Creativity
-
Resources
-
Man pages
-
Websites
- AI: ChatGPT, Claude, Perplexity
- Stack Overflow
- Biostars
-
Books
-
Workshops
-
-
Unix Standard tools
- cd
- cp
- mkdir
- echo
- ls
- pwd
- head
- grep
- sed
- pigz
- mv (danger)
- rm (danger)
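mv and rm are marked as dangerous because they overwrite or delete without asking and there is no undo. The sketch below, run in a throwaway directory with made-up file names, shows two standard safety options: -n (do not overwrite) and -i (ask first).

```shell
# Work in a throwaway directory so nothing real is at risk.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "important results" > keep.txt
echo "scratch notes" > scratch.txt

# A plain mv would silently replace keep.txt; -n (no-clobber) refuses to overwrite.
mv -n scratch.txt keep.txt
cat keep.txt            # still the original content

# -i asks before removing; answering "n" keeps the file.
echo n | rm -i keep.txt
ls                      # both files are still present
```

Some people alias rm to rm -i in their profile; whether that suits you is a matter of preference.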
Other sources for installing tools:
- Source
- Dependency libraries needed
- Compiled manually: ./configure, make, make install
- GitHub
- Container
- Docker
- Singularity/Apptainer
-
space
- cd mydir
- cdmydir
-
tabs
- set amount of spaces
- \t
- Unix: \n
- Windows: \r\n
- The Tab key for autocompletion is faster and, most importantly, ensures correct spelling
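The \t, \n and \r characters above are invisible in most viewers, which is why files edited on Windows sometimes break scripts. A minimal sketch, using an invented two-line table, of how to see them and strip the carriage returns:

```shell
workdir=$(mktemp -d)
# printf interprets \t, \n and \r, so it can build a Windows-style file.
printf 'sample\tcount\r\nSRR1\t10\r\n' > "$workdir/windows.tsv"

# od -c prints the otherwise invisible characters.
od -c "$workdir/windows.tsv"

# Delete the carriage returns to get Unix line endings.
tr -d '\r' < "$workdir/windows.tsv" > "$workdir/unix.tsv"
od -c "$workdir/unix.tsv"

# Count the carriage returns left after conversion (none expected).
tr -cd '\r' < "$workdir/unix.tsv" | wc -c
```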
- cd
- ls
- pwd
- mkdir
- cp
- mv
- Ownership: chown
- User - Group - All
- Permission: chmod
- Read - Write - Execute: 4 - 2 - 1
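The numeric form of chmod adds the values above for each class in the order user, group, all. A short sketch on a hypothetical script file:

```shell
workdir=$(mktemp -d)
script="$workdir/run_me.sh"   # hypothetical file name
printf '#!/bin/sh\necho "hello"\n' > "$script"

# 7 = 4+2+1 (rwx) for the user, 5 = 4+1 (r-x) for the group, 0 for everyone else.
chmod 750 "$script"
perms=$(ls -l "$script" | cut -c1-10)
echo "$perms"     # -rwxr-x---

# The symbolic form changes one class at a time: remove execute from the group.
chmod g-x "$script"
perms_after=$(ls -l "$script" | cut -c1-10)
echo "$perms_after"    # -rwxr-----
```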
- Where a Linux system will look for a program
- You need to know your PATH variable locations
- Programs installed in the active Anaconda environment are automatically added to your PATH
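To see how the PATH lookup works in practice, the sketch below lists the PATH directories, asks the shell where it finds ls, and then puts a temporary directory holding a made-up program (mytool, not a real tool) at the front of PATH.

```shell
# Show each PATH directory on its own line; the shell searches them in this order.
echo "$PATH" | tr ':' '\n'

# command -v reports which file the shell would run for a name (similar to which).
command -v ls

# Hypothetical example: a personal script directory placed at the front of PATH.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my tool"\n' > "$bindir/mytool"
chmod +x "$bindir/mytool"
PATH="$bindir:$PATH"
export PATH

mytool               # found without typing the full path
command -v mytool    # resolves inside $bindir
```

A change made this way lasts only for the current session; putting the export line in your profile makes it permanent.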
- Copying files to your local machine.
- WinSCP/FileZilla
- Web Interface - OnDemand
- Mounted drive
- scp command-line
- VS Code
- Always have known, good datasets.
- When a script fails the first time you use it, don't assume the script must be at fault; the input data can also be the cause.
- Terminal - emulators - end points
- The shell - program running in the terminal - command line interpreter - running other programs
- ssh: Secure Shell, a standard network protocol for secure remote login
- ssh keys
- .ssh/authorized_keys
- .ssh/known_hosts
- User's profile
- ~/.bash_profile
- ~/.zshrc
- User's PATH variable
- Tab completion
- Space character
- command mistakes
ls-lh
cd~/myfolder
cd..
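Each of the mistakes above is a missing space: the shell splits a command line on spaces, so the command name and every argument must be separate words. A sketch of the corrected forms, using a temporary directory in place of ~/myfolder:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/myfolder"
cd "$workdir" || exit 1

ls -lh            # not ls-lh: the option is a separate word
cd myfolder       # not cdmyfolder: same rule as cd ~/myfolder
pwd
cd ..             # not cd..: ".." is the parent directory
pwd

# Without the space the shell looks for a program with that whole name.
ls-lh 2>/dev/null || echo "ls-lh: command not found"
```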
- Standard tools/GNU coreutils
- Options
- dash "-"
- flags
- Wildcards
- *: Matches any sequence of characters
- ?: Matches any single character.
- [...]: Matches any one of the characters inside the brackets
- [!...]: Matches any character that is not inside the brackets
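The four wildcard forms can be tried safely with echo, which prints what a pattern expands to without touching the files. The file names below are invented for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical file names, created empty just to have something to match.
touch sample1_R1.fastq sample1_R2.fastq sample2_R1.fastq sampleA_R1.fastq notes.txt

echo *.fastq               # * matches any sequence of characters
echo sample?_R1.fastq      # ? matches exactly one character
echo sample[12]_R1.fastq   # [12] matches 1 or 2
echo sample[!12]_R1.fastq  # [!12] matches any one character except 1 or 2
```

Previewing a pattern with echo or ls before handing it to rm or mv is a cheap way to avoid surprises.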
- Absolute versus relative paths
- absolute: root /
- ~/ and ${HOME}: shortcuts for the absolute path to your home directory
- relative: from working directory
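A small sketch of the difference, using a temporary directory tree (the project and data names are placeholders):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
cd "$workdir/project" || exit 1

# Relative path: resolved from the current working directory.
cd data
pwd

# Absolute path: starts at the root "/" and works from anywhere.
cd "$workdir/project"
pwd

# ~ and ${HOME} both expand to the absolute path of the home directory.
echo ~
echo "${HOME}"

# ".." is a relative path to the parent directory.
cd ..
pwd
```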
- Command History, ctrl-r, .zsh_history
- Logic
Make directory
Copy files from repo's "data" folder into new directory
cd <cloned repo>
mkdir test_dir
cp data/* test_dir; cd test_dir; ls
which pigz
conda install pigz
pigz -d *gz
grep -c '^@SRR' *fastq
head *_R1*fastq
head -4 *fastq | grep '^@SRR' | sed 's/ .*//'
for i in *fastq; do head -4 $i | grep '^@SRR' | sed 's/ .*//'; done
for i in *fastq; do count=$(grep -c '^@SRR' "$i"); printf '%s has %s reads\n' "$i" "$count"; done
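The counting loop can be tried without the workshop data. The sketch below writes an invented two-read FASTQ file (read names and sequences are made up) and counts reads two ways: by header lines, and by dividing the line count by four, since each FASTQ record is four lines.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up two-read FASTQ file for illustration only.
printf '@SRR0000001.1 1 length=8\nGTGTAAAC\n+\nIIIIIIII\n@SRR0000001.2 2 length=8\nACGTACGT\n+\nIIIIIIII\n' > demo_R1.fastq

for i in *fastq; do
    # Header lines start with "@" followed by the run accession.
    count=$(grep -c '^@SRR' "$i")
    # $(( )) also strips the padding some wc versions add.
    lines=$(( $(wc -l < "$i") ))
    printf '%s has %s reads (%s lines / 4 = %s)\n' "$i" "$count" "$lines" "$((lines / 4))"
done
```

Matching '^@SRR' rather than just '^@' matters because quality lines can also begin with "@". Quoting "$i" keeps the loop working for file names that contain spaces.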
for file in *; do
if [[ $file == *.fasta ]]; then
echo "$file is a FASTA"
fi
done
echo "PDF samples in list not found in directory:"; while read filename; do name=$(ls ${filename}*.pdf); [ ! -e "$name" ] && echo "$filename"; done < list 2> /dev/null
echo "PDF samples in list found in directory:"; while read filename; do name=$(ls ${filename}*.pdf); [ -e "$name" ] && echo "$filename"; done < list 2> /dev/null
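The two one-liners above can be folded into a single pass over the list. This is one possible variant, not the only way to do it; it assumes bash or sh with default glob settings, and the sample names and files are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample names and files for illustration.
printf 'sampleA\nsampleB\nsampleC\n' > list
touch sampleA_report.pdf sampleC_report.pdf

found=""
missing=""
while IFS= read -r name; do
    # Expand the pattern into the positional parameters; if nothing matches,
    # the pattern stays literal and the -e test on it fails.
    set -- "$name"*.pdf
    if [ -e "$1" ]; then
        found="$found $name"
    else
        missing="$missing $name"
    fi
done < list

echo "PDF samples in list found in directory:$found"
echo "PDF samples in list not found in directory:$missing"
```

Testing the expanded pattern directly avoids parsing ls output, so a sample with several matching PDFs is still reported as found.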
pwd; dir=`pwd`; cd~; cd $dir; pwd # find the error
for *_R1*; echo $i; done # find the error
for i in *fastq; do count=$(wc -l | sed 's/ .*//'); printf "$i has $count lines\n"; done # find the error
for i in *_R1*fastq; count=`grep -c 'GTGTAA' $i`; echo "$count in $i"; done # find the error
#find error
for file in *; do
if [[ $file == SRR* ]]; then
echo "-> \t\t$file starts with SRR"
elif [[ $file == ERR* ]]; then
echo "----> \t$file starts with ERR"
else
echo "$file - starts with something else"
if
done
conda activate <env>
conda env list
amrfinder --nucleotide SRR17276215_amr.fasta --output SRR17276215_output.txt
conda create -n sra-tools -c conda-forge -c bioconda sra-tools
fasterq-dump --split-files -O . SRR26282520
wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR6046640/SRR6046640
fastq-dump --split-files SRR6046640
~/sratoolkit.3.0.7-mac64/bin/fasterq-dump -S SRR6046640
Download Docker. It must be running.
docker pull ncbi/sra-tools
docker run -t --rm -v $PWD:/output:rw -w /output ncbi/sra-tools fasterq-dump -e 2 -p SRR6046640
singularity pull docker://ncbi/sra-tools
singularity run sra-tools_latest.sif fasterq-dump -e 2 -p SRR6046640
~/anaconda3/envs/vsnp3/bin/vsnp3_kraken2_wrapper.py -r1 SRR6046640_R1.fastq.gz -r2 SRR6046640_R2.fastq.gz --database ~/k2_standard_08gb
Download test data
cd ~; git clone https://github.com/USDA-VS/vsnp3_test_dataset.git
Step 1
vsnp3_step1.py -r1 *_R1*.fastq.gz -r2 *_R2*.fastq.gz -t Mycobacterium_AF2122
Step 2
vsnp3_step2.py -a -t Mycobacterium_AF2122