Day II: Friday 22.03.2024

Abdoallah Sharaf edited this page Apr 1, 2024 · 25 revisions

© 2024 Abdoallah Sharaf

Querying genome metadata and sequencing projects using GoaT

This session will give you a hands-on opportunity to learn how the Genomes on a Tree (GoaT) database is used to extract genomic information, including estimates for a wide range of genome features (e.g. chromosome number and genome size), as well as sequencing status of genome sequencing projects globally.

By the end of this session you will:

  • Have a general understanding of what GoaT is and know which genome-relevant metadata (such as genome size, chromosome number, and assembly span) are stored in GoaT.
  • Have experience performing simple and complex queries using the GoaT web interface to retrieve lists of taxa and/or assemblies.
  • Understand the different indexes in GoaT (taxon and assembly) and how to query their associated metadata.
  • Have experience viewing sequencing status and target lists for projects in the Earth Biogenome Project Network.

Material

In this session you will follow along with a live demo of the GoaT web interface, using the set of slides provided by the instructor.

Prerequisites

  • Access to the web via a desktop computer (the GoaT UI is not optimised for mobile).
  • Basic knowledge of genome characteristics and descriptors (assembly, assembly span, genome size, chromosome count, etc.).
  • Interest in eukaryotic genome sequencing initiatives.

Working with Docker images

  • Computational workflows are rarely composed of a single script or tool. More often, they depend on dozens of software components or libraries. Installing and maintaining such dependencies is a challenging task and a common source of irreproducibility in scientific applications. To overcome these issues, you can use container technology, which packages software dependencies into container images that can be deployed on any platform that supports the container runtime. Containers run in isolation from the host system, each with its own copy of the file system, process space, and memory management. Docker is a handy tool to build, run, and share container images. These images can be uploaded and published in a centralized repository known as Docker Hub, or hosted by other parties, such as Quay. Good tutorial materials are available here, but I will just demonstrate a few examples on the course Gitpod.

  • Run the publicly available hello-world container

docker run hello-world
  • Let's try to pull and work with one of the publicly available containers that provides the fastqc tool
docker pull staphb/fastqc
  • We can check if a container has been pulled using the images command.
docker images
  • Run a container
    • Images can be run directly
    docker run staphb/fastqc fastqc -h
    • Images can also be run interactively
    docker run -it staphb/fastqc
    • In interactive mode you should consider mounting the current directory, so that files on the host are visible inside the container
    docker run -v "$PWD":"$PWD" -w "$PWD" -it staphb/fastqc
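
The mount-and-run pattern from the last step can be wrapped in a small shell function so that it does not have to be retyped for every tool. The following is only a minimal sketch: the function name `run_in_container` and the `DRY_RUN` variable are hypothetical and not part of the course material, and the `--rm` flag (which removes the container after it exits) is an addition to the commands shown above.

```shell
# Hypothetical helper (an assumption, not part of the course material):
# run a command inside a container image with the current directory
# mounted at the same path and used as the working directory.
# Set DRY_RUN=1 to print the docker command instead of executing it.
run_in_container() {
    image="$1"   # first argument: image name, e.g. staphb/fastqc
    shift        # remaining arguments: the command to run in the container
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed
        echo "docker run --rm -v $PWD:$PWD -w $PWD $image $*"
    else
        # --rm removes the container once the command has finished
        docker run --rm -v "$PWD:$PWD" -w "$PWD" "$image" "$@"
    fi
}

# Example: preview the command without needing Docker to be running
DRY_RUN=1 run_in_container staphb/fastqc fastqc -h
```

With `DRY_RUN` unset, the same call would run `fastqc -h` inside the container, and any files written to the current directory would remain on the host after the container exits.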

Exercise: Pull and interactively run a publicly available container with the trimmomatic software

Workflow management with Nextflow and introduction to nf-core

Chris Hakkaart will run his training based on THIS training