Update studios and launch form content #6

Open: wants to merge 7 commits into `master`
demo/docs/002_what_is_the_seqera_platform.md (2 changes: 1 addition & 1 deletion)

## Core components

The Platform consists of three main architectural components: a backend container, a frontend container, and a database that stores all of the data required by the application. The frontend container communicates with the backend container and database via API calls. As a result, all features and activities available through the user interface can also be accessed programmatically via the Seqera Platform API. For more information, see the [Automation](./015_automation_on_the_seqera_platform.md) section later in the walkthrough.
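
As a hint of what that programmatic access looks like, the sketch below uses the Seqera Platform CLI (`tw`), which wraps the same API. It assumes the CLI is installed and that you have created a personal access token in the Platform UI; the workspace and pipeline names echo examples used later in this walkthrough:

```bash
# Sketch only: the programmatic equivalent of the UI's Launch button.
# Assumes the Seqera Platform CLI (`tw`) is installed and a personal
# access token has been generated in the Platform UI.
export TOWER_ACCESS_TOKEN=<your-access-token>

# List the pipelines available on a workspace Launchpad
tw pipelines list --workspace seqeralabs/showcase

# Launch a pipeline in that workspace with the test profile
tw launch nf-core/rnaseq --workspace seqeralabs/showcase --profile test
```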

This walkthrough will demonstrate the various features of the Seqera Platform that make it easier to build, launch, and manage scalable data pipelines.
demo/docs/004_launching_pipelines.md (85 changes: 46 additions & 39 deletions)

Each workspace has a Launchpad that allows users to easily create and share Nextflow pipelines.

Users can create their own pipelines, share them with others on the Launchpad, or tap into over a hundred community pipelines available on nf-core and other sources.


/// details | Advanced
    type: info

Adding a new pipeline is relatively simple and can be included as part of the demonstration. See [Add a Pipeline](./005_adding_a_pipeline.md).
///


Navigate to the Launchpad in the `seqeralabs/showcase` workspace and select **Launch** next to the `nf-core-rnaseq` pipeline to open the launch form.


/// details | Click to show animation
    type: example

![Launch a pipeline](assets/sp-cloud-launch-form.gif)
///

The launch form consists of General config, Run parameters, and Advanced options sections for specifying your run settings before execution, plus an execution summary.

### 2. Launch form: General config

The General config section contains the following fields:

See the ["Best Practices for Deploying Pipelines with the Seqera Platform"](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) blog for further information on how to automatically build the parameter schema for any Nextflow pipeline using tooling maintained by the nf-core community.
- **Pipeline to launch**: The pipeline Git repository name or URL.
- **Revision number**: A valid repository commit ID, tag, or branch name. For nf-core/rnaseq, this is prefilled.
- **Config profiles**: One or more configuration profile names to use for the execution. This pipeline will use the `test` profile.
- **Workflow run name**: A unique identifier for the run, initially generated as a combination of an adjective and a scientist's name, but can be modified as needed.
- **Labels**: Assign new labels to the run in addition to `yeast`.
- **Compute environment**: Select an existing workspace compute environment. This pipeline will use the `seqera_aws_ireland_fusionv2_nvme` compute environment.
- **Work directory**: The (cloud or local) file storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified. This pipeline will use the `s3://seqeralabs-showcase` bucket.

### 3. Launch form: Run parameters

After specifying the General config, the Run parameters page appears, allowing you to fine-tune pipeline execution. This form is generated from the pipeline's [`nextflow_schema.json`](https://github.com/nf-core/rnaseq/blob/master/nextflow_schema.json) file, found in the root of the pipeline Git repository, which defines pipeline parameters in a simple JSON-based schema. This schema enables pipeline developers to easily adapt their Nextflow pipelines for execution via the Seqera Platform web interface.

For more information on automatically building the parameter schema for any Nextflow pipeline, refer to the ["Best Practices for Deploying Pipelines with the Seqera Platform"](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) blog.
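
For illustration, a heavily trimmed sketch of such a schema is shown below. Real nf-core schemas are far larger and group parameters into definitions, but the principle is the same: each entry describes one parameter, and the launch form is rendered from those descriptions:

```json
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "Example pipeline parameter schema (illustrative only)",
    "type": "object",
    "properties": {
        "input": {
            "type": "string",
            "format": "file-path",
            "description": "Path to the input samplesheet (CSV)."
        },
        "outdir": {
            "type": "string",
            "format": "directory-path",
            "description": "Directory where the pipeline publishes results."
        }
    }
}
```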

You can enter Run parameters in three ways:

- **Input form view**: Enter text or select attributes from lists, and browse input and output locations with Data Explorer.
- **Config view**: Edit raw configuration text directly in JSON or YAML format.
- **Upload params file**: Upload a JSON or YAML file containing run parameters.

Specify the following parameters for nf-core/rnaseq:

- `input`: Most nf-core pipelines have standardized the usage of the `input` parameter to specify an input samplesheet that contains paths to any input files (such as FastQ files) and any additional metadata required to run the pipeline. The `input` parameter can accept a file path to a samplesheet in the S3 bucket selected through Data Explorer (such as `s3://my-bucket/my-samplesheet.csv`). Alternatively, the Seqera Platform has a Datasets feature that allows you to upload structured data like samplesheets for use with Nextflow pipelines. For the purposes of this demonstration, select **Browse** next to the `input` parameter and search for and select a pre-loaded dataset called "rnaseq_samples". (An example samplesheet is sketched after the notes below.)

/// details | Click to show animation
    type: example

![Input parameters](assets/sp-cloud-launch-parameters-input.gif)
///

/// details | Advanced
    type: info

Users can upload their own samplesheets and make them available as a dataset in the **Datasets** tab. See [Add a dataset](./006_adding_a_dataset.md).
///
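
For reference, here is a minimal sketch of what an nf-core/rnaseq-style samplesheet might contain. The bucket paths are hypothetical, and the exact columns can vary between pipeline versions:

```csv
sample,fastq_1,fastq_2,strandedness
WT_REP1,s3://my-bucket/fastq/WT_REP1_R1.fastq.gz,s3://my-bucket/fastq/WT_REP1_R2.fastq.gz,auto
KO_REP1,s3://my-bucket/fastq/KO_REP1_R1.fastq.gz,s3://my-bucket/fastq/KO_REP1_R2.fastq.gz,auto
```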

- `outdir`: Most nf-core pipelines have standardized the usage of the `outdir` parameter to specify where the final results created by the pipeline are published. `outdir` must be different for each pipeline run; otherwise, your results will be overwritten. Since we want to publish these files to an S3 bucket, we must provide the directory path to the appropriate storage location (such as `s3://my-bucket/my-results`).


For the `outdir` parameter, specify an S3 directory path manually, or select **Browse** to specify a cloud storage directory using Data Explorer.
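
If you use the **Upload params file** option instead, the same two settings can be provided as a small params file. A minimal sketch in YAML, reusing the hypothetical bucket paths from above:

```yaml
# Minimal params-file sketch for nf-core/rnaseq (paths are hypothetical)
input: s3://my-bucket/my-samplesheet.csv
outdir: s3://my-bucket/my-results
```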

/// details | Click to show animation
    type: example

![Output parameters](assets/sp-cloud-run-parameters.gif)
///

Users can easily modify and specify other parameters to customize the pipeline execution through the parameters form. For example, in the **Read trimming options** section of the parameters page, change the `trimmer` parameter from `trimgalore` to `fastp` in the dropdown menu.

![Read trimming options](./assets/trimmer-settings.png)

### 4. Launch form: Advanced options

The Advanced options allow you to specify additional settings for the pipeline execution. These include:

- **Resource labels**: Use resource labels to tag the computing resources created during the workflow execution.
- **Nextflow config**: Specify Nextflow configuration options to customize task execution. For example, you can specify an error handling strategy to continue the workflow even if some tasks fail (see the sketch after this list).
- **Pipeline secrets**: Pipeline secrets store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
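
As an illustration of the error-handling example mentioned above, the following Nextflow config sketch retries each failing task up to twice and then ignores it so that independent downstream tasks can keep running. Treat it as a starting point, not a recommended production setting:

```groovy
// Sketch: retry a failing task up to two times, then ignore it so the
// rest of the workflow can continue past the failure.
process.errorStrategy = { task.attempt <= 2 ? 'retry' : 'ignore' }
process.maxRetries = 2
```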

After you have filled in the necessary launch details, select **Launch**.
## Introduction to Data Studios
After running a pipeline, you may want to perform interactive analysis in platforms like Jupyter Notebook or RStudio using your preferred tools. Setting up the infrastructure for these platforms, including accessing pipeline data, results, and necessary bioinformatics packages, can be complex and time-consuming.

Data Studios simplifies this process for Seqera Platform users by enabling them to create interactive analysis environments using container image templates or custom images, much like the way they add and share pipelines and datasets.

Platform manages all the details, enabling users to easily select their preferred interactive tool and analyze their data within the platform.

On the **Data Studios** tab, you can monitor and see the details of the data studios in your workspace.

Each data studio has a name, followed by the cloud provider it runs on, the container image being used (Jupyter, VS Code, RStudio, or a custom container), the user who created it, the timestamp of creation, and the [status of the session](https://docs.seqera.io/platform/24.2/data_studios#session-statuses).

![Data studios overview](./assets/data-studios-overview.png)

Select the three-dots menu to:

- View the details of the data studio
- Start the studio
- Start the studio as a new session
- Copy the data studio URL
- Stop the studio

### Environments
Data Studios offers four container image templates: JupyterLab, RStudio Server, Visual Studio Code, and Xpra. These templates initially install a minimal set of packages, allowing you to add more as needed during a session. Customized studios display an arrow icon with a tooltip indicating the modified template.

In addition to the Seqera-provided container template images, you can provide your own custom container environments by augmenting the Seqera-provided images with a list of Conda packages or by providing your own base container template image.

Data Studios uses the Wave service to build custom container template images.
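
For example, the kind of package list you might supply is a standard Conda environment file; the sketch below is hypothetical, and the pinned versions are illustrative only:

```yaml
# Hypothetical Conda environment sketch for augmenting a studio template
channels:
  - conda-forge
  - bioconda
dependencies:
  - samtools=1.19
  - r-base
  - r-ggplot2
```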

/// details | Click to show animation
    type: example

![Data Studio overview details](assets/sp-cloud-data-studios-overview.gif)
///


## Analyse RNAseq data in Data Studios

### 5. Takeaway
This example demonstrates how Data Studios allows you to perform interactive analysis and explore the results of your secondary data analysis all within one unified platform. It simplifies the setup and data management process, making it easier for you to gain insights from your data efficiently.

For more examples of using Data Studios for interactive analysis, see the following guides:

- [Analysing genomic data using IGV desktop in Data Studios](013_create_xpra_igv_environment.md)
