BiocPy/SummarizedExperiment

SummarizedExperiment

This package provides containers to represent genomic experimental data as 2-dimensional matrices, following Bioconductor's SummarizedExperiment. In these matrices, the rows typically denote features or genomic regions of interest, while the columns represent samples or cells.

The package currently includes representations for both SummarizedExperiment and RangedSummarizedExperiment. The key distinction is that a RangedSummarizedExperiment object provides an additional slot to store the genomic region for each feature; this slot is expected to be a GenomicRanges object (see the GenomicRanges package for details).

Install

To get started, install the package from PyPI:

pip install summarizedexperiment

Usage

A SummarizedExperiment contains three key attributes:

  • assays: A dictionary of matrices with assay names as keys, e.g. counts, logcounts etc.
  • row_data: Feature information e.g. genes, transcripts, exons, etc.
  • column_data: Sample information about the columns of the matrices.
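The relationship between these three attributes can be sketched in plain Python. The `MiniSE` class below is a hypothetical stand-in written only for illustration, not the package's implementation; it assumes that every assay has one row per entry in row_data and one column per entry in column_data.

```python
from dataclasses import dataclass, field


@dataclass
class MiniSE:
    """Hypothetical toy container; NOT the real SummarizedExperiment class."""

    assays: dict       # assay name -> matrix stored as a list of rows
    row_data: list     # one record per feature (matrix row)
    column_data: list  # one record per sample (matrix column)
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        n_rows, n_cols = len(self.row_data), len(self.column_data)
        for name, matrix in self.assays.items():
            # Assumed invariant: all assays share the (features, samples) shape.
            if len(matrix) != n_rows or any(len(r) != n_cols for r in matrix):
                raise ValueError(f"assay '{name}' does not match ({n_rows}, {n_cols})")

    @property
    def shape(self):
        return (len(self.row_data), len(self.column_data))


toy = MiniSE(
    assays={"counts": [[1, 2, 3], [4, 5, 6]]},
    row_data=[{"gene": "g1"}, {"gene": "g2"}],
    column_data=[{"treatment": "ChIP"}, {"treatment": "Input"}, {"treatment": "ChIP"}],
)
print(toy.shape)  # (2, 3)
```

The point of the sketch is only that feature annotations follow the rows and sample annotations follow the columns of every assay.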

First, let's mock feature and sample data:

from random import random
import pandas as pd
import numpy as np
from biocframe import BiocFrame

nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
row_data = BiocFrame(
    {
        "seqnames": [
            "chr1",
            "chr2",
            "chr2",
            "chr2",
            "chr1",
            "chr1",
            "chr3",
            "chr3",
            "chr3",
            "chr3",
        ]
        * 20,
        "starts": range(100, 300),
        "ends": range(110, 310),
        "strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
        "score": range(0, 200),
        "GC": [random() for _ in range(10)] * 20,
    }
)

col_data = pd.DataFrame(
    {
        "treatment": ["ChIP", "Input"] * 3,
    }
)
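Each mocked feature column needs exactly nrows entries, which is why the ten-element lists are repeated 20 times and the ranges span 200 values. A quick standard-library check of that arithmetic, using illustrative variable names that are not part of the package:

```python
nrows = 200

seqnames = ["chr1", "chr2", "chr2", "chr2", "chr1", "chr1", "chr3", "chr3", "chr3", "chr3"] * 20
strand = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20
starts = range(100, 300)
ends = range(110, 310)

# Every column must describe the same number of features.
lengths = {len(seqnames), len(strand), len(starts), len(ends)}
assert lengths == {nrows}

# For every mocked feature, ends - starts is the same constant.
widths = {e - s for s, e in zip(starts, ends)}
print(widths)  # {10}
```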

To create a SummarizedExperiment:

from summarizedexperiment import SummarizedExperiment

tse = SummarizedExperiment(
    assays={"counts": counts}, row_data=row_data, column_data=col_data,
    metadata={"seq_platform": "Illumina NovaSeq 6000"},
)
## output
class: SummarizedExperiment
dimensions: (200, 6)
assays(1): ['counts']
row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
row_names(0):
column_data columns(1): ['treatment']
column_names(0):
metadata(1): seq_platform

To create a RangedSummarizedExperiment:

from summarizedexperiment import RangedSummarizedExperiment
from genomicranges import GenomicRanges

trse = RangedSummarizedExperiment(
    assays={"counts": counts}, row_data=row_data,
    row_ranges=GenomicRanges.from_pandas(row_data.to_pandas()), column_data=col_data
)
## output
class: RangedSummarizedExperiment
dimensions: (200, 6)
assays(1): ['counts']
row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
row_names(0):
column_data columns(1): ['treatment']
column_names(0):
metadata(0):

For more examples, check out the documentation.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.