Release version 2.3.0 #295

Closed
wants to merge 39 commits into from
39 commits
819cbb1
Update ensembl_release_versions.py
y9c Jul 30, 2022
1463444
fix naming
y9c Aug 18, 2022
6829d89
quick update
y9c Aug 18, 2022
2ee9eb2
quick update
y9c Aug 18, 2022
43f5e1c
quick update
y9c Aug 18, 2022
597057c
Merge branch 'openvax:master' into master
y9c Aug 18, 2022
be260ff
quick update
y9c Aug 18, 2022
acd2509
Merge branch 'openvax:master' into master
y9c Nov 17, 2023
0690b64
Merge branch 'openvax:master' into master
y9c Dec 28, 2023
86968b5
add species
y9c Dec 28, 2023
29c4cce
format and relase
y9c Dec 29, 2023
7c581a8
format and release
y9c Dec 29, 2023
3819ea2
add comand to list all available
y9c Dec 29, 2023
ed1de05
add comand to list all available
y9c Dec 29, 2023
1426100
add comand to list all available
y9c Dec 29, 2023
e03a213
fix bug
y9c Jan 2, 2024
c6915d1
support plants
y9c Jan 10, 2024
9b426eb
bump version 2.3.0
y9c Jan 10, 2024
bb6adf8
bump version 2.3.0
y9c Jan 10, 2024
aceabe0
fix bug in fasta name
y9c Jan 10, 2024
e77c9b1
support arabidopsis
y9c Jan 10, 2024
247f9cf
update check
y9c Jan 10, 2024
82b52bc
format code
y9c Jan 10, 2024
6b6d8db
update config
y9c Jan 10, 2024
3f78d05
quick update
y9c Jan 10, 2024
6d840b4
ensemblrelease suport
y9c Jan 10, 2024
9a0cf7f
ensemblrelease suport
y9c Jan 10, 2024
f5c537d
ensemblrelease suport
y9c Jan 10, 2024
9d1ce7c
ensemblrelease suport, fix bu
y9c Jan 10, 2024
a7e8b5b
ensemblrelease suport, fix bug
y9c Jan 10, 2024
9ead657
update more species
y9c Jan 10, 2024
41612a0
format code
y9c Jan 10, 2024
e586f28
fix gene name error
y9c Jan 10, 2024
b2d2f62
fix gene name error for soybean and some other species
y9c Jan 10, 2024
65b5d6d
fix gene name error for maize
y9c Jan 10, 2024
eec7115
suport mRNA type
y9c Jan 10, 2024
303ada4
suport mRNA type
y9c Jan 10, 2024
44acb09
quick update
y9c Jan 11, 2024
b492835
fix conflict
y9c Jan 11, 2024
33 changes: 18 additions & 15 deletions README.md
@@ -8,10 +8,9 @@
<img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
</a>

# PyEnsembl

PyEnsembl
=======
PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.
PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.

# Example Usage

@@ -25,7 +24,7 @@ data = EnsemblRelease(77)
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
exon_ids = data.exon_ids_of_gene_name("HLA-A")
```

# Installation
@@ -52,6 +51,7 @@ Alternatively, you can create the `EnsemblRelease` object from inside a Python
process and call `ensembl_object.download()` followed by `ensembl_object.index()`.

## Cache Location

By default, PyEnsembl uses the platform-specific `Cache` folder
and caches the files into the `pyensembl` sub-directory.
You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
@@ -66,11 +66,11 @@ or
```python
import os

os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
os.environ["PYENSEMBL_CACHE_DIR"] = "/custom/cache/dir"
# ... PyEnsembl API usage
```
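
A minimal sketch of setting the override only when the user has not already exported it in their shell. `configure_cache_dir` is a hypothetical helper, not part of PyEnsembl, and the sketch assumes the variable is read when a genome object is used, so it should be called before any PyEnsembl API usage.

```python
import os


def configure_cache_dir(path, environ=os.environ):
    """Hypothetical helper (not a PyEnsembl function).

    Sets PYENSEMBL_CACHE_DIR to `path` unless it is already defined,
    and returns the value that ends up in effect.
    """
    # setdefault leaves an existing value untouched, so a shell-level
    # `export PYENSEMBL_CACHE_DIR=...` takes precedence over `path`.
    return environ.setdefault("PYENSEMBL_CACHE_DIR", os.fspath(path))


# Demonstration on a plain dict so the real environment is not modified.
fake_env = {}
effective = configure_cache_dir("/custom/cache/dir", fake_env)
print(effective)
```

Passing a mapping explicitly, as in the demonstration, keeps the helper easy to exercise without touching the process environment; calling it with only a path applies the setting to `os.environ`.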

# Usage tips
# Usage tips

## List installed genomes

@@ -80,31 +80,33 @@ pyensembl list

```python
from pyensembl.shell import collect_all_installed_ensembl_releases

collect_all_installed_ensembl_releases()
```

## Load genome quickly

```python
from pyensembl import EnsemblRelease, find_species_by_name

data = EnsemblRelease(
release=100,
species=find_species_by_name('drosophila_melanogaster'),
)
species=find_species_by_name("drosophila_melanogaster"),
)
```

## Data structure

### Gene object

```python
gene=data.gene_by_id(gene_id='FBgn0011747')
gene = data.gene_by_id(gene_id="FBgn0011747")
```

### Transcript object

```python
transcript=gene.transcripts[0]
transcript = gene.transcripts[0]
```

### Protein information
@@ -125,11 +127,12 @@ For example:

```python
from pyensembl import Genome

data = Genome(
reference_name='GRCh38',
annotation_name='my_genome_features',
reference_name="GRCh38",
annotation_name="my_genome_features",
# annotation_version=None,
gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf', # Path or URL of GTF file
gtf_path_or_url="/My/local/gtf/path_to_my_genome_features.gtf", # Path or URL of GTF file
# transcript_fasta_paths_or_urls=None, # List of paths or URLs of FASTA files containing transcript sequences
# protein_fasta_paths_or_urls=None, # List of paths or URLs of FASTA files containing protein sequences
# cache_directory_path=None, # Where to place downloaded and cached files for this genome
@@ -142,8 +145,8 @@ gene_names = data.gene_names_at_locus(contig=6, position=29945884)
# API

The `EnsemblRelease` object has methods to let you access all possible
combinations of the annotation features *gene\_name*, *gene\_id*,
*transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
combinations of the annotation features _gene_name_, _gene_id_,
_transcript_name_, _transcript_id_, _exon_id_ as well as the location of
these genomic elements (contig, start position, end position, strand).

## Genes