This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

PDB download and database functions (#483)
Joel S authored and jgreener64 committed Aug 16, 2017
1 parent e08f2a3 commit 1c89218
Showing 3 changed files with 649 additions and 29 deletions.
194 changes: 177 additions & 17 deletions docs/src/man/structure.md
@@ -15,18 +15,19 @@ end
The `Bio.Structure` module provides functionality to manipulate macromolecular structures, and in particular to read and write [Protein Data Bank](http://www.rcsb.org/pdb/home/home.do) (PDB) files. It is designed to be used for standard structural analysis tasks, as well as acting as a platform on which others can build to create more specific tools. It compares favourably in terms of performance to other PDB parsers - see some [benchmarks](https://github.com/jgreener64/pdb-benchmarks).


## Parsing PDB files
## Basics

To download a PDB file:

```julia
# Stored in the current working directory by default
downloadpdb("1EN2")
```

To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = read(filepath_1EN2, PDB)
julia> struc = read("/path/to/pdb/file.pdb", PDB)
Bio.Structure.ProteinStructure
Name - 1EN2.pdb
Number of models - 1
@@ -40,6 +41,8 @@ Number of hydrogens - 0
Number of disordered atoms - 27
```

**Note**: Refer to the [Downloading PDB files](#downloading-pdb-files) and [Reading PDB files](#reading-pdb-files) sections for more options.

The elements of `struc` can be accessed as follows:

| Command | Returns | Return type |
@@ -194,21 +197,6 @@ RCGSQGGGSTCPGLRCCSIWGWCGDSEPYCGRTCENKCWSGERSDHRCGAAVGNPPCGQDRCCSVHGWCGGGNDYCSGGN
```


## Writing PDB files

PDB format files can be written:

```julia
writepdb("1EN2_out.pdb", struc)
```

Any element type can be given as input to `writepdb`. Atom selectors can also be given as additional arguments:

```julia
writepdb("1EN2_out.pdb", struc, backboneselector)
```


## Spatial calculations

Various functions are provided to calculate spatial quantities for proteins:
@@ -244,6 +232,178 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51]))
```


## Downloading PDB files

To download a PDB file to a specified directory:

```julia
downloadpdb("1EN2", pdb_dir="path/to/pdb/directory/")
```

To download multiple PDB files to a specified directory:

```julia
downloadpdb(["1EN2","1ALW","1AKE"], pdb_dir="path/to/pdb/directory/")
```

To download a PDB file in PDB, XML, MMCIF or MMTF format:

```julia
# PDB file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDB)
# XML file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDBXML)
# MMCIF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMCIF)
# MMTF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```

Various options can be set through optional keyword arguments when downloading PDB files as follows:

| Keyword Argument | Description |
| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded |
| `file_format::Type=PDB` | The format of the PDB file. Options are PDB, PDBXML, MMCIF or MMTF |
| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` |
| `overwrite::Bool=false`        | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file           |
| `ba_number::Integer=0` | If set > 0, downloads the respective biological assembly; by default downloads the PDB file |
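
As a rough illustration of how `pdb_dir`, `obsolete` and `overwrite` interact, the sketch below predicts where a PDB format file would end up and whether a download would be skipped. `expectedpdbpath` and `wouldskip` are hypothetical helpers, not part of `Bio.Structure`, and the file naming (upper case PDB ID plus `.pdb`) is an assumption based on the examples in this document:

```julia
# Hypothetical helper, not part of Bio.Structure: predicts where a PDB format
# download would be stored, following the keyword descriptions in the table.
function expectedpdbpath(pdbid::AbstractString;
                         pdb_dir::AbstractString=pwd(),
                         obsolete::Bool=false)
    # Obsolete entries go into an "obsolete" directory inside pdb_dir
    dir = obsolete ? joinpath(pdb_dir, "obsolete") : pdb_dir
    # Assumption: the file name is the upper case PDB ID plus ".pdb"
    return joinpath(dir, uppercase(pdbid) * ".pdb")
end

# Hypothetical helper: true if a download would be skipped because the file
# already exists and `overwrite` is false.
function wouldskip(pdbid::AbstractString; overwrite::Bool=false, kwargs...)
    return !overwrite && isfile(expectedpdbpath(pdbid; kwargs...))
end
```

For example, `wouldskip("1EN2", pdb_dir="path/to/pdb/directory/")` would be `true` only if `1EN2.pdb` is already present in that directory.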


## Reading PDB files

- To parse an existing PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = read("/path/to/pdb/file.pdb", PDB)
Bio.Structure.ProteinStructure
Name - 1EN2.pdb
Number of models - 1
Chain(s) - A
Number of residues - 85
Number of point mutations - 5
Number of other molecules - 5
Number of water molecules - 76
Number of atoms - 614
Number of hydrogens - 0
Number of disordered atoms - 27
```

Various options can be set through optional keyword arguments when parsing a PDB file as follows:

| Keyword Argument | Description |
| :------------------------------------------- | :------------------------------------------------------------------------------ |
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB Structure read. Defaults to "< PDBID >.pdb" |
| `remove_disorder::Bool=false`                | If set true, then disordered atoms won't be parsed                              |
| `read_std_atoms::Bool=true`                  | If set false, then standard ATOM records won't be parsed                        |
| `read_het_atoms::Bool=true`                  | If set false, then HETATM records won't be parsed                               |
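
To make the effect of `read_std_atoms` and `read_het_atoms` concrete, the following sketch filters raw PDB lines by record name. It only mimics the documented behaviour and is not how the parser is implemented; `selectrecords` is a hypothetical helper and the two sample lines are shortened, made-up placeholders:

```julia
# Hypothetical helper, not part of Bio.Structure: keeps the coordinate lines
# that the two keyword arguments would let through.
function selectrecords(lines; read_std_atoms::Bool=true, read_het_atoms::Bool=true)
    return filter(lines) do line
        # The record name occupies the first six columns of a PDB line
        (read_std_atoms && startswith(line, "ATOM  ")) ||
            (read_het_atoms && startswith(line, "HETATM"))
    end
end

# Shortened placeholder lines; only the record names matter here
lines = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  O   HOH A 101",
]

# Keeps only the standard ATOM line
stdonly = selectrecords(lines, read_het_atoms=false)
```

Setting both keywords to `false` in this sketch leaves no coordinate lines at all.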

- To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework by specifying the PDB ID and PDB directory (the file name must be in upper case, e.g. "1EN2.pdb"):

The function `readpdb` provides a uniform way to download and read PDB files. For example:

```julia
struc = readpdb("1EN2", pdb_dir="/path/to/pdb/directory")
```

`readpdb` takes the same keyword arguments as `read` above, plus `pdb_dir` and `ba_number`.

- To download and parse a PDB file into a Structure-Model-Chain-Residue-Atom framework in a single line:

```julia
julia> struc = retrievepdb("1ALW", pdb_dir="path/to/pdb/directory")
INFO: Downloading PDB : 1ALW
INFO: Parsing the PDB file...
Bio.Structure.ProteinStructure
Name - 1ALW.pdb
Number of models - 1
Chain(s) - AB
Number of residues - 346
Number of point mutations - 0
Number of other molecules - 10
Number of water molecules - 104
Number of atoms - 2790
Number of hydrogens - 0
Number of disordered atoms - 0
```

Various options can be set through optional keyword arguments when downloading and parsing a PDB file as follows:

| Keyword Argument                              | Description                                                                                                              |
| :-------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`               | The directory from which the PDB file is read                                                                            |
| `obsolete::Bool=false`                        | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir`    |
| `overwrite::Bool=false`                       | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file if it exists |
| `ba_number::Integer=0`                        | If set > 0, reads the respective biological assembly; by default reads the PDB file                                      |
| `structure_name::AbstractString="$pdbid.pdb"` | The name of the PDB Structure read. Defaults to "< PDBID >.pdb"                                                          |
| `remove_disorder::Bool=false`                 | If set true, then disordered atoms won't be parsed                                                                       |
| `read_std_atoms::Bool=true`                   | If set false, then standard ATOM records won't be parsed                                                                 |
| `read_het_atoms::Bool=true`                   | If set false, then HETATM records won't be parsed                                                                        |


## Writing PDB files

PDB format files can be written:

```julia
writepdb("1EN2_out.pdb", struc)
```

Any element type can be given as input to `writepdb`. Atom selectors can also be given as additional arguments:

```julia
writepdb("1EN2_out.pdb", struc, backboneselector)
```


## RCSB PDB Utility Functions

- To download the entire RCSB PDB database in your preferred file format:

```julia
downloadentirepdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF, overwrite=false)
```

The keyword arguments are described below:

| Keyword Argument                 | Description                                                                                                 |
| :------------------------------- | :---------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`  | The directory to which the PDB files are downloaded                                                         |
| `file_format::Type=PDB`          | The format of the PDB file. Options are PDB, PDBXML, MMCIF or MMTF                                          |
| `overwrite::Bool=false`          | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file |

- To update your local PDB directory based on the weekly status list of new, modified and obsolete PDB files from the RCSB server:

```julia
updatelocalpdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```

`file_format` specifies the format of the PDB files present in the local PDB directory. Obsolete PDB files are stored in the auto-generated `obsolete` directory inside the specified local PDB directory.

- To download all obsolete PDB files from the RCSB server:

```julia
downloadallobsoletepdb(;obsolete_dir="/path/to/obsolete/directory/", file_format=MMCIF, overwrite=false)
```

`file_format` specifies the format in which the PDB files are downloaded; options are PDB, PDBXML, MMCIF or MMTF.

If `overwrite=true`, existing PDB files in the obsolete directory are overwritten by the newly downloaded ones.

- To maintain a local copy of the entire RCSB PDB database:

Run the `downloadentirepdb` function once to download all PDB files, then set up a cron job or similar to run the `updatelocalpdb` function once a week to keep the local PDB directory up to date with the RCSB server.

There are a few more functions that may help.

| Function | Returns | Return type |
| :----------------- | :------------------------------------------------------------------------------ | :------------------------------------------------------- |
| `pdbentrylist`     | List of all PDB entries from the RCSB server                                    | `Array{String,1}`                                        |
| `pdbstatuslist`    | List of PDB entries from the specified RCSB weekly status list URL              | `Array{String,1}`                                        |
| `pdbrecentchanges` | Added, modified and obsolete PDB lists from the recent RCSB weekly status files | `Tuple{Array{String,1},Array{String,1},Array{String,1}}` |
| `pdbobsoletelist` | List of all obsolete PDB entries in the RCSB server | `Array{String,1}` |
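
The lists returned by these functions are plain string arrays, so they combine well with set operations. The sketch below shows one possible way to decide which entries to fetch and which local files to retire. `planupdate` is a hypothetical helper, the IDs `9XYZ` and `2ABC` are placeholders, and this is not necessarily how `updatelocalpdb` works internally:

```julia
# Hypothetical helper, not part of Bio.Structure: given the IDs already stored
# locally and the three lists of a weekly status report, work out which
# entries to fetch and which local files have become obsolete.
function planupdate(localids, added, modified, obsolete)
    # New entries, plus local entries whose files changed upstream
    tofetch = union(added, intersect(modified, localids))
    # Only entries that are actually present locally can be retired
    toretire = intersect(obsolete, localids)
    return tofetch, toretire
end

# Placeholder status lists for illustration. In practice they could come from
# `added, modified, obsolete = pdbrecentchanges()` (call form assumed here).
localids = ["1EN2", "1ALW", "1AKE"]
tofetch, toretire = planupdate(localids, ["9XYZ"], ["1ALW", "2ABC"], ["1AKE"])
```

In this sketch a modified entry is fetched again only if it is already present locally, which is a design choice rather than documented behaviour.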


## Examples

A few further examples of `Bio.Structure` usage are given below.