-
Notifications
You must be signed in to change notification settings - Fork 63
Conversation
>Also added new Base.read() for reading PDB files with PDB ID and directory. >Bug Fixes
Great, thanks for making this PR. I will have a look at this tomorrow. |
We have few things to discuss. 1. Whether the 2. Should we write test cases? 3. Should |
src/structure/pdb.jl
Outdated
function downloadpdb(pdbid::AbstractString, | ||
out_filepath::AbstractString="$pdbid.pdb"; | ||
ba_number::Integer=0) | ||
function getallpdbentries() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we avoid get
in function names. The name should describe what is returned, perhaps pdbentrylist
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be fine. I will change it accordingly.
src/structure/pdb.jl
Outdated
""" | ||
Returns a list of pdb codes in the weekly pdb status file from the given URL. | ||
""" | ||
function getstatuslist(url::AbstractString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name should reference the PDB, perhaps pdbstatuslist
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What URL is the user supposed to use here? If there is a certain one perhaps put it in the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually see comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL is supposed be weekly status file from RCSB Server. I will add it in the docstrings.
src/structure/pdb.jl
Outdated
""" | ||
Returns three lists of the newest weekly files (added,modified,obsolete) from RCSB PDB Server | ||
""" | ||
function getrecentchanges() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps pdbrecentchanges
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. K
src/structure/pdb.jl
Outdated
""" | ||
Returns a list of all obsolete entries ever in the RCSB PDB server | ||
""" | ||
function getallobsolete() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps pdbobsoletelist
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. K
src/structure/pdb.jl
Outdated
if the keyword argument `obsolete` is set `true`, the PDB files are downloaded into the obsolete directory inside `pdb_dir`; | ||
if the keyword argument `overwrite` is set `true`, then it will overwrite the PDB file if it exists in the `pdb_dir`; | ||
""" | ||
function downloadmultiplepdb(pdbidlist::AbstractArray{String,1}; pdb_dir::AbstractString=pwd(), obsolete::Bool=false, overwrite::Bool=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name seems a bit long. We could even make this another method of downloadpdb
, so that function can take either a string or a list.
I think this should be pdbidlist::Array{String,1}
for the first argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I will override the downloadpdb
for downloading multiple PDB files. Regarding pdbidlist::Array{String,1}
I was getting an error, defining it as AbstractArray
was only working. I will look into it and update accordingly.
src/structure/pdb.jl
Outdated
@@ -107,6 +321,17 @@ function Base.read(filepath::AbstractString, | |||
end | |||
end | |||
|
|||
# Read PDB file based on PDB ID and pdb_dir. | |||
function Base.read(pdbid::AbstractString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method clashes with the above read
method. I wonder whether if it is necessary, but if it goes in then the directory should be a full argument (so args are pdbid, directory, PDB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just added it because that will be in a uniform format as the rest. If users wants to download first and then read it later, they will have uniform way of calling the functions. We may change the arguments as you mentioned.
src/structure/pdb.jl
Outdated
|
||
|
||
""" | ||
Download a PDB file or biological assembly from the RCSB PDB server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use line breaks in the doc strings to keep line length to 80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K. I had this doubt. I will keep the line length to 80 in docstrings.
src/structure/pdb.jl
Outdated
pdbidlist = Array{String,1}() | ||
info("Fetching list of all PDB Entries from RCSB PDB Server...") | ||
download("ftp://ftp.wwpdb.org/pub/pdb/derived_data/index/entries.idx","entries.idx") | ||
open("entries.idx") do input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we make then delete the file here, I wonder if we should use a temporary filepath? If the user has entries.idx
in their directory here it is getting overwritten without warning. You can use tempname()
to save a name to a variable, write to this then delete it (I think there are examples of this in the PDB tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I didn't know about it. thank you.
src/structure/pdb.jl
Outdated
|
||
|
||
""" | ||
Returns a list of pdb codes in the weekly pdb status file from the given URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantry - PDB should be capital in docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. K.
src/structure/pdb.jl
Outdated
if the keyword argument `obsolete` is set `true`, the PDB file is downloaded into the obsolete directory inside `pdb_dir`; | ||
if the keyword argument `overwrite` is set `true`, then it will overwrite the PDB file if it exists in the `pdb_dir`; | ||
""" | ||
function retrievepdb(pdbid::AbstractString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice function to have, difficult to find a descriptive name for it though. A descriptive one would be downloadpdbandread
but that is way too long. Maybe this name is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya. I was so confused. Then finally decided keep it as in BioPython. So if BioPython users are using this, they may find it easy. As mentioned downloadpdbandread
will be too long. We will keep it as retrievepdb
as of now. Any other good function name is welcomed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibility is readfrompdb
.
So overall looks good, thanks for adding this useful feature. There are a couple more things but the above is most of my issues. For reference the Biopython implementation is here: https://github.com/biopython/biopython/blob/master/Bio/PDB/PDBList.py In answer to your specific questions: |
|
There is one more thing that could be added now you are looking at this code, namely download of mmCIF and MMTF files from the PDB. Since mmCIF is now the standard, this is an important feature (I am writing an mmCIF parser for |
@jgreener64 Thank you so much for your review. I will update the code accordingly in the future commits. Adding option |
src/structure/pdb.jl
Outdated
Returns a list of pdb codes in the weekly pdb status file from the given URL. | ||
""" | ||
function getstatuslist(url::AbstractString) | ||
statuslist = Array{String,1}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think statuslist = String[]
is better, it seems to allocate less (this applies to all similar ones below too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K. I didnt know that
src/structure/pdb.jl
Outdated
push!(failedlist,pdbid) | ||
end | ||
end | ||
warn(length(failedlist)," PDB file failed to download : ", failedlist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only do this if length(failedlist) > 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I missed it
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great changes @joels94 ! In my view this is near completion. I have made some more comments. A few more things before merge:
-
Write some tests (not required for
downloadentirepdb
,updatelocalpdb
ordownloadallobsoletepdb
I would say). -
Add a section to the docs (http://biojulia.net/Bio.jl/latest/man/structure/) by editing https://github.com/BioJulia/Bio.jl/blob/master/docs/src/man/structure.md . Perhaps have a table with the functions similar to other sections.
-
It would be good to get feedback from someone else, e.g. @bicycle1885 ?
src/structure/pdb.jl
Outdated
Download a Protein Data Bank (PDB) file or biological assembly from the RCSB | ||
PDB. By default downloads the PDB file; if `ba_number` is set the biological | ||
assembly with that number will be downloaded. | ||
Returns a list of all PDB entries from RCSB PDB server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantry - full stop at the end (and in some other docstrings).
src/structure/pdb.jl
Outdated
end | ||
linecount +=1 | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do need to explicitly remove the temp file I think - tempname()
gets you an available name then you download to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies elsewhere you have used temp files too.
src/structure/pdb.jl
Outdated
from RCSB PDB Server | ||
""" | ||
function pdbrecentchanges() | ||
addedlist = String[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines are not required as pdbstatuslist
returns the array.
src/structure/pdb.jl
Outdated
if the keyword argument `obsolete` is set `true`, the PDB file is downloaded | ||
into the obsolete directory inside `pdb_dir`; | ||
if the keyword argument `overwrite` is set `true`, then it will overwrite the | ||
PDB file if it exists in the `pdb_dir`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End with full stop.
src/structure/pdb.jl
Outdated
""" | ||
function downloadpdb(pdbid::AbstractString; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false, ba_number::Integer=0) | ||
# A Dict mapping the type to their file extensions | ||
pdbextension = Dict{Type,String}( PDB => ".pdb", PDBXML => ".xml", mmCIF => ".cif", MMTF => ".mmtf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Dict
is defined in a couple of places so can be taken out of the functions and defined as const Dict
near the top.
src/structure/pdb.jl
Outdated
# check if PDB file is downloaded and extracted properly | ||
if !ispath(pdbpath) || filesize(pdbpath)==0 | ||
# If the file size is 0, its deleted. force=true ensures error is not thrown when file does not exists | ||
rm(pdbpath, force=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid forcing deletion and check again if the file exists here.
src/structure/pdb.jl
Outdated
if the keyword argument `overwrite` is set `true`, then it will overwrite the | ||
PDB file if it exists in the `pdb_dir`; | ||
""" | ||
function downloadpdb(pdbidlist::AbstractArray{String,1}; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be pdbidlist::Array{String,1}
, did you say this caused a problem?
src/structure/pdb.jl
Outdated
pdb_dir::AbstractString=pwd(), | ||
ba_number::Integer=0, | ||
structure_name::AbstractString="$pdbid.pdb", | ||
remove_disorder::Bool=false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 3 arguments can be replaced by kwargs...
and then pass that to the inner read function (this means the defaults for remove_disorder
etc. are defined in one place only).
src/structure/pdb.jl
Outdated
read_het_atoms::Bool=true) | ||
filepath = joinpath(pdb_dir,"$pdbid.pdb") | ||
pdbpath = ba_number == 0 ? filepath : filepath*"$ba_number" | ||
open(pdbpath, "r") do input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This open
is not actually required as calling read
with the filepath is already defined below.
src/structure/pdb.jl
Outdated
filepath = joinpath(pdb_dir,"$pdbid.pdb") | ||
end | ||
pdbpath = ba_number == 0 ? filepath : filepath*"$ba_number" | ||
open(pdbpath, "r") do input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This open
is not actually required as calling read
with the filepath is already defined below.
Codecov Report
@@ Coverage Diff @@
## master #483 +/- ##
=========================================
+ Coverage 70.34% 70.94% +0.6%
=========================================
Files 34 34
Lines 2421 2537 +116
=========================================
+ Hits 1703 1800 +97
- Misses 718 737 +19
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried out some of the functions and they appear okay. I have left a few more small comments. Thanks for addressing everything. I'll look again when tests and docs are up 👍
src/structure/pdb.jl
Outdated
if the keyword argument `overwrite` is set `true`, then it will overwrite the | ||
PDB file if it exists in the `pdb_dir`. | ||
""" | ||
function downloadpdb(pdbidlist::Array{String,1}; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last 4 arguments here could just be kwargs...
so their defaults are defined in one place above.
src/structure/pdb.jl
Outdated
if ispath(archivefilepath) && filesize(archivefilepath) > 0 && file_format != MMTF | ||
input = open(archivefilepath) |> ZlibInflateInputStream | ||
open(pdbpath,"w") do output | ||
for line in eachline(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prints each newline character twice so the files alternate line/blank line. Could use print
rather than println
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also a system specific issue? Because when using println()
the file is generated properly and I can read them. But if I use print()
, all lines are getting concatenated and I cannot read the file. I have attached the two files for your reference. (Renamed the extensions as .txt as I was not able upload in .pdb format). Let me know what you find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. It seems to be a Julia 0.5/0.6 issue, see the first breaking change for Julia v0.6.0 here: https://github.com/JuliaLang/julia/blob/master/NEWS.md .
In 0.5 the line break is not removed by eachline, in 0.6 it is. In order to be compatible with both I think we can use eachline(..., chomp=false)
and use print
inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Guess, I should keep an eye on the release notes as Julia changes a lot in each version. Thank you. I will update accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just tested that on 0.5 and eachline
can't take the chomp
argument. It should be
for line in eachline(input)
println(output, chomp(line))
end
Sorry about that.
src/structure/pdb.jl
Outdated
throw(ErrorException("Error downloading PDB : $pdbid")) | ||
end | ||
end | ||
rm(archivefilepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the condition is satisfied on line 180 then this will error as the file is not created. Either check the file exists here or move it up in the logic to where you know it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. The archivefilepath
exists independent of the if condition in line 180. Its because, archivefilepath=tempname()
in line 172 actually creates an empty temporary file instead of just the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it doesn't do that on my machine. I wonder if that is system specific. For example if I attempt to overwrite when overwrite is false I get
INFO: PDB Exists : 1AKE
ERROR: unlink: no such file or directory (ENOENT)
in unlink(::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
in #rm#7(::Bool, ::Bool, ::Function, ::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
in #downloadpdb#1(::String, ::Type{T}, ::Bool, ::Bool, ::Int64, ::Function, ::String) at ./REPL[1]:61
in (::#kw##downloadpdb)(::Array{Any,1}, ::#downloadpdb, ::String) at ./<missing>:0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Can you check by trying ispath(tempname())
? For me i m getting true
in Windows 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julia 0.6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get false
in Mac OSX, Julia 0.6. Searching this on Julia turns up JuliaLang/julia#9053 which addresses this.
I guess check it exists and if so remove, which will work either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Ok.
src/structure/pdb.jl
Outdated
else | ||
# Will download error page if ba_number is too high | ||
download("http://www.rcsb.org/pdb/files/$pdbid.pdb$ba_number", out_filepath) | ||
pdbpath = joinpath(pdb_dir,"$pdbid"*pdbextension[file_format]*"$ba_number") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking a bit more about it, the biological number should not change the extension of the file. Ideally it would change the filename, e.g. 1ABC_ba1.pdb
or something like that rather than 1ABC.pdb1
.
src/structure/pdb.jl
Outdated
# The first 4 characters in the line is the PDB ID | ||
pdbid = uppercase(line[1:4]) | ||
# Check PDB ID is 4 characters long and only consits of alphanumeric characters | ||
if length(pdbid) != 4 || ismatch(r"[^a-zA-Z0-9]", pdbid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r"^[a-zA-Z0-9]{4}$"
will check the length as well.
src/structure/pdb.jl
Outdated
linecount +=1 | ||
end | ||
end | ||
rm(tempfilepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm(tempfilepath, force=true)
would be better because it works even when the file does not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to make sure that the temporary file will be deleted, you need to use the try-catch-finally statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't try-finally
be sufficient? That way we may be able to know what error occurs. Which might be helpful in the initial stages to debug the code.
src/structure/pdb.jl
Outdated
# MMTF is downloaded in uncompressed form, thus directly stored in pdbpath | ||
download("http://mmtf.rcsb.org/v1.0/full/$pdbid", pdbpath) | ||
else | ||
warn("Invalid PDB file format!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should throw an ArgumentError
exception.
src/structure/pdb.jl
Outdated
elseif file_format == mmCIF | ||
download("http://files.rcsb.org/download/$pdbid-assembly$ba_number"*pdbextension[file_format]*".gz", archivefilepath) | ||
else | ||
warn("Biological Assembly is available only in PDB and mmCIF formats!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArgumentError
.
src/structure/pdb.jl
Outdated
|
||
|
||
""" | ||
Download a PDB file or biological assembly from the RCSB PDB server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first line should be the signature of the function (see: https://docs.julialang.org/en/stable/manual/documentation).
src/structure/pdb.jl
Outdated
end | ||
end | ||
# Verify if the compressed PDB file is downloaded properly and extract it. For MMTF no extraction is needed | ||
if ispath(archivefilepath) && filesize(archivefilepath) > 0 && file_format != MMTF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use isfile
, not ispath
.
src/structure/pdb.jl
Outdated
function downloadentirepdb(;pdb_dir::AbstractString=pwd(), file_format::Type=PDB, overwrite::Bool=false) | ||
# Get the list of all pdb entries from RCSB PDB Server using getallpdbentries() and downloads them | ||
pdblist = pdbentrylist() | ||
info("About to download "*string(length(pdblist))*" PDB files. Make sure to have enough disk space and time!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use string interpolation? "... download $(length(pdblist)) PDB files..."
Decompose Bio.Align and Bio.Intervals (BioJulia#482)
I have written the test cases. I have also made few changes to the code. Kindly take a look at it and let me know if any changes are required. |
TO DO
Finally, completed writing the documentation. Let me if any changes are required before merging the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I am happy with the code now, thanks for making those changes. The tests look good too.
I think the doc changes need a bit of re-ordering though. The content is good but I think that the first few things the user reads should be how to download a PDB file and read it in the manner consistent with the other BioJulia read
interfaces, i.e. read("path.pdb", PDB)
.
So I suggest keeping the start of the docs the same, apart from maybe changing "To download a PDB file" to "To download a PDB file - see below for more options".
Then after the struc = read(filepath_1EN2, PDB)
box have a line or box talking about retrievepdb
as a shortcut. The other options for downloadpdb
, maintaining a local PDB copy etc. can go in the "RCSB PDB Metadata" section at the bottom, which could be renamed to "RCSB PDB utility functions".
src/structure/pdb.jl
Outdated
info("Downloading PDB : $pdbid") | ||
if ba_number == 0 | ||
if file_format == PDB || file_format == PDBXML || file_format == mmCIF | ||
download("http://files.rcsb.org/download/$pdbid"*pdbextension[file_format]*".gz", archivefilepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well use string interpolation as "http://files.rcsb.org/download/$(pdbid)$(pdbextension[file_format]).gz"
here (and throughout).
docs/src/man/structure.md
Outdated
| Keyword Argument | Description | | ||
| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- | | ||
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded | | ||
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "Options are PDB, PDBXML, mmCIF or MMTF" is more readable.
docs/src/man/structure.md
Outdated
If `overwrite=true`, the existing PDB files in obsolete directory will be overwritten by the newly downloaded ones. | ||
|
||
|
||
## Maintaining a Local Copy of the entire RCSB PDB Database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantry - no capitals on local copy.
docs/src/man/structure.md
Outdated
| Keyword Argument | Description | | ||
| :----------------------------- | :------------------------------------------------------------------------------------------------------- | | ||
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB files are downloaded | | ||
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
docs/src/man/structure.md
Outdated
@@ -244,6 +366,18 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51])) | |||
``` | |||
|
|||
|
|||
## RCSB PDB Metadata | |||
|
|||
Few functions that may help fetching information about the RCSB PDB Database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantry - "There are a few functions that may help" etc.
On further thought, should the |
Updated docs and code as discussed. Let me if further changes are required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. I am happy with the code and tests and would suggest a few more small changes to the docs before merge. Hopefully they won't take long.
Sorry to spend a while on the docs but I think it's very important to give a concise and useful overview of the module.
docs/src/man/structure.md
Outdated
struc = readpdb("1EN2", pdb_dir="/path/to/pdb/directory") | ||
``` | ||
|
||
**Note:** This requires the PDB file name to be uppercase PDB ID. Example : "1EN2.pdb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove this line and change the above line to 'To parse a PDB file by specifying the PDB ID and PDB directory into a Structure-Model-Chain-Residue-Atom framework (file name must be in upper case, e.g. "1EN2.pdb")'.
docs/src/man/structure.md
Outdated
Number of disordered atoms - 27 | ||
``` | ||
|
||
Various options can be set through optional keyword arguments when parsing a PDB file as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indented - to be in line with the rest of the file I don't think any indentation is needed, even for code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies to other lines too.
docs/src/man/structure.md
Outdated
@@ -40,6 +41,8 @@ Number of hydrogens - 0 | |||
Number of disordered atoms - 27 | |||
``` | |||
|
|||
**Note** : Refer [Downloading PDB files](#downloading-pdb-files) and [Reading PDB files](#reading-pdb-files) sections for more options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Refer to..."
docs/src/man/structure.md
Outdated
|
||
**Note:** This requires the PDB file name to be uppercase PDB ID. Example : "1EN2.pdb" | ||
|
||
The function `readpdb` provides an uniform way to download and read PDB files. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure lines 309-337 are required. Perhaps we could just say "The function readpdb
provides an uniform way to download and read PDB files, for example readpdb("1EN2",pdb_dir="/path/to/pdb/directory")
. The same keyword arguments are taken as read
above, plus pdb_dir
and ba_number
."
docs/src/man/structure.md
Outdated
| `pdbobsoletelist` | List of all obsolete PDB entries in the RCSB server | `Array{String,1}` | | ||
|
||
|
||
## Maintaining a local copy of the entire RCSB PDB Database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section title is quite long, and the section mainly refers to the above section. Perhaps remove this section and put the sentence "Run the downloadentirepdb
function..." into the above section.
@@ -20,13 +20,14 @@ The `Bio.Structure` module provides functionality to manipulate macromolecular s | |||
To download a PDB file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps change this section title from "Parsing PDB files" to "Basics" as this makes more sense in the new arrangement.
downloadpdb("1EN2") | ||
``` | ||
|
||
To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework: | ||
|
||
```julia | ||
julia> struc = read(filepath_1EN2, PDB) | ||
julia> struc = read("/path/to/pdb/file.pdb", PDB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, this was initially set as filepath_1EN2
to pass the doctests but I don't think they are being tested here so this can stay as "/path/to/pdb/file.pdb"
.
I have updated the docs. Let me know if any changes are required. |
Great, I'm happy with this to go in. Thanks for all the work. I will wait until tomorrow in case anyone else has any comments, then I'll merge. |
I have been working on few enhancements over handling PDB files based on BioPython library to be true.
Such as: