Genomic Data Retrieval with R
Package generalization
More than 5000 lines have been edited, most of them removed (#100), to generalize the package and make it safer for future
development. This work is still ongoing.
- @Roleren is joining as package author and new core developer of biomartr.
New features
- Ensembl genomes is no longer a different database compared to ensembl in biomaRt, since this split is artificial.
It is advised to use only "ensembl" as db from now on, but "ensemblgenomes" will still work.
- Annotation previously meant gff only, but it should cover both the gff and gtf getters, with a format specification. This is now fixed and generalized.
- Added a new kingdom for ensembl: protists support, with correct collection getters
- The retrieval from the UniProt database is now updated to the new API/FTP path system. Users can now
retrieve proteomes using the functions getProteome(db = "uniprot", ...) and getProteomeSet(db = "uniprot", ...) (see #82)
- new function getBioSet: Generic Bio data set extractor
- new function getBio: A wrapper to all bio getters, selected with 'type' argument
- a new function getUniProtSTATS(): Retrieve UniProt Database Information File (STATS)
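The UniProt retrieval mentioned above could be used roughly as follows. This is a minimal sketch, not an excerpt from the package documentation: the helper name fetch_uniprot_proteome, the example organism, and the output directory are assumptions chosen for illustration, and the call needs the biomartr package plus network access.

```r
# Hypothetical helper (not part of biomartr): download one UniProt proteome.
# Assumes getProteome() accepts 'db', 'organism' and 'path' and returns the
# path of the downloaded file.
fetch_uniprot_proteome <- function(organism,
                                   out_dir = file.path("_uniprot_downloads", "proteomes")) {
  biomartr::getProteome(
    db       = "uniprot",
    organism = organism,
    path     = out_dir
  )
}

# Example call (requires network access), shown but not executed here:
# proteome_file <- fetch_uniprot_proteome("Saccharomyces cerevisiae")
```

For several organisms at once, getProteomeSet(db = "uniprot", ...) is the corresponding set-based function named in the list above.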
Power user cache
The package now supports caching of back-end files, which used to be saved to the /tmp folder (i.e. lost on computer restart).
This makes things easier for power users who want higher speed. For more info, see the function ?cachedir_set
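A persistent cache directory might be configured along these lines. This is only a sketch under assumptions: it assumes cachedir_set() takes the directory path as its first argument (check ?cachedir_set for the actual signature), and both the helper name use_persistent_cache and the directory are hypothetical.

```r
# Hypothetical helper (not part of biomartr): point the back-end cache at a
# directory that survives restarts.
# Assumption: cachedir_set() accepts the directory path as its first argument.
use_persistent_cache <- function(cache_dir = file.path(path.expand("~"), "biomartr_cache")) {
  # Create the directory if it does not exist yet.
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  biomartr::cachedir_set(cache_dir)
  invisible(cache_dir)
}

# Example call, shown but not executed here:
# use_persistent_cache()
```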
Bug fixes
- Fixed many wrong URLs and non-working functions; more tests have been added to make sure they work.
- Fixed fungi collection accessor for ensembl