This document accompanies the SomaDataIO R package, which loads and
exports ‘SomaScan’ data via the SomaLogic Operating Co., Inc.
proprietary text file called an ADAT (*.adat). The package also
exports auxiliary functions for manipulating, wrangling, and extracting
relevant information from an ADAT object once in memory. Basic
familiarity with the R environment is assumed, as is the ability to
install contributed packages from the Comprehensive R Archive Network
(CRAN).
If you run into any issues or problems with SomaDataIO, full documentation
of the most recent release can be found at our pkgdown website hosted
by GitHub. If the issue persists, we encourage you to consult the
issues page and, if appropriate, submit an issue and/or feature request.
The SomaDataIO package is licensed under the MIT
license and is intended solely for research use only (“RUO”) purposes.
The code contained herein may not be used for diagnostic, clinical,
therapeutic, or other commercial purposes.
The easiest way to install SomaDataIO is to install directly from
CRAN:
install.packages("SomaDataIO")
Alternatively, install from GitHub:
remotes::install_github("SomaLogic/SomaDataIO")
which installs the most current “development” version from the
repository HEAD. To install the most recent release, use:
remotes::install_github("SomaLogic/SomaDataIO@*release")
To install a specific tagged release, use:
remotes::install_github("SomaLogic/SomaDataIO@v5.3.0")
The SomaDataIO package was intentionally developed to contain a
limited number of dependencies from CRAN. This makes the package more
robust to external software design changes but also limits its
feature set. With this in mind, SomaDataIO aims to strike a balance
between long(er)-term stability and a limited set of features. For
the package dependencies, see the DESCRIPTION file.
The Biobase package is suggested, being required by only two
functions, pivotExpressionSet() and adat2eSet().
Biobase must be installed separately from
Bioconductor by entering the following from the R Console:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Biobase", version = remotes::bioc_version())
Information about Bioconductor can be found here: https://bioconductor.org/install/
Upon successful installation, load SomaDataIO as normal:
library(SomaDataIO)
For an index of available commands:
library(help = SomaDataIO)
The SomaDataIO package comes with four (4) objects available to users
to run canned examples (or analyses). They can be accessed once
SomaDataIO has been attached via library(). They are:
- example_data: the original ‘SomaScan’ file (example_data.adat) can be found in the SomaLogic-Data GitHub repository or downloaded directly via:
  wget https://raw.githubusercontent.com/SomaLogic/SomaLogic-Data/master/example_data.adat
  - it has been replaced by an abbreviated, light-weight version containing only the first 10 samples and can be found at:
    system.file("extdata", "example_data10.adat", package = "SomaDataIO")
- ex_analytes: the analyte (feature) variables in example_data
- ex_anno_tbl: the annotations table associated with example_data
- ex_target_names: a mapping object for analyte -> target
- See also ?SomaScanObjects
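The objects listed above can be inspected directly once the package is attached. The following is a minimal sketch, assuming the package's data objects are lazy-loaded and that the names of ex_target_names match the analyte names in ex_analytes (the latter is an assumption, not something stated in this document):

```r
library(SomaDataIO)

# the example data set is a `soma_adat` object
class(example_data)
dim(example_data)      # rows = samples; columns = clinical data + analytes

# analyte (feature) column names found in `example_data`
head(ex_analytes)

# annotations table associated with `example_data`
head(ex_anno_tbl, 3)

# look up the target name for the first analyte
# (assumes the mapping object is keyed by analyte name)
ex_target_names[[ex_analytes[1L]]]
```

No expected output is shown because the exact dimensions depend on the installed package version.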
- Loading data (Import)
  - parse and import a *.adat text file into an R session as a soma_adat object.
- Wrangling data (manipulation)
  - subset, reorder, and list various fields of a soma_adat object.
  - ?SeqId analyte (feature) matching.
  - dplyr and tidyr verb S3 methods for the soma_adat class.
  - ?rownames helpers that do not break soma_adat attributes.
  - please see vignette vignette("loading-and-wrangling", package = "SomaDataIO")
- Exporting data (Output)
  - write out a soma_adat object as a *.adat text file.
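The import and export items above can be sketched as a simple round trip. This example uses read_adat() and write_adat() with the bundled example_data10.adat file; the temporary file path and the comparison at the end are illustrative choices, not part of the package documentation:

```r
library(SomaDataIO)

# path to the light-weight example file shipped with the package
f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
adat <- read_adat(f)

# write to a temporary file, keeping the `.adat` extension
tmp <- tempfile(fileext = ".adat")
write_adat(adat, file = tmp)

# read the written file back in and compare the basic shape
adat2 <- read_adat(tmp)
identical(dim(adat), dim(adat2))
identical(getAnalytes(adat), getAnalytes(adat2))
```

Comparing dimensions and analyte names is only a light sanity check; it does not verify that every value or attribute survived the round trip.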
Loading an ADAT text file is simple using read_adat():
# Sample file name
f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)
is.soma_adat(my_adat)
#> [1] TRUE
# S3 print method (forwards -> tibble)
my_adat
#> ══ SomaScan Data ═══════════════════════════════════════════════════════════════
#> Attributes intact ✓
#> Rows 10
#> Columns 5318
#> Clinical Data 34
#> Features 5284
#> ── Column Meta ─────────────────────────────────────────────────────────────────
#> ℹ SeqId, SeqIdVersion, SomaId, TargetFullName, Target, UniProt, EntrezGeneID,
#> ℹ EntrezGeneSymbol, Organism, Units, Type, Dilution, PlateScale_Reference,
#> ℹ CalReference, Cal_Example_Adat_Set001, ColCheck,
#> ℹ CalQcRatio_Example_Adat_Set001_170255, QcReference_170255,
#> ℹ Cal_Example_Adat_Set002, CalQcRatio_Example_Adat_Set002_170255, Dilution2
#> ── Tibble ──────────────────────────────────────────────────────────────────────
#> # A tibble: 10 × 5,319
#> row_names PlateId Plate…¹ Scann…² Plate…³ SlideId Subar…⁴ Sampl…⁵ Sampl…⁶
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 258495800012… Exampl… 2020-0… SG1521… H9 2.58e11 3 1 Sample
#> 2 258495800004… Exampl… 2020-0… SG1521… H8 2.58e11 7 2 Sample
#> 3 258495800010… Exampl… 2020-0… SG1521… H7 2.58e11 8 3 Sample
#> 4 258495800003… Exampl… 2020-0… SG1521… H6 2.58e11 4 4 Sample
#> 5 258495800009… Exampl… 2020-0… SG1521… H5 2.58e11 4 5 Sample
#> 6 258495800012… Exampl… 2020-0… SG1521… H4 2.58e11 8 6 Sample
#> 7 258495800001… Exampl… 2020-0… SG1521… H3 2.58e11 3 7 Sample
#> 8 258495800004… Exampl… 2020-0… SG1521… H2 2.58e11 8 8 Sample
#> 9 258495800001… Exampl… 2020-0… SG1521… H12 2.58e11 8 9 Sample
#> 10 258495800004… Exampl… 2020-0… SG1521… H11 2.58e11 3 170261 Calibr…
#> # … with 5,310 more variables: PercentDilution <int>, SampleMatrix <chr>,
#> # Barcode <lgl>, Barcode2d <chr>, SampleName <lgl>, SampleNotes <lgl>,
#> # AliquotingNotes <lgl>, SampleDescription <chr>, AssayNotes <lgl>,
#> # TimePoint <lgl>, …, and abbreviated variable names ¹PlateRunDate,
#> # ²ScannerID, ³PlatePosition, ⁴Subarray, ⁵SampleId, ⁶SampleType
#> ════════════════════════════════════════════════════════════════════════════════
Please see vignette vignette("loading-and-wrangling", package = "SomaDataIO")
for more details and options.
The soma_adat class comes with numerous class-specific S3 methods for
the most popular dplyr and tidyr generics.
# see full complement of `soma_adat` methods
methods(class = "soma_adat")
#> [1] [ [[ [[<- [<- ==
#> [6] $ $<- anti_join arrange count
#> [11] filter full_join getAnalytes getMeta group_by
#> [16] inner_join is_seqFormat left_join Math median
#> [21] merge mutate Ops print rename
#> [26] right_join sample_frac sample_n semi_join separate
#> [31] slice_sample slice summary Summary transform
#> [36] ungroup unite
#> see '?methods' for accessing help and source code
Please see vignette vignette("loading-and-wrangling", package = "SomaDataIO")
for more details about available soma_adat methods.
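As a brief illustration of these methods, the sketch below applies two of the listed dplyr verbs (filter and arrange) to the example file. The SampleType and PlatePosition columns appear in the printed output above; the choice of verbs and the variable names are illustrative only:

```r
library(SomaDataIO)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# keep only the study samples (drops calibrators and other control types);
# dispatches to the `soma_adat` method for dplyr::filter()
samples <- dplyr::filter(my_adat, SampleType == "Sample")

# reorder the rows by plate position via the `soma_adat` arrange method
samples <- dplyr::arrange(samples, PlatePosition)

# the result should still be a `soma_adat` object
is.soma_adat(samples)
nrow(samples)
```

Calling the verbs with the dplyr:: prefix avoids attaching dplyr and sidesteps the name clash with stats::filter().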
The soma_adat object also contains specific structures that are useful
to users. Please also see ?colmeta or ?annotations for further
details about these fields.
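A minimal sketch of how this structure might be explored, using the exported helpers getMeta(), getAnalytes(), and getAnalyteInfo(). The variable names are illustrative, and the exact set of attribute names may differ between package versions:

```r
library(SomaDataIO)

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(f)

# names of the attributes attached to the object
names(attributes(my_adat))

# split the column names into clinical (meta) data and analyte features
meta     <- getMeta(my_adat)
analytes <- getAnalytes(my_adat)
length(meta) + length(analytes) == ncol(my_adat)

# column meta data (annotations) as a table, one row per analyte
anno <- getAnalyteInfo(my_adat)
nrow(anno) == length(analytes)
```

For the example file, the two counts should correspond to the "Clinical Data" and "Features" lines of the print output shown earlier.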
This section now lives in individual package vignettes. For further detail please see:
- Two-group comparison (e.g. differential expression) via t-test
  - see vignette vignette("two-group-comparison", package = "SomaDataIO")
- Binary classification
  - see vignette vignette("binary-classification", package = "SomaDataIO")
- Linear regression
  - see vignette vignette("linear-regression", package = "SomaDataIO")
- See LICENSE
- The MIT License:
Created by Rmarkdown (v2.20) and R version 4.2.2 (2022-10-31).