External data
Data used for the modeling (generally, experimental data) should not be stored directly in the mmCIF files. Instead, it should be deposited in a suitable repository and linked to from the mmCIF file, for several reasons:
- size: some experimental datasets are extremely large, so it's not efficient to store them in a text-based format like mmCIF.
- existing standards: there is little point in developing a new format when an existing one is already widely adopted (e.g. the MRC format for EM maps).
- deduplication: there is little point in duplicating data that is already available elsewhere.
- domain expertise: experts in each experimental field are better qualified to determine the file formats, database structure, etc.
Where no suitable repository exists, files can be deposited in a general-purpose repository that issues a DOI (for example, Zenodo). This should be considered a temporary measure until a domain-specific database is established.
Modeling generally uses processed data (for example, an EM map). Where possible, both the processed data and the original raw data (for example, a set of EM micrographs) should be deposited somewhere.
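The "link, don't embed" approach can be sketched as a small Python data model. It covers the split between processed and raw data and the DOI fallback described above. This is a hypothetical illustration and not the IHM dictionary or any library's API. The class names (`DatabaseLink`, `DOILink`, `ExternalDataset`) and the `needs_migration` check are invented here, and the accession code and DOI in the example are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class DatabaseLink:
    """Hypothetical: an entry in a domain-specific repository, e.g. SASBDB."""
    db_name: str
    accession_code: str


@dataclass(frozen=True)
class DOILink:
    """Hypothetical fallback: a file inside a general-purpose, DOI-issuing deposition."""
    doi: str
    path: str


Link = Union[DatabaseLink, DOILink]


def format_link(link: Link) -> str:
    """Render a link as a short human-readable string."""
    if isinstance(link, DatabaseLink):
        return f"{link.db_name}:{link.accession_code}"
    return f"doi:{link.doi}, file {link.path}"


@dataclass
class ExternalDataset:
    """Data used for modeling, referenced from (not embedded in) the model file."""
    data_type: str
    processed: Link              # e.g. a map or scattering profile
    raw: Optional[Link] = None   # e.g. micrographs, if they were deposited

    def describe(self) -> List[str]:
        lines = [f"{self.data_type} (processed): {format_link(self.processed)}"]
        if self.raw is not None:
            lines.append(f"{self.data_type} (raw): {format_link(self.raw)}")
        return lines

    def needs_migration(self) -> bool:
        """True if any link is a DOI fallback that should eventually move
        to a domain-specific database."""
        links = [self.processed] + ([self.raw] if self.raw is not None else [])
        return any(isinstance(link, DOILink) for link in links)


if __name__ == "__main__":
    # Placeholder identifiers only; these are not real accession codes or DOIs.
    saxs = ExternalDataset(
        data_type="SAXS profile",
        processed=DatabaseLink("SASBDB", "SASD000"),
        raw=DOILink("10.5281/zenodo.0000000", "raw_frames.tar.gz"),
    )
    for line in saxs.describe():
        print(line)
    print("needs migration:", saxs.needs_migration())
```

The DOI fallback is a distinct type, so datasets that should later move into a domain-specific database can be found mechanically once such a database exists.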
The state of each experimental field is summarized below.
FRET
Domain experts: Claus Seidel
File formats: Photon-HDF5 (see also the FRETBursts software)
Data linked from mmCIF: pairs of interaction sites?
Repositories: none (?)
EM
Domain experts: Ardan Patwardhan
File formats: MRC
Data linked from mmCIF: raw micrographs, 2D class averages, 3D maps
Repositories:
SAXS
Domain experts: Al Kikhney
File formats: SAS profiles, ...
Data linked from mmCIF: SAXS profile, ab initio shape
Repositories: SASBDB
Crosslinking
Domain experts: Juri Rappsilber, Alexander Leitner
File formats: simple tabulated data, ...
Data linked from mmCIF: tabulated sets of proximate residues (e.g. for the yeast Nup84 complex), spectra/peaklists (e.g. for the yeast Mediator complex)
Repositories:
- Sets of proximate residues: none
- Peaklists: MassIVE
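The sketch below shows what "simple tabulated data" for proximate residues might look like by reading a comma-separated table of residue pairs. The column names (`protein1`, `residue1`, `protein2`, `residue2`), protein names and residue numbers are all hypothetical, and no standard format is implied. A crosslink between two residues has no direction, so each pair is put into a canonical order and duplicates are dropped.

```python
import csv
import io
from typing import List, NamedTuple


class ResiduePair(NamedTuple):
    protein1: str
    residue1: int
    protein2: str
    residue2: int


def read_residue_pairs(text: str) -> List[ResiduePair]:
    """Parse a hypothetical CSV table of proximate residue pairs.

    Pairs are unordered, so each one is sorted into a canonical order and
    repeats (including reversed duplicates) are skipped. The order of first
    appearance is otherwise preserved.
    """
    seen = set()
    pairs: List[ResiduePair] = []
    for row in csv.DictReader(io.StringIO(text)):
        a = (row["protein1"].strip(), int(row["residue1"]))
        b = (row["protein2"].strip(), int(row["residue2"]))
        first, second = sorted((a, b))
        pair = ResiduePair(first[0], first[1], second[0], second[1])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


if __name__ == "__main__":
    # Invented example rows, not real crosslinking data.
    table = (
        "protein1,residue1,protein2,residue2\n"
        "ProtA,10,ProtB,25\n"
        "ProtB,25,ProtA,10\n"
        "ProtA,3,ProtA,40\n"
    )
    for pair in read_residue_pairs(table):
        print(pair)
```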