Ben Webb edited this page Feb 28, 2017 · 11 revisions

Data used for the modeling (generally, experimental data) should not be stored directly in the mmCIF files, but instead should be deposited in a suitable repository and linked to, for several reasons:

  • size: some experimental datasets are extremely large, so it's not efficient to store them in a text-based format like mmCIF.
  • existing standards: there is little point in developing a new format when an existing one is already widely adopted (e.g. MRC format for EM maps).
  • deduplication: there is little point in duplicating data that are already available elsewhere.
  • domain expertise: experts in each experimental field are better qualified to determine the file formats, database structure, etc.

Where no suitable repository exists, files can be deposited with a general-purpose service that issues a DOI (for example, Zenodo), but this should be considered a temporary measure until a dedicated database is established.

Modeling generally uses processed data (for example, an EM map). Where possible, both the processed data and the original raw data (for example, a set of EM micrographs) should be deposited in a suitable repository.
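To make the linking idea concrete, here is a minimal Python sketch of a record that points at an externally deposited dataset, either through a repository accession or, as a fallback, through a DOI. The class name, field names, and accession values are hypothetical placeholders for illustration only. They are not part of the mmCIF dictionary or of any repository's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExternalDataset:
    """Hypothetical pointer to one externally deposited dataset."""
    data_type: str                   # e.g. "3D EM map", "SAXS profile"
    repository: Optional[str]        # e.g. "EMDB"; None if only a DOI exists
    accession: Optional[str]         # repository-specific identifier
    doi: Optional[str] = None        # fallback, e.g. a Zenodo DOI
    is_raw: bool = False             # True for raw data, False for processed

    def __post_init__(self):
        # A link is only useful if it can actually be resolved.
        if not (self.repository and self.accession) and not self.doi:
            raise ValueError("need a repository accession or a DOI")

    def locator(self) -> str:
        """Prefer the dedicated repository; fall back to the DOI."""
        if self.repository and self.accession:
            return f"{self.repository}:{self.accession}"
        return f"doi:{self.doi}"


# Placeholder identifiers (not real entries).
processed = ExternalDataset("3D EM map", "EMDB", "EMD-0000")
raw = ExternalDataset("EM micrographs", "EMPIAR", "EMPIAR-00000", is_raw=True)
fallback = ExternalDataset("crosslink table", None, None,
                           doi="10.5281/zenodo.0000000")
```

Keeping processed and raw data as separate records of the same type lets a single model reference both, and the DOI field covers the temporary case where no dedicated repository exists yet.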

The state of each experimental field is summarized below.

FRET

Domain experts: Claus Seidel

File formats: Photon-HDF5 (see also the FRETBursts software)

Data linked from mmCIF: pairs of interaction sites?

Repositories: none (?)

EM

Domain experts: Ardan Patwardhan

File formats: MRC

Data linked from mmCIF: raw micrographs, 2D class averages, 3D maps

Repositories:

  • 3D maps: EMDB
  • Micrographs: EMPIAR
  • Class averages: none (?)
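As an illustration of why reusing MRC is cheap, the sketch below reads the grid dimensions and data mode from the fixed 1024-byte MRC header using only the Python standard library. It assumes a little-endian file and checks only the "MAP " identifier at bytes 208-211. The function name is hypothetical, and the header used here is synthetic rather than taken from a real map.

```python
import struct

MRC_HEADER_SIZE = 1024  # size of the fixed MRC header in bytes


def read_mrc_dimensions(header: bytes) -> dict:
    """Extract nx, ny, nz and mode from an MRC header (little-endian assumed)."""
    if len(header) < MRC_HEADER_SIZE:
        raise ValueError("MRC header must be at least 1024 bytes")
    # The first four 32-bit integers are NX, NY, NZ and MODE.
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    # Word 53 (bytes 208-211) holds the identifier "MAP ".
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier; not an MRC file?")
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}


# Build a synthetic header for demonstration: a 64x64x32 grid, mode 2 (float32).
synthetic = bytearray(MRC_HEADER_SIZE)
struct.pack_into("<4i", synthetic, 0, 64, 64, 32, 2)
synthetic[208:212] = b"MAP "

info = read_mrc_dimensions(bytes(synthetic))
```

For real files, a dedicated reader such as the `mrcfile` package is likely a better choice, since it also handles byte order, extended headers, and the voxel data itself.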

SAS

Domain experts: Al Kikhney

File formats: SAS profiles, ...

Data linked from mmCIF: SAXS profile, ab initio shape

Repositories: SASBDB

XL-MS

Domain experts: Juri Rappsilber, Alexander Leitner

File formats: simple tabulated data, ...

Data linked from mmCIF: tabulated sets of proximate residues (e.g. for the yeast Nup84 complex), spectra/peaklists (e.g. for the yeast Mediator complex)
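A tabulated set of proximate residues can be as simple as a CSV file with one crosslink per row. The sketch below parses such a table in Python. The column names (`protein1`, `residue1`, `protein2`, `residue2`) and the example rows are invented for illustration and do not reflect any agreed XL-MS format or any published dataset.

```python
import csv
import io
from typing import List, NamedTuple


class Crosslink(NamedTuple):
    """One pair of proximate residues (hypothetical layout)."""
    protein1: str
    residue1: int
    protein2: str
    residue2: int


def read_crosslinks(handle) -> List[Crosslink]:
    """Read crosslinks from a CSV file object with a header row."""
    reader = csv.DictReader(handle)
    return [
        Crosslink(row["protein1"], int(row["residue1"]),
                  row["protein2"], int(row["residue2"]))
        for row in reader
    ]


# Invented example data: one inter-protein and one intra-protein link.
example = io.StringIO(
    "protein1,residue1,protein2,residue2\n"
    "ProtA,10,ProtB,25\n"
    "ProtA,42,ProtA,57\n"
)
links = read_crosslinks(example)
```

Because the table is plain text, it can be deposited as-is and referenced from the mmCIF file, while the underlying spectra or peaklists go to a mass spectrometry repository.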

Repositories:

  • Sets of proximate residues: none
  • Peaklists: MassIVE