Data Collection: Assemble Development Databases #6

hrovira · 2013-11-18T19:32:51Z

Load the latest run of FMP into a group of databases per tumor type, for use in development

Consolidate reference data for use in RE and GS
Establish development server: KRAKEN

spacepod · 2013-11-20T00:27:40Z

Is the goal for this issue to have a list of the relevant FMP directories/files, or to have the contents of those files imported?

Likewise, for this issue, should the reference datasets be collected and listed, or actually imported?

hrovira · 2013-11-20T22:30:08Z

The goal is to have a set of databases for each tumor type and a datamodel.json that can be used in development for RE and GS. The databases should reside on the server, and connections should be allowed from dev workstations.

Reference data is a lesser priority for this task.

spacepod · 2013-11-22T19:45:04Z

FYI I've installed mongodb on kraken, under /local/mongodb. I've created /local/mongodb/bin which links to the current versions of the mongodb installation, and I've updated the www user's path accordingly.

spacepod · 2013-11-22T19:53:58Z

Notes: start mongo server under www user with

numactl --interleave=all mongod --dbpath /local/mongodb/db

see http://docs.mongodb.org/manual/administration/production-notes/#mongodb-on-numa-hardware
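
For scripted startup, the same command line can be assembled in code. This is only a minimal sketch: the helper name `mongod_command` and the `use_numactl` switch are hypothetical and are not part of the kraken setup described here.

```python
def mongod_command(dbpath="/local/mongodb/db", use_numactl=True):
    """Return the argv list for starting mongod on the given data directory.

    Hypothetical helper: it only mirrors the command noted above. When
    use_numactl is True, mongod is wrapped in `numactl --interleave=all`,
    as recommended for NUMA hardware.
    """
    argv = ["mongod", "--dbpath", dbpath]
    if use_numactl:
        argv = ["numactl", "--interleave=all"] + argv
    return argv
```

The resulting list can be passed to `subprocess.run(mongod_command(), check=True)` when running as the www user.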

spacepod · 2013-11-22T20:04:02Z

current filesystem structure:

mongodb files live here:

/local/mongodb/db

data files

$cr9/workspaces/canonical_datasets
|-- BLCA
|   `-- 20131113
|       |-- BLCA.SEQ.20131113.tsv
|       `-- BLCA.SEQ.20131113-provenance.tsv
|-- BRCA
|   `-- 20131113
|       |-- BRCA.SEQ.20131113.tsv
|       |-- BRCA.SEQ.20131113-provenance.tsv
|       |-- BRCA.ARY.20131113.tsv
|       `-- BRCA.ARY.20131113-provenance.tsv
…
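
The file names in this layout appear to follow a `TUMOR.PLATFORM.DATE[-provenance].tsv` pattern. A rough sketch of pulling those parts out of a path is below; the pattern is inferred only from the examples listed, and the function name `parse_dataset_path` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the listed examples only (an assumption):
# TUMOR.PLATFORM.YYYYMMDD.tsv, optionally with a "-provenance" suffix.
_DATASET_NAME = re.compile(
    r"^(?P<tumor>[A-Z0-9]+)\.(?P<platform>[A-Z]+)\.(?P<date>\d{8})"
    r"(?P<prov>-provenance)?\.tsv$"
)

def parse_dataset_path(path):
    """Split a canonical dataset path into its parts, or return None."""
    match = _DATASET_NAME.match(PurePosixPath(path).name)
    if match is None:
        return None
    return {
        "tumor_type": match.group("tumor"),
        "platform": match.group("platform"),
        "date": match.group("date"),
        "is_provenance": match.group("prov") is not None,
    }
```

Paths that do not match the pattern return `None`, so unexpected files can be skipped rather than mis-parsed.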

spacepod · 2013-11-27T07:04:43Z

Resolved.

A bare-bones sample datamodel.json, with only one tumor type, is available for review at $cr9/workspaces/canonical_datasets.json.
All SEQ data is loaded into the local mongodb; each tumor type per platform per date is one database, such as:
BRCA-SEQ-20131113
Each database currently contains one collection: feature_matrix
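
To illustrate the naming scheme, here is a small sketch that builds a database name and turns TSV rows into documents. It is not the loader that was actually used: the helper names are hypothetical, and the assumption that the first TSV row is a header is mine, since the file format is not described in this thread.

```python
import csv
import io

def database_name(tumor_type, platform, date, sep="-"):
    """Join the three parts into a database name like BRCA-SEQ-20131113."""
    return sep.join([tumor_type, platform, date])

def tsv_to_documents(handle):
    """Yield one dict per TSV row, keyed by the header row (assumed present)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield dict(row)

# Tiny made-up sample, for illustration only; not real feature matrix data.
sample = io.StringIO("feature\tsample_1\nexample_feature\t0.5\n")
docs = list(tsv_to_documents(sample))
```

With pymongo installed, documents like these could be written with `insert_many` on the `feature_matrix` collection of the database named by `database_name(...)`.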

Optional changes which can be discussed:

renaming databases with underscores instead of dashes, if that is an existing convention;
the db name could omit the platform type (SEQ vs ARY), and each type could instead exist as a separate feature_matrix collection (feature_matrix_seq, etc.), with some metadata in the datamodel.json.
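
For discussion, the two optional changes could look roughly like this. The sketch assumes underscores as the separator and a purely hypothetical shape for the datamodel.json metadata; the keys `database` and `collections` are invented for illustration and are not an existing schema.

```python
import json

def alternative_layout(tumor_type, platform, date, sep="_"):
    """Return (database, collection) with the platform moved into the collection name."""
    database = sep.join([tumor_type, date])
    collection = "feature_matrix_" + platform.lower()
    return database, collection

def datamodel_entry(tumor_type, date, platforms, sep="_"):
    """Build a hypothetical datamodel.json entry for one tumor type and date."""
    return {
        "database": sep.join([tumor_type, date]),
        "collections": {p: "feature_matrix_" + p.lower() for p in platforms},
    }

# Serialized form of one hypothetical entry.
entry_json = json.dumps(datamodel_entry("BRCA", "20131113", ["SEQ", "ARY"]), indent=2)
```

Under this layout a single BRCA database for 20131113 would hold both `feature_matrix_seq` and `feature_matrix_ary`, and the datamodel.json would record which collection belongs to which platform.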

spacepod · 2013-11-27T07:05:18Z

(awaiting hrovira's comments and/or closing issue)

ghost assigned spacepod Nov 18, 2013

hrovira mentioned this issue Nov 18, 2013

RE Upgrades: Scatterplot cancerregulome/RegulomeExplorer--Deprecated#10

Open

spacepod closed this as completed Nov 27, 2013

spacepod reopened this Nov 27, 2013
