Commit c92231c

josenavas authored and antgonza committed
Analysis refactor db (#2040)
* New DB structure
* Adding python patch
* Adding a biom so we can actually execute the patch
* Fixing the patch to correctly transfer the information from the old structure to the new one
* Fixing patch
* Fixing patch and a few other bits to make the patch run successfully
* These files are no longer needed
* Dropping analysis status table
* Linking the analysis with all the artifacts
* Fixing typo
* Fixing HTML and dbschema files
* Adding analysis jobs
* Adding a logging column to the analysis
* Addressing @antgonza's comments
* Taking into account non-phylogenetic metrics in beta diversity
1 parent 80c5fea

10 files changed: +3952 −4631 lines


qiita_db/environment_manager.py

Lines changed: 1 addition & 1 deletion
@@ -405,4 +405,4 @@ def patch(patches_dir=PATCHES_DIR, verbose=False, test=False):
         if verbose:
             print('\t\tApplying python patch %s...'
                   % py_patch_filename)
-        execfile(py_patch_fp)
+        execfile(py_patch_fp, {})

(Passing an explicit empty dict as the globals argument runs each python patch in a fresh namespace, presumably so patches cannot leak names into, or pick up names from, the caller's scope.)

qiita_db/support_files/patches/47.sql

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
-- Jan 5, 2017
-- Move the analysis to the plugin system. This is a major rewrite of the
-- database backend that supports the analysis pipeline.
-- After exploring the data on the database, we realized that there are a lot
-- of inconsistencies in the data. Unfortunately, this makes the process of
-- transferring the data from the old structure to the new one a bit more
-- challenging, as we will need to handle different special cases.
-- Furthermore, not all the information needed is present in the database,
-- since some of it requires checking the BIOM files. For these reasons, the
-- vast majority of the data transfer is done in the python patch 47.py.

-- In this file we are just creating the new data structures. The old data
-- structure will be dropped in the python patch once all the data has been
-- transferred.

-- Create the new data structures

-- Table that links the analysis with the initial set of artifacts
CREATE TABLE qiita.analysis_artifact (
    analysis_id bigint NOT NULL,
    artifact_id bigint NOT NULL,
    CONSTRAINT idx_analysis_artifact_0 PRIMARY KEY (analysis_id, artifact_id)
);
CREATE INDEX idx_analysis_artifact_analysis ON qiita.analysis_artifact (analysis_id);
CREATE INDEX idx_analysis_artifact_artifact ON qiita.analysis_artifact (artifact_id);
ALTER TABLE qiita.analysis_artifact ADD CONSTRAINT fk_analysis_artifact_analysis FOREIGN KEY (analysis_id) REFERENCES qiita.analysis (analysis_id);
ALTER TABLE qiita.analysis_artifact ADD CONSTRAINT fk_analysis_artifact_artifact FOREIGN KEY (artifact_id) REFERENCES qiita.artifact (artifact_id);

-- Dropping the analysis status column, because the status now depends on the
-- status of the artifacts, like it does for studies.
ALTER TABLE qiita.analysis DROP COLUMN analysis_status_id;

-- Create a table to link the analysis with the jobs that create the initial
-- artifacts
CREATE TABLE qiita.analysis_processing_job (
    analysis_id bigint NOT NULL,
    processing_job_id uuid NOT NULL,
    CONSTRAINT idx_analysis_processing_job PRIMARY KEY (analysis_id, processing_job_id)
);
CREATE INDEX idx_analysis_processing_job_analysis ON qiita.analysis_processing_job (analysis_id);
CREATE INDEX idx_analysis_processing_job_pj ON qiita.analysis_processing_job (processing_job_id);
ALTER TABLE qiita.analysis_processing_job ADD CONSTRAINT fk_analysis_processing_job FOREIGN KEY (analysis_id) REFERENCES qiita.analysis (analysis_id);
ALTER TABLE qiita.analysis_processing_job ADD CONSTRAINT fk_analysis_processing_job_pj FOREIGN KEY (processing_job_id) REFERENCES qiita.processing_job (processing_job_id);

-- Add a logging column to the analysis
ALTER TABLE qiita.analysis ADD logging_id bigint;
CREATE INDEX idx_analysis_0 ON qiita.analysis (logging_id);
ALTER TABLE qiita.analysis ADD CONSTRAINT fk_analysis_logging FOREIGN KEY (logging_id) REFERENCES qiita.logging (logging_id);

-- We can handle some of the special cases here, to simplify the work in the
-- python patch

-- Special case 1: there are jobs in the database that do not contain any
-- information about the options used to run them. However, these jobs do not
-- have any results, and all of them are marked either as queued or error even
-- though no error log has been saved. Since these jobs are essentially
-- useless, we remove them from the system.
DELETE FROM qiita.analysis_job
    WHERE job_id IN (SELECT job_id FROM qiita.job WHERE options = '{}');
DELETE FROM qiita.job WHERE options = '{}';

-- Special case 2: there are a fair number of jobs (719 last time I checked)
-- that are not attached to any analysis. Not sure how this can happen, but
-- these orphan jobs can't be accessed from anywhere in the interface, so
-- remove them from the system. Note that we are unlinking the files but we
-- are not removing them from the filepath table; we will do that in the
-- python patch 47.py using the purge_filepaths function, as it makes sure
-- that those files are not used anywhere else.
DELETE FROM qiita.job_results_filepath WHERE job_id IN (
    SELECT job_id FROM qiita.job J WHERE NOT EXISTS (
        SELECT * FROM qiita.analysis_job AJ WHERE J.job_id = AJ.job_id));
DELETE FROM qiita.job J WHERE NOT EXISTS (
    SELECT * FROM qiita.analysis_job AJ WHERE J.job_id = AJ.job_id);
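
With analysis_status_id gone, an analysis's status is now derived from the artifacts it is linked to through the new qiita.analysis_artifact table. As a minimal sketch (not part of the patch, and assuming qiita.artifact records each artifact's status in a visibility_id column, as it does for study artifacts), the initial artifacts of an analysis and their statuses could be inspected with:

-- Hypothetical inspection query, not part of the patch: list the initial
-- artifacts of analysis 1 together with their status, assuming the status
-- lives in qiita.artifact.visibility_id.
SELECT aa.analysis_id, aa.artifact_id, a.visibility_id
FROM qiita.analysis_artifact aa
JOIN qiita.artifact a ON a.artifact_id = aa.artifact_id
WHERE aa.analysis_id = 1;

An application-level roll-up would then reduce these per-artifact statuses to a single analysis status, mirroring how study status is computed.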

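The deletions for special case 2 can also be sanity-checked before the patch runs. A hypothetical pre-flight query, using only tables the patch itself touches, counts the orphan jobs that the two DELETE statements would remove:

-- Sanity check, not part of the patch: count jobs with no row in
-- qiita.analysis_job, i.e. the orphan jobs that special case 2 deletes.
SELECT COUNT(*)
FROM qiita.job j
WHERE NOT EXISTS (
    SELECT 1 FROM qiita.analysis_job aj WHERE aj.job_id = j.job_id);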