This page documents the workflow that I edited to automate the crystallography data analysis process in the Drug Discovery Unit (University of Dundee).
This workflow was initiated by Paul Fyfe, who developed the initial version of the script. My role was to improve the structure and make it run on schedule.
In this first version, the script was launched inside a folder containing images that had been collected on another machine and transferred to the one where the analysis is performed.
The script checks whether there are 50 or more images and whether the MSCServDetCCD.log file is present.
If this is the case, XIA2 runs, with or without a .mtz file.
After this step, if a .pdb file of the protein is available in the working directory, DIMPLE runs and its output is saved to a log file. This last step wasn't tested in the first version.
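The checks described above can be sketched as a small decision function. This is a minimal illustration, not the original script: the image extensions in IMAGE_SUFFIXES are an assumption (the text does not name the image format), and Plan, first_match and plan_folder are hypothetical names. The sketch only decides which steps apply; it does not launch XIA2 or DIMPLE.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

MIN_IMAGES = 50                    # threshold stated in the text
LOG_NAME = "MSCServDetCCD.log"     # log file named in the text
IMAGE_SUFFIXES = (".img", ".osc")  # ASSUMPTION: image extensions are not given in the text


@dataclass
class Plan:
    ready: bool          # True when the folder passes the checks and XIA2 should run
    mtz: Optional[Path]  # optional .mtz file found in the folder
    pdb: Optional[Path]  # optional .pdb model found in the folder

    @property
    def run_dimple(self) -> bool:
        # DIMPLE only follows a XIA2 run and needs a .pdb model
        return self.ready and self.pdb is not None


def first_match(folder: Path, suffix: str) -> Optional[Path]:
    """Return the first file in `folder` with the given suffix, or None."""
    matches = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == suffix
    )
    return matches[0] if matches else None


def plan_folder(folder: Path) -> Plan:
    """Apply the image-count and log-file checks to one folder."""
    files = [p for p in folder.iterdir() if p.is_file()]
    n_images = sum(p.suffix.lower() in IMAGE_SUFFIXES for p in files)
    ready = n_images >= MIN_IMAGES and (folder / LOG_NAME).is_file()
    return Plan(ready, first_match(folder, ".mtz"), first_match(folder, ".pdb"))
```

Keeping the decision separate from the launch step makes the checks easy to test on an empty folder before any real processing is started.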
The goals of my revision were to:
- Create an adequate file system for all the projects, in which to store unprocessed and processed data.
- Reinforce the file checks and cover all the different alternatives.
- Test and run DIMPLE after the XIA2 run.
- Schedule the analysis to run twice a day.
The proposed file system is this:
~/data
|---/project_folder
    |---/user1
    |   |---/unprocessed_data
    |   |   |---/job1
    |   |   |   |---/run01
    |   |   |   |---/run02
    |   |   |   ...
    |   |   |   |---/runN
    |   |   |---/job2
    |   |   |   |---/run01
    |   |   |   |---/run02
    |   |   |   ...
    |   |   |   |---/runN
    |   |   ...
    |   |   |---/jobN
    |   |       |---/run01
    |   |       |---/run02
    |   |       ...
    |   |       |---/runN
    |   |---/processed_data
    ...
    |---/userN
        |---/unprocessed_data
        |   |---/job1
        |   |   |---/run01
        |   |   |---/run02
        |   |   ...
        |   |   |---/runN
        |   |---/job2
        |   |   |---/run01
        |   |   |---/run02
        |   |   ...
        |   |   |---/runN
        |   ...
        |   |---/jobN
        |       |---/run01
        |       |---/run02
        |       ...
        |       |---/runN
        |---/processed_data
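As an illustration of the layout above, the per-user skeleton could be created with a few lines of Python. This is only a sketch: create_layout and its arguments are hypothetical names, and the job and run folders are left out because they appear as data arrives.

```python
from pathlib import Path
from typing import Iterable, List

STAGES = ("unprocessed_data", "processed_data")  # folder names from the layout above


def create_layout(root: Path, project: str, users: Iterable[str]) -> List[Path]:
    """Create <root>/<project>/<user>/<stage> for every user and stage."""
    created = []
    for user in users:
        for stage in STAGES:
            folder = root / project / user / stage
            folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
            created.append(folder)
    return created
```

For example, create_layout(Path.home() / "data", "project_folder", ["user1"]) would create the two stage folders for user1 under ~/data/project_folder.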
A working_folder is placed inside the jobN folder. It contains the images transferred from the machine where they were collected.
A .mtz file and a .pdb file can be stored in the job-level folder, because each job can use the same crystal structure under different experimental conditions.
At the end of the run, the content of /jobN/runM is moved from the unprocessed_data folder to the processed_data folder, where users can view and work on it.
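The move from unprocessed_data to processed_data could look like the sketch below. It assumes that the run keeps its jobN/runM path on the processed side and that an existing destination should never be overwritten; both are assumptions, and archive_run is a hypothetical name.

```python
import shutil
from pathlib import Path


def archive_run(user_dir: Path, job: str, run: str) -> Path:
    """Move <user_dir>/unprocessed_data/<job>/<run> to processed_data/<job>/<run>."""
    src = user_dir / "unprocessed_data" / job / run
    dst = user_dir / "processed_data" / job / run
    if not src.is_dir():
        raise FileNotFoundError(src)
    if dst.exists():
        raise FileExistsError(dst)  # ASSUMPTION: earlier results are never overwritten
    dst.parent.mkdir(parents=True, exist_ok=True)  # create processed_data/<job> if needed
    shutil.move(str(src), str(dst))
    return dst
```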
Two cron jobs are set up to run the analysis at 12 am and 8 pm every day.
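For reference, the two schedule entries can be written out as crontab lines. The sketch below builds them in Python; it reads "12 am" as midnight (hour 0), and the script path is a placeholder, not the real location of the analysis script.

```python
from typing import List, Sequence


def crontab_lines(command: str, hours: Sequence[int] = (0, 20)) -> List[str]:
    """Build one crontab entry per hour.

    Crontab fields are: minute, hour, day of month, month, day of week, command.
    """
    return [f"0 {hour} * * * {command}" for hour in hours]


# Hypothetical script path, used only for illustration
for line in crontab_lines("/path/to/run_analysis.sh"):
    print(line)
# prints:
# 0 0 * * * /path/to/run_analysis.sh
# 0 20 * * * /path/to/run_analysis.sh
```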