This repository contains everything necessary to process image data, from focus stacking to basic analysis.
Future revisions will include .zarr support (more compression options, faster reads/writes).
- `requirements_fstack.txt`: list of requirements to run fstack and fstack_cu. Activate your conda environment and run `pip install -r requirements_fstack.txt`.
- `fstack/__init__.py`: the fstack code. Python code in the `pipeline/` directory can import fstack with `from fstack import fstack`.
- `fstack_cu/__init__.py`: the CUDA-accelerated fstack code. Python code in the `pipeline/` directory can import it with `from fstack_cu import fstack_cu` or `from fstack_cu import fstack_cu_images`.
- `fstacker.py`: perform a focus stack using local (provide a path) or remote (using GCSFS) images. This code assumes the existence of an `index.csv` file in the source directory containing cycle names.
- `dpc_overlay.py`: take left-illuminated and right-illuminated images and combine them to improve contrast.
- `random_image_segments.py`: pick a channel and get a random selection of images from that channel across all cycles. This is useful for generating training data if you need to train your own cellpose model.
- `requirements_segment.txt`: see the `requirements_fstack.txt` description.
- `segmenter.py`: segment nuclei with cellpose using a pretrained model. Use the cellpose GUI to create the model.
- `requirements_analyze.txt`: see the `requirements_fstack.txt` description.
- `analyzer.py`: measure the size and average brightness of each cell in each channel and save the results as a CSV. `analyzer.py` assumes the folder and file structure created by `segmenter.py`.
- `requirements_deepzoom.txt`: see the `requirements_fstack.txt` description.
- `deepzoom.py`: make a deepzoom image and optionally a web viewer for it. If you are making a web viewer, you must manually copy the openseadragon folder into the same directory as the `_all_viewer.html` document.
- `openseadragon/`: this folder contains the files for the deepzoom web viewer.
First, install miniconda for python 3 following this guide: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html Miniconda will help manage our packages and python environment.
Next, install CUDA 11.3 and cuDNN 8.5.0 from nvidia. This lets us use the graphics card for accelerated ML and image processing. We need version 11.3 for compatibility with Cellpose and M2Unet.
CUDA:

```
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install cuda=11.3.1-1
sudo apt-get install nvidia-gds=11.4.1-1
export PATH="/usr/local/cuda-11.3/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH"
sudo reboot
```

Verify that the PATH exported properly; if it didn't, modify `~/.bashrc` to add CUDA to `PATH` and `LD_LIBRARY_PATH`.
cuDNN: Follow the directions for Ubuntu network installation here: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#package-manager-ubuntu-install. Make sure you install a version of cuDNN compatible with CUDA 11.3 (e.g. `libcudnn8=8.2.1.32-1+cuda11.3` and `libcudnn8-dev=8.2.1.32-1+cuda11.3`).
Create a new conda environment and install the requirements: `conda create --name pipeline`, then `conda activate pipeline`, then follow this guide to install the correct version of pytorch: https://pytorch.org/get-started/locally/. Next, run the following commands:

```
pip install -r requirements_fstack.txt
pip install -r requirements_segment.txt
pip install -r requirements_analyze.txt
pip install -r requirements_deepzoom.txt
```
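To confirm that pytorch can actually see the GPU before running the pipeline (an optional sanity check, not part of the original instructions), you can run:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If this prints `False`, revisit the CUDA and pytorch installation steps above.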
Focus-stack every imageset from a codex experiment. Suppose you have the following file structure:

- `src/`: source directory (can be any valid string)
  - `exp_id_1/`: experiment ID (can be any string)
    - `index.csv`: CSV with the valid cycle names (must be named "index.csv")
    - `cycle1/`: folder with the name of the first cycle (can be any string as long as it matches index.csv)
      - `0/`: folder named "0" (must be named "0")
        - `0_0_0_Fluorescence_405_nm_Ex.bmp`: BMP image with this name format. The first digit is the i coordinate, the second digit is the j coordinate, the third is the z height, and the rest of the filename represents the channel wavelength
        - `0_0_0_Fluorescence_488_nm_Ex.bmp`: BMP with the same coordinates as above but a different channel
        - more BMPs
        - `6_7_5_Fluorescence_488_nm_Ex.bmp`: for our example, suppose i ranges from 0 to 6, j ranges from 0 to 7, and the z index ranges from 0 to 5
  - `exp_id_2/`: another experiment (there can be any number of experiments)
    - identical structure to `exp_id_1`
For each experiment ID, each channel `Fluorescence_NNN_nm_Ex`, and each i index `i` and j index `j` in the range, `fstacker.py` generates an image named `i_j_f_Fluorescence_NNN_nm_Ex.png` (with different values of `i`, `j`, and `NNN` for each image stacked) and saves it either to the `src` directory or to a different directory of your choosing.
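As an illustration of the naming scheme (this helper is hypothetical, not part of `fstacker.py`):

```python
# Hypothetical illustration of the output naming scheme described above;
# fstacker.py's internals may differ.
def stacked_name(i: int, j: int, channel: str) -> str:
    # "f" marks the focus-stacked z position
    return f"{i}_{j}_f_{channel}.png"

print(stacked_name(0, 0, "Fluorescence_405_nm_Ex"))
# -> 0_0_f_Fluorescence_405_nm_Ex.png
```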
There are many parameters that control which images get focus stacked and where to save them. Here's the complete list of the parameters and what they mean (an example configuration follows the list):

- `CLI`: set to `True` or `False`. Keep it set to `False`; the command line interface development is lagging behind the script development.
- `use_gpu`: set to `True` or `False`. Focus stacking can be accelerated with the GPU. Depending on the dataset and computer, it might be faster to set it to `False` and use the CPU; try running with either option and compare focus stack times.
- `prefix`: string. If you have an index.csv, leave this string empty. If you don't have an index.csv with the names of the cycles to analyze, you can select which cycles to run by prefix. For example, if you have three cycles named `cycle_good1`, `cycle_also_good`, and `cycle_bad` and you only want to run focus stacking on the two good datasets, you can set `prefix="cy"` and rename `cycle_bad` to `_cycle_bad` so it will be excluded. Set `prefix = '*'` to get all folders.
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json. If you are running locally, this doesn't matter.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `src`: string. Path to the folder that contains the experiment folder. Can be a GCS locator (e.g. `gs://octopi-data`) or a local path (e.g. `/home/user/Documents/src/`). Note: must have a trailing slash (`/`).
- `dst`: string. Path to the folder to save data. If left blank, the images will be stored in the same directory as `src`. `fstacker.py` will recreate the source folder structure in this folder. Also must have a trailing slash.
- `exp`: list of strings. List of experiment IDs. In the example above, this would be `["exp_id_1", "exp_id_2"]`.
- `cha`: list of strings. List of channel names. In the example above, this would be `["Fluorescence_405_nm_Ex", "Fluorescence_488_nm_Ex"]`, but it should be left as-is.
- `typ`: string. Filetype of the images to read. Leave as-is.
- `colors`: dictionary of lists. Leave as-is.
- `remove_background`: boolean. Set `True` to run a white top-hat filter. This will increase runtime considerably.
- `invert_contrast`: boolean. Set `True` to invert the image.
- `shift_registration`: boolean. Set `True` to ensure images stay aligned across cycles.
- `subtract_background`: boolean. Set `True` to subtract the minimum brightness from the entire image.
- `use_color`: boolean. Set `True` when processing RGB images (untested).
- `imin`, `imax`, `jmin`, and `jmax`: integers. The range of i and j values to process. Generally, they should be the actual values for the experiment.
- `kmin` and `kmax`: integers. The range of z index values to focus stack. If (kmax+1-kmin) is less than or equal to 4, focus stacking will not run and the midpoint k index will be used. Set `kmin = kmax` if there's only one valid z index to bypass focus stacking.
- `crop_start` and `crop_end`: integers. Crop the image; x and y coordinates less than `crop_start` or greater than `crop_end` will be cropped out.
- `WSize`: integer. Must be odd. Size of the blurring kernel for focus stacking. 9 is a good value.
- `alpha`: float. Must be between 0 and 1. Parameter for blending z stacks. 0.2 is a good value.
- `sth`: integer. Parameter for blending z stacks. 13 is a good value.
- `verbose`: boolean. Set `True` to print extra details during operation (good for determining whether GPU stacking is faster than CPU).
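For concreteness, here is what a configuration might look like for the example experiment above (these values are illustrative, not the script's defaults; adjust them for your data):

```python
# Illustrative values for the example experiment above, not fstacker.py defaults.
CLI = False                          # keep False; run as a script
use_gpu = True                       # try True and False and compare stack times
prefix = ''                          # empty: cycle names come from index.csv
key = '/path/to/token.json'          # only used when connecting to GCS
gcs_project = 'my-project'           # only used when connecting to GCS
src = '/home/user/Documents/src/'    # trailing slash required
dst = ''                             # blank: save alongside src
exp = ['exp_id_1', 'exp_id_2']
cha = ['Fluorescence_405_nm_Ex', 'Fluorescence_488_nm_Ex']  # leave as-is
remove_background = False            # white top-hat filter is slow
invert_contrast = False
shift_registration = True
subtract_background = False
use_color = False
imin, imax, jmin, jmax = 0, 6, 0, 7
kmin, kmax = 0, 5                    # (kmax + 1 - kmin) > 4, so stacking runs
crop_start, crop_end = 0, 3000       # illustrative crop bounds
WSize = 9                            # odd blurring-kernel size
alpha = 0.2
sth = 13
verbose = True
```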
- To use fstacker as a script, set `CLI = False` in `main()` in `fstacker.py`, set the constants, and run the file.
- To use the command line interface, set `CLI = True` in `main()` in `fstacker.py`. Navigate to the directory `fstacker.py` is in, activate the conda environment if necessary, and run `python -m fstacker --help` to see all the flags and how to set them.
When imaging, contrast can be improved via differential phase contrast (DPC). This involves imaging cells with light coming from the left, imaging them again with light from the right, and combining the two images.

Overlaying different channels can also be useful for data visualization. The fluorescence and flatfield channels are dimmed, overlaid, and rescaled to make full use of the 8-bit range of brightnesses.
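A minimal sketch of the left/right combination, assuming the standard DPC formula (L − R)/(L + R); `dpc_overlay.py` may scale or blend differently:

```python
import numpy as np

def dpc(left: np.ndarray, right: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Combine left- and right-illuminated images with the standard DPC formula."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    contrast = (left - right) / (left + right + eps)  # roughly in [-1, 1]
    contrast -= contrast.min()                        # shift so the minimum is 0
    contrast /= contrast.max() + eps                  # normalize to [0, 1]
    return (255 * contrast).astype(np.uint8)          # full 8-bit range
```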
Cellpose uses machine learning models to segment the nuclei (405 nm channel). In my experience, the "cyto" built-in model detects the cells but does a bad job at isolating the nuclei, while the "nuclei" model does a good job at isolating the nuclei but fails to detect all the nuclei in the image. In theory, training the cellpose model on just one view of the 405 nm channel should be sufficient, but in practice this leads to overfitting; using several different views of the 405 nm channel solves this problem. Manually segmenting full images for training is tedious and redundant, and using smaller subsets of each view is sufficient. `random_image_segmentation.py` randomly selects a set quantity of images, crops each one, and saves it locally; there is no option for remote saving because the Cellpose GUI only works on local images, so there's no point in saving them remotely. This script assumes the images are saved in the folder structure described in the fstacker theory of operation. (A sketch of the cropping step follows the parameter list below.)
- `root_dir`: string. Local or remote path to where the images are stored.
- `dest_dir`: string. Local path to store the randomly selected and cropped images. This should be different from `root_dir` to make the images easier to work with.
- `exp_id`: string. Experiment ID to get the images from. See the fstacker theory of operation for more detail.
- `channel`: string. Image channel to randomly select from. See the fstacker theory of operation for more detail.
- `zstack`: string. If you ran `fstacker.py` and want to use the focus-stacked images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `n_rand`: int. Number of images to sample. 20 seems to be a good number.
- `n_sub`: int. Cut the images into `n_sub`^2 equal-size sub-images and randomly select one to save. 3 seems to be a good choice, but 4 will probably also work. Make sure the sub-images have a handful of cells in them.
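The selection step amounts to something like the following sketch of the sampling logic (not the script's exact code):

```python
import random
import numpy as np

def random_subimage(img: np.ndarray, n_sub: int) -> np.ndarray:
    """Cut img into n_sub^2 equal tiles and return one chosen at random."""
    h, w = img.shape[:2]
    th, tw = h // n_sub, w // n_sub                       # tile height and width
    r, c = random.randrange(n_sub), random.randrange(n_sub)
    return img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
```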
- Set the parameters and run the script. The images will be saved to the `dest_dir + exp_id` directory.
- Deactivate your conda environment and install `cellpose[gui]` in the base environment. Use cellpose to manually segment the training data.
- To train cellpose, run the following command: `python -m cellpose --use_gpu --train --dir {dir-to-subsets-folder} --chan 0 --batch_size 4 --pretrained_model {pretrained-model-path-or-name} --verbose --mask_filter _seg.npy`, where `{dir-to-subsets-folder}` is the path to the folder with training images and masks and `{pretrained-model-path-or-name}` is the name of a cellpose built-in model (e.g. `cyto` or `nuclei`) or the path to an existing cellpose model. After the command runs, the model will be in `{dir-to-subsets-folder}/models`.
Cells don't move between cycles, so we only need to segment one cycle for each (i,j) view. We generally choose to segment the nuclear channel because it is the brightest. The script first loads all the channel paths, sorts them to find the 0th channel, then filters the image paths so only the 0th-channel images are loaded. Alternatively, we can choose to segment all the images. Images are then segmented one at a time; in principle, Cellpose can work faster by segmenting multi-image batches, but in my experience not all GPUs can handle segmenting multiple images. Cellpose then saves a series of .npy files with the masks to the destination directory.
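In outline, per-image segmentation with the cellpose API looks roughly like this (a sketch with hypothetical paths; `segmenter.py`'s actual path handling and filtering differ):

```python
import numpy as np
from cellpose import models
from skimage import io

# Load a custom (or built-in) cellpose model; gpu=True falls back to CPU
# if no compatible GPU is found.
model = models.CellposeModel(gpu=True, pretrained_model='/path/to/custom_model')

img = io.imread('0_0_f_Fluorescence_405_nm_Ex.png')       # hypothetical path
masks, flows, styles = model.eval(img, channels=[0, 0])   # [0, 0] = monochrome
np.save('0_0_mask.npy', masks)                            # labeled mask array
```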
If you are having trouble installing cellpose, try uninstalling all other python packages that use QtPy (e.g. cv2), install cellpose, then reinstall everything you uninstalled. If you are using CUDA, ensure your CUDA version is compatible with your version of torch. `cellpose[gui]` seems to only work in the base environment.
- `root_dir`: string. Local or remote path to where the images are stored.
- `exp_id`: string. Experiment ID to get the images from. See the fstacker theory of operation for more detail.
- `channel`: string. Image channel to segment. See the fstacker theory of operation for more detail.
- `zstack`: string. If you ran `fstacker.py` and want to use the focus-stacked images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `cpmodel`: string. Path to the cellpose model. Can be remote or local.
- `use_gpu`: boolean. Set to `True` to try segmenting using the GPU.
- `segment_all`: boolean. Set to `True` to segment all cycles, not just the first.
- `channels`: list of ints. Which channel to run segmentation on. Set to `[0,0]` for the monochrome channel.
- To use segmenter as a script, set the constants and run the file. Note that it takes a pretrained model as a parameter; you can change it to one of the built-in pretrained models or set a path to a custom pretrained model.
- Cellpose already has a CLI; `segmenter.py` itself does not have one.
We assume cell positions don't change between cycles, so we can mask the cells by expanding the nucleus masks from `segmenter.py`. We have a mask for each (i,j) view; for each view we load all the images across all cycles at that view and load the nucleus mask. The mask is expanded to cover the entire cell. Then, for each cell in the view, for each channel, and for each cycle, we calculate the average brightness of the cell and store it in a CSV.

The CSV columns are the cell index in a given view, the i index, the j index, the x position in the image, the y position in the image, the number of pixels in the expanded mask, and the number of pixels in the nuclear mask, followed by one column for each element in the cartesian product of channel index and cycle index. The header is re-printed in the CSV and the cell index resets for each view.
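A sketch of the per-view measurement, assuming scikit-image is available (`expand_labels` and `regionprops` stand in for whatever `analyzer.py` actually uses, and the paths are hypothetical):

```python
import numpy as np
from skimage import io
from skimage.measure import regionprops
from skimage.segmentation import expand_labels

# Hypothetical paths; analyzer.py walks the structure created by segmenter.py.
nuclei = np.load('0_0_mask.npy')                     # labeled nucleus masks, one view
cells = expand_labels(nuclei, distance=10)           # grow each nucleus to cover its cell
img = io.imread('0_0_f_Fluorescence_488_nm_Ex.png')  # one channel of one cycle

for region in regionprops(cells, intensity_image=img):
    # cell index, size in pixels, and average brightness for this channel/cycle
    print(region.label, region.area, region.intensity_mean)
```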
- `cy_name`: string. If `segment_all = True` during segmentation, set this to the name of the cycle you want to use as the mask. If `segment_all = False`, set `cy_name = "first"`.
- `start_idx`, `end_idx`: integers. Select a range of cycles to analyze.
- `n_ch`: integer. Number of channels to analyze.
- `expansion`: integer, must be odd. The number of pixels to expand around the nucleus mask to create the cell masks.
- `root_dir`: string. Local or remote path to where the images are stored.
- `exp_id`: string. Experiment ID to get the images from. See the fstacker theory of operation for more detail.
- `channel`: string. Image channel to analyze. See the fstacker theory of operation for more detail.
- `zstack`: string. If you ran `fstacker.py` and want to use the focus-stacked images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `mask_union`: boolean. Set to `True` to save a .npy pickle for each view with the union of all nuclear and cell masks.
- `out`: string. Local or remote path to store the CSV.
- To use analyzer as a script, set the constants and run the file.
- There is currently no CLI version.
It is useful to have a display showing the entire view for each channel. `deepzoom.py` takes the experiment ID and a list of cycles as arguments and makes a deepzoom image for each channel as output. There is some overlap between adjacent views; `deepzoom.py` naively crops each image by a fixed amount to remove the overlap. It can also generate an html file to serve as a web viewer for the deepzoom image.
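Generating the deepzoom pyramid itself boils down to a pyvips call along these lines (a sketch with hypothetical values and filename; the stitching and cropping in `deepzoom.py` are more involved):

```python
import pyvips

# Hypothetical crop bounds and input file; deepzoom.py stitches the cropped views first.
crop_x0, crop_y0, crop_x1, crop_y1 = 100, 100, 2900, 2900

image = pyvips.Image.new_from_file('stitched_Fluorescence_405_nm_Ex.png')
cropped = image.crop(crop_x0, crop_y0, crop_x1 - crop_x0, crop_y1 - crop_y0)
cropped.dzsave('Fluorescence_405_nm_Ex')  # writes a .dzi file plus a tile folder
```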
- `parameters['crop_x0']`: int. How much to crop off along the x axis. Should be roughly 1/30 the length of the x dimension.
- `parameters['crop_x1']`: int. Where to stop cropping. Should be roughly xmax * (29/30).
- `parameters['crop_y0']`: int. Same as `crop_x0` but for the y axis.
- `parameters['crop_y1']`: int. Same as `crop_x1` but for the y axis.
- `make_viewer`: boolean. Set `True` to generate an html file for viewing.
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `src`: string. Local or remote path to where the images are stored.
- `dst`: string. Local or remote path where the deepzoom image and html will be saved.
- `exp_id`: string. Experiment ID to get the images from. See the fstacker theory of operation for more detail.
- `cha`: list of strings. See `fstacker.py` for explanation.
- `cy`: list of ints. Which cycles to make deepzooms of.
- `zstack`: string. If you ran `fstacker.py` and want to use the focus-stacked images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
```
conda install -c conda-forge librsvg
conda install -c conda-forge libiconv
conda install --channel conda-forge pyvips
```
- To use the deepzoom generator as a script, set the constants and run the file.
- Navigate to the directory with the .html file, copy the openseadragon folder into this directory, and run `python3 -m http.server` in this directory.