Update pycytominer pipelines #23

Merged
merged 10 commits into from
Aug 30, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -24,3 +24,6 @@ Corrected_Images

# ignore Mac related files
.DS_Store

# ignore jupyter files
.ipynb_checkpoints/
121 changes: 73 additions & 48 deletions 3.processing_features/0.merge_sc_cytotable.ipynb
@@ -1,15 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Merge single cells from CellProfiler outputs using CytoTable"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -36,7 +34,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -47,7 +44,18 @@
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['Plate_3_prime', 'Plate_1', 'Plate_4', 'Plate_3', 'Plate_2']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# type of file output from CytoTable (currently only parquet)\n",
"dest_datatype = \"parquet\"\n",
@@ -61,11 +69,23 @@
"\n",
"# list for plate names based on folders to use to create dictionary\n",
"plate_names = []\n",
"# iterate through 0.download_data and append plate names from folder names that contain image data from that plate\n",
"\n",
"# iterate through 0.download_data and append plate names from folder names\n",
"# that contain image data from that plate\n",
"# (Note, you must first run `0.download_data/download_plates.ipynb`)\n",
"for file_path in pathlib.Path(\"../0.download_data/\").iterdir():\n",
" if str(file_path.stem).startswith(\"Plate\"):\n",
" plate_names.append(str(file_path.stem))\n",
"\n",
" \n",
"plate_names"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# preset configurations based on typical CellProfiler outputs\n",
"preset = \"cellprofiler_sqlite_pycytominer\"\n",
"# remove Image_Metadata_Plate from SELECT as this metadata was not extracted from file names\n",
@@ -110,23 +130,23 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ 'Plate_1': { 'dest_path': 'data/converted_data/Plate_1.parquet',\n",
" 'source_path': '/home/jenna/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_1/Plate_1_nf1_analysis.sqlite'},\n",
" 'source_path': '/home/gway/repos/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_1/Plate_1_nf1_analysis.sqlite'},\n",
" 'Plate_2': { 'dest_path': 'data/converted_data/Plate_2.parquet',\n",
" 'source_path': '/home/jenna/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_2/Plate_2_nf1_analysis.sqlite'},\n",
" 'source_path': '/home/gway/repos/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_2/Plate_2_nf1_analysis.sqlite'},\n",
" 'Plate_3': { 'dest_path': 'data/converted_data/Plate_3.parquet',\n",
" 'source_path': '/home/jenna/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_3/Plate_3_nf1_analysis.sqlite'},\n",
" 'source_path': '/home/gway/repos/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_3/Plate_3_nf1_analysis.sqlite'},\n",
" 'Plate_3_prime': { 'dest_path': 'data/converted_data/Plate_3_prime.parquet',\n",
" 'source_path': '/home/jenna/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_3_prime/Plate_3_prime_nf1_analysis.sqlite'},\n",
" 'source_path': '/home/gway/repos/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_3_prime/Plate_3_prime_nf1_analysis.sqlite'},\n",
" 'Plate_4': { 'dest_path': 'data/converted_data/Plate_4.parquet',\n",
" 'source_path': '/home/jenna/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_4/Plate_4_nf1_analysis.sqlite'}}\n"
" 'source_path': '/home/gway/repos/nf1_cellpainting_data/2.cellprofiler_analysis/analysis_output/Plate_4/Plate_4_nf1_analysis.sqlite'}}\n"
]
}
],
@@ -147,7 +167,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -156,28 +175,28 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performing merge single cells and conversion on Plate_4!\n",
"Merged and converted Plate_4.parquet!\n",
"Added single cell count as metadata to Plate_4.parquet!\n",
"Performing merge single cells and conversion on Plate_3_prime!\n",
"Merged and converted Plate_3_prime.parquet!\n",
"Added single cell count as metadata to Plate_3_prime.parquet!\n",
"Performing merge single cells and conversion on Plate_1!\n",
"Merged and converted Plate_1.parquet!\n",
"Added single cell count as metadata to Plate_1.parquet!\n",
"Performing merge single cells and conversion on Plate_4!\n",
"Merged and converted Plate_4.parquet!\n",
"Added single cell count as metadata to Plate_4.parquet!\n",
"Performing merge single cells and conversion on Plate_3!\n",
"Merged and converted Plate_3.parquet!\n",
"Added single cell count as metadata to Plate_3.parquet!\n",
"Performing merge single cells and conversion on Plate_2!\n",
"Merged and converted Plate_2.parquet!\n",
"Added single cell count as metadata to Plate_2.parquet!\n",
"Performing merge single cells and conversion on Plate_3_prime!\n",
"Merged and converted Plate_3_prime.parquet!\n",
"Added single cell count as metadata to Plate_3_prime.parquet!\n"
"Added single cell count as metadata to Plate_2.parquet!\n"
]
}
],
@@ -186,6 +205,7 @@
"for plate, info in plate_info_dictionary.items():\n",
" source_path = info[\"source_path\"]\n",
" dest_path = info[\"dest_path\"]\n",
" \n",
" print(f\"Performing merge single cells and conversion on {plate}!\")\n",
"\n",
" # merge single cells and output as parquet file\n",
@@ -201,12 +221,20 @@
" sc_utils.add_sc_count_metadata_file(\n",
" data_path=dest_path, well_column_name=\"Image_Metadata_Well\", file_type=\"parquet\"\n",
" )\n",
" \n",
" print(f\"Added single cell count as metadata to {pathlib.Path(dest_path).name}!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check if converted data looks correct"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -387,64 +415,64 @@
"</div>"
],
"text/plain": [
" Metadata_ImageNumber Image_Metadata_Plate Metadata_number_of_singlecells \n",
"0 1 Plate_4 81 \\\n",
" Metadata_ImageNumber Image_Metadata_Plate Metadata_number_of_singlecells \\\n",
"0 1 Plate_4 81 \n",
"1 1 Plate_4 81 \n",
"2 2 Plate_4 81 \n",
"3 2 Plate_4 81 \n",
"4 2 Plate_4 81 \n",
"\n",
" Image_Metadata_Site Image_Metadata_Well \n",
"0 10 B10 \\\n",
" Image_Metadata_Site Image_Metadata_Well \\\n",
"0 10 B10 \n",
"1 10 B10 \n",
"2 11 B10 \n",
"3 11 B10 \n",
"4 11 B10 \n",
"\n",
" Metadata_Cells_Number_Object_Number Metadata_Cytoplasm_Parent_Cells \n",
"0 4 4 \\\n",
" Metadata_Cells_Number_Object_Number Metadata_Cytoplasm_Parent_Cells \\\n",
"0 4 4 \n",
"1 5 5 \n",
"2 1 1 \n",
"3 2 2 \n",
"4 3 3 \n",
"\n",
" Metadata_Cytoplasm_Parent_Nuclei Metadata_Nuclei_Number_Object_Number \n",
"0 5 5 \\\n",
" Metadata_Cytoplasm_Parent_Nuclei Metadata_Nuclei_Number_Object_Number \\\n",
"0 5 5 \n",
"1 6 6 \n",
"2 1 1 \n",
"3 2 2 \n",
"4 3 3 \n",
"\n",
" Cytoplasm_AreaShape_Area ... Nuclei_Texture_Variance_DAPI_3_02_256 \n",
"0 22157.0 ... 1281.874186 \\\n",
" Cytoplasm_AreaShape_Area ... Nuclei_Texture_Variance_DAPI_3_02_256 \\\n",
"0 22157.0 ... 1281.874186 \n",
"1 11718.0 ... 1085.750460 \n",
"2 17501.0 ... 1273.428721 \n",
"3 17871.0 ... 633.124457 \n",
"4 12098.0 ... 894.732816 \n",
"\n",
" Nuclei_Texture_Variance_DAPI_3_03_256 \n",
"0 1257.435761 \\\n",
" Nuclei_Texture_Variance_DAPI_3_03_256 \\\n",
"0 1257.435761 \n",
"1 1113.144205 \n",
"2 1246.970723 \n",
"3 642.170387 \n",
"4 829.273862 \n",
"\n",
" Nuclei_Texture_Variance_GFP_3_00_256 Nuclei_Texture_Variance_GFP_3_01_256 \n",
"0 65.965695 52.068222 \\\n",
" Nuclei_Texture_Variance_GFP_3_00_256 Nuclei_Texture_Variance_GFP_3_01_256 \\\n",
"0 65.965695 52.068222 \n",
"1 139.037112 140.802921 \n",
"2 137.466776 111.514400 \n",
"3 190.690537 173.126428 \n",
"4 142.997128 131.232052 \n",
"\n",
" Nuclei_Texture_Variance_GFP_3_02_256 Nuclei_Texture_Variance_GFP_3_03_256 \n",
"0 50.445780 51.851812 \\\n",
" Nuclei_Texture_Variance_GFP_3_02_256 Nuclei_Texture_Variance_GFP_3_03_256 \\\n",
"0 50.445780 51.851812 \n",
"1 141.819546 149.091779 \n",
"2 113.076080 118.810204 \n",
"3 170.503677 178.200219 \n",
"4 126.981214 128.412295 \n",
"\n",
" Nuclei_Texture_Variance_RFP_3_00_256 Nuclei_Texture_Variance_RFP_3_01_256 \n",
"0 425.319446 409.351012 \\\n",
" Nuclei_Texture_Variance_RFP_3_00_256 Nuclei_Texture_Variance_RFP_3_01_256 \\\n",
"0 425.319446 409.351012 \n",
"1 512.879573 499.756267 \n",
"2 311.232220 306.768555 \n",
"3 401.039364 412.623493 \n",
@@ -460,21 +488,19 @@
"[5 rows x 2313 columns]"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"converted_df = pd.read_parquet(plate_info_dictionary[\"Plate_4\"][\"dest_path\"])\n",
"\n",
"# load in and print a converted df to see if it looks correct\n",
"print(converted_df.shape)\n",
"converted_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -483,7 +509,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -495,7 +521,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "nf1_cellpainting_data",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -510,9 +536,8 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.15"
},
"orig_nbformat": 4
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
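
Review note on the plate-discovery cell: the notebook builds `plate_names` by iterating over `../0.download_data/`, and the order of the merge log in the recorded outputs appears to differ between the old and new runs. That is consistent with `pathlib.Path.iterdir()` yielding entries in arbitrary order. The following is a minimal sketch of one way to make the order reproducible, not the notebook's actual code. The helper name `find_plate_names` and the temporary demo directory are assumptions introduced for illustration only.

```python
import pathlib
import tempfile


def find_plate_names(data_dir, prefix="Plate"):
    """Return the sorted stems of entries in data_dir whose names start with prefix.

    Hypothetical helper (not part of the notebook): it mirrors the notebook's
    `startswith("Plate")` filter but sorts the result so the order does not
    depend on the file system.
    """
    return sorted(
        path.stem
        for path in pathlib.Path(data_dir).iterdir()
        if path.stem.startswith(prefix)
    )


# Demo on a throwaway directory that mimics the plate folder layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Plate_3_prime", "Plate_1", "Plate_4", "Plate_3", "Plate_2", "metadata"]:
        (pathlib.Path(tmp) / name).mkdir()
    demo_names = find_plate_names(tmp)

print(demo_names)
```

Sorting only affects the order in which plates are processed and logged. The resulting dictionary keys and parquet outputs would be the same.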