
preprocessing #1

Open
MikeLippincott wants to merge 1 commit into main

Conversation

MikeLippincott (Member)

This PR preprocesses the data and moves files into the right directories.


@jenna-tomkinson (Member) left a comment

LGTM! 🎉 I added a few comments, questions, and suggestions for you to address. Nice job!

# In[2]:


# absolute path to the raw data directory only works on this machine

Suggested change
# absolute path to the raw data directory only works on this machine
# absolute path to the raw data directory (only works on this machine)

Comment on lines +65 to +81
# make a df out of the file names
df = pd.DataFrame(list_of_new_files, columns=["file_path"])
df.insert(0, "file_name", df["file_path"].apply(lambda x: pathlib.Path(x).name))
df.insert(0, "Plate", df["file_path"].apply(lambda x: x.split("/")[7]))
df.insert(0, "Well", df["file_name"].apply(lambda x: x.split("F")[0].split("W")[-1]))
df.insert(0, "FOV", df["file_name"].apply(lambda x: x.split("T")[0].split("F")[-1]))
df.drop("file_path", axis=1, inplace=True)
df.drop("file_name", axis=1, inplace=True)
# split the plate into time and date
df.insert(2, "Date_Time", df["Plate"].apply(lambda x: x.strip("_").replace("T", "")))
# format the time into YYYY-MM-DD HH:MM:SS
df["Date_Time"] = pd.to_datetime(df["Date_Time"], format="%Y%m%d%H%M%S")

# sort by Date, Time, Plate, Well, FOV
df.sort_values(by=["Date_Time", "Plate", "Well", "FOV"], inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Is this meant to generate a platemap file, or what is the goal? I see below that this looks to become a JSON file; I recommend a code comment defining what this is doing and why you need it.
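For instance, the requested comment could read something like this (wording is only a suggestion, based on what the code appears to do):

# Build an index DataFrame with one row per image file, parsing Plate, Well, FOV,
# and acquisition Date_Time from each file path, so downstream steps can look up
# images by plate position and time point.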

Comment on lines +87 to +88
# well dictionary for mapping
well_map = {

Recommend trying something more dynamic here to reduce the lines of code. Here is something that could work; feel free to try or ignore :)

# Generate the dictionary dynamically
import string  # needed for string.ascii_uppercase

well_map = {
    f"{i:04d}": f"{row}{col:02d}"
    for i, (row, col) in enumerate(
        # adjust the row/column ranges to match the plate format
        # (e.g., 8 rows x 12 columns for a 96-well plate)
        ((r, c) for r in string.ascii_uppercase[:12] for c in range(1, 25)), start=1
    )
}
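A quick sanity check, if the suggestion above is adopted as written:

print(well_map["0001"])  # expected: "A01"
print(len(well_map))     # total number of wells covered by the mapping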

Comment on lines +474 to +481
# write the well map to a json file
path_to_repo_data = pathlib.Path("../../../data/processed/").resolve()
path_to_repo_data.mkdir(exist_ok=True, parents=True)
with open(path_to_repo_data / "well_map.json", "w") as f:
    json.dump(well_map, f)
# map the well to the well_map
df["Well"] = df["Well"].map(well_map)
df.head()

Same as the comment above: is this JSON file meant to be the platemap file, or what makes it different?

Comment on lines +498 to +499
# check that there are
# 5 fovs * 5 channels * 96 wells = 2400 images per plate

Will this be consistent for this project or is there a chance that this will change? Recommend making this more dynamic if possible.
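For example, a sketch of a more dynamic version of this check (n_fovs and n_channels are placeholder names, not variables from the notebook; the well count is derived from the data):

n_fovs = 5      # placeholder: fields of view imaged per well
n_channels = 5  # placeholder: imaging channels
n_wells = df["Well"].nunique()  # or len(well_map), if the mapping covers exactly the wells in use
expected_per_plate = n_fovs * n_channels * n_wells

images_per_plate = df.groupby("Plate").size()
assert (images_per_plate == expected_per_plate).all(), (
    f"expected {expected_per_plate} images per plate, got:\n{images_per_plate}"
)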

# In[2]:


# absolute path to the raw data directory only works on this machine

Recommend changing this comment since it looks like it doesn't apply here (these paths below are relative to the current directory).
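For instance, the comment could be rewritten along these lines (the directory name below is a placeholder for illustration, not the repo's actual layout):

import pathlib

# relative path to the raw data directory (resolved from the notebook's working directory)
raw_data_dir = pathlib.Path("../../data/raw").resolve()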

# In[3]:


dict_platemap = {

This dictionary seems short compared to the other dictionary made for the JSON file in the last notebook. Are not all wells used?
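If it helps, here is a quick way to check (this assumes dict_platemap is keyed by well IDs matching the values in well_map; adjust if its structure differs):

unused_wells = set(well_map.values()) - set(dict_platemap.keys())
print(f"{len(unused_wells)} wells not present in the platemap: {sorted(unused_wells)}")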


Recommend a code comment here to provide some clarity or context.

Comment on lines +128 to +130
platemap_df["treatment"] = platemap_df["treatment"].str.replace(
"CTL (complete medium only)", "Media"
)

Why include this replacement here when you could change it in the dictionary above?
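For example, if dict_platemap maps wells to treatment strings (an assumption about its structure), the rename could happen at the source instead:

# rename the control label in the dictionary rather than patching the DataFrame afterwards
dict_platemap = {
    well: ("Media" if treatment == "CTL (complete medium only)" else treatment)
    for well, treatment in dict_platemap.items()
}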

platemap_df[["treatment1", "treatment2"]] = platemap_df["treatment"].str.split(
" \+ ", expand=True
)
# platemap_df[['treatment1', 'treatment2']] = platemap_df["treatment"].str.split("+",n=1, expand=True)

Recommend removing commented-out code if not used/needed anymore.


Recommend removing this duplicate script since it doesn't have the correct notebook name (it is missing the 0. prefix).
