
preprocessing #1

Open
MikeLippincott wants to merge 1 commit into main

Conversation

MikeLippincott (Member)

This PR preprocesses the data and moves files into the right directories.


@jenna-tomkinson (Member) left a comment

LGTM! 🎉 I added a few comments, questions, and suggestions for you to address. Nice job!

# In[2]:


# absolute path to the raw data directory only works on this machine

Suggested change
# absolute path to the raw data directory only works on this machine
# absolute path to the raw data directory (only works on this machine)

Comment on lines +65 to +81
# make a df out of the file names
df = pd.DataFrame(list_of_new_files, columns=["file_path"])
df.insert(0, "file_name", df["file_path"].apply(lambda x: pathlib.Path(x).name))
df.insert(0, "Plate", df["file_path"].apply(lambda x: x.split("/")[7]))
df.insert(0, "Well", df["file_name"].apply(lambda x: x.split("F")[0].split("W")[-1]))
df.insert(0, "FOV", df["file_name"].apply(lambda x: x.split("T")[0].split("F")[-1]))
df.drop("file_path", axis=1, inplace=True)
df.drop("file_name", axis=1, inplace=True)
# split the plate into time and date
df.insert(2, "Date_Time", df["Plate"].apply(lambda x: x.strip("_").replace("T", "")))
# format the time into YYYY-MM-DD HH:MM:SS
df["Date_Time"] = pd.to_datetime(df["Date_Time"], format="%Y%m%d%H%M%S")

# sort by Date, Time, Plate, Well, FOV
df.sort_values(by=["Date_Time", "Plate", "Well", "FOV"], inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Is this meant to generate a platemap file, or what is the goal? I see below that this looks to become a JSON file; I recommend a code comment defining what this is doing and why you need it.
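For instance, the requested comment could read something like this (wording is only a suggestion, based on what the code appears to do):

# Build an index DataFrame with one row per image file, parsing Plate, Well, FOV,
# and acquisition Date_Time from each file path, so downstream steps can look up
# images by plate position and time point.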

Comment on lines +87 to +88
# well dictionary for mapping
well_map = {

Recommend trying something more dynamic here to reduce the lines of code. Here is something that could work; feel free to try or ignore :)

# Generate the dictionary dynamically
import string  # needed for string.ascii_uppercase

well_map = {
    f"{i:04d}": f"{row}{col:02d}"
    for i, (row, col) in enumerate(
        # adjust the row/column ranges to match the plate format
        # (e.g., 8 rows x 12 columns for a 96-well plate)
        ((r, c) for r in string.ascii_uppercase[:12] for c in range(1, 25)), start=1
    )
}
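A quick sanity check, if the suggestion above is adopted as written:

print(well_map["0001"])  # expected: "A01"
print(len(well_map))     # total number of wells covered by the mapping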

Comment on lines +474 to +481
# write the well map to a json file
path_to_repo_data = pathlib.Path("../../../data/processed/").resolve()
path_to_repo_data.mkdir(exist_ok=True, parents=True)
with open(path_to_repo_data / "well_map.json", "w") as f:
    json.dump(well_map, f)
# map the well to the well_map
df["Well"] = df["Well"].map(well_map)
df.head()

Same as the comment above: is this JSON file meant to be the platemap file, or what makes it different?

Comment on lines +498 to +499
# check that there are
# 5 fovs * 5 channels * 96 wells = 2400 images per plate

Will this be consistent for this project or is there a chance that this will change? Recommend making this more dynamic if possible.
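For example, a sketch of a more dynamic version of this check (n_fovs and n_channels are placeholder names, not variables from the notebook; the well count is derived from the data):

n_fovs = 5      # placeholder: fields of view imaged per well
n_channels = 5  # placeholder: imaging channels
n_wells = df["Well"].nunique()  # or len(well_map), if the mapping covers exactly the wells in use
expected_per_plate = n_fovs * n_channels * n_wells

images_per_plate = df.groupby("Plate").size()
assert (images_per_plate == expected_per_plate).all(), (
    f"expected {expected_per_plate} images per plate, got:\n{images_per_plate}"
)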

# In[2]:


# absolute path to the raw data directory only works on this machine

Recommend changing this comment since it looks like it doesn't apply here (these paths below are relative to the current directory).
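For instance, the comment could be rewritten along these lines (the directory name below is a placeholder for illustration, not the repo's actual layout):

import pathlib

# relative path to the raw data directory (resolved from the notebook's working directory)
raw_data_dir = pathlib.Path("../../data/raw").resolve()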

# In[3]:


dict_platemap = {

This dictionary seems short compared to the other dictionary made for the JSON file in the last notebook. Are not all wells used?
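If it helps, here is a quick way to check (this assumes dict_platemap is keyed by well IDs matching the values in well_map; adjust if its structure differs):

unused_wells = set(well_map.values()) - set(dict_platemap.keys())
print(f"{len(unused_wells)} wells not present in the platemap: {sorted(unused_wells)}")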


Recommend a code comment here to provide some clarity or context.

Comment on lines +128 to +130
platemap_df["treatment"] = platemap_df["treatment"].str.replace(
"CTL (complete medium only)", "Media"
)

Why include this replacement here when you could change it in the dictionary above?
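For example, if dict_platemap maps wells to treatment strings (an assumption about its structure), the rename could happen at the source instead:

# rename the control label in the dictionary rather than patching the DataFrame afterwards
dict_platemap = {
    well: ("Media" if treatment == "CTL (complete medium only)" else treatment)
    for well, treatment in dict_platemap.items()
}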

platemap_df[["treatment1", "treatment2"]] = platemap_df["treatment"].str.split(
" \+ ", expand=True
)
# platemap_df[['treatment1', 'treatment2']] = platemap_df["treatment"].str.split("+",n=1, expand=True)

Recommend removing commented-out code if not used/needed anymore.


Recommend removing this duplicate script since it doesn't have the correct notebook name (it is missing the 0. prefix).
