preprocessing #1
base: main
LGTM! 🎉 I added a few comments, questions, and suggestions for you to address. Nice job!
# In[2]:


# absolute path to the raw data directory only works on this machine
- # absolute path to the raw data directory only works on this machine
+ # absolute path to the raw data directory (only works on this machine)
# make a df out of the file names
df = pd.DataFrame(list_of_new_files, columns=["file_path"])
df.insert(0, "file_name", df["file_path"].apply(lambda x: pathlib.Path(x).name))
df.insert(0, "Plate", df["file_path"].apply(lambda x: x.split("/")[7]))
df.insert(0, "Well", df["file_name"].apply(lambda x: x.split("F")[0].split("W")[-1]))
df.insert(0, "FOV", df["file_name"].apply(lambda x: x.split("T")[0].split("F")[-1]))
df.drop("file_path", axis=1, inplace=True)
df.drop("file_name", axis=1, inplace=True)
# split the plate into time and date
df.insert(2, "Date_Time", df["Plate"].apply(lambda x: x.strip("_").replace("T", "")))
# format the time into YYYY-MM-DD HH:MM:SS
df["Date_Time"] = pd.to_datetime(df["Date_Time"], format="%Y%m%d%H%M%S")

# sort by Date, Time, Plate, Well, FOV
df.sort_values(by=["Date_Time", "Plate", "Well", "FOV"], inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()
Is this meant to generate a platemap file, or what is the goal? I see below that it appears to become a JSON file. I recommend adding a code comment that explains what this block does and why you need it.
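As one possible way to make the intent of the parsing step explicit, here is a minimal sketch that extracts the well and FOV with a named regular expression instead of chained `split` calls. The file-name layout (`W<well>F<fov>T<time>...`) is only inferred from the `split` calls in the diff, and both the example file name and the helper `parse_well_and_fov` are hypothetical.

```python
import re

# Assumed (hypothetical) file-name layout: W<well>F<fov>T<time>...,
# e.g. "W0001F0002T0001Z001C1.tif". Adjust the pattern to the real names.
FILE_NAME_PATTERN = re.compile(r"W(?P<well>\d+)F(?P<fov>\d+)T")


def parse_well_and_fov(file_name: str) -> tuple[str, str]:
    """Return (well, fov) parsed from a file name, or raise ValueError."""
    match = FILE_NAME_PATTERN.search(file_name)
    if match is None:
        # Fail loudly instead of silently producing a wrong well or FOV
        raise ValueError(f"unexpected file name: {file_name!r}")
    return match.group("well"), match.group("fov")


well, fov = parse_well_and_fov("W0001F0002T0001Z001C1.tif")
print(well, fov)
```

A named pattern documents the expected layout in one place and raises an error for files that do not match, rather than returning an arbitrary fragment.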
# well dictionary for mapping
well_map = {
Recommend trying something more dynamic here to reduce the lines of code. Here is something that could work; feel free to try or ignore :)
import string

# Generate the dictionary dynamically
well_map = {
    f"{i:04d}": f"{row}{col:02d}"
    for i, (row, col) in enumerate(
        ((r, c) for r in string.ascii_uppercase[:12] for c in range(1, 25)), start=1
    )
}
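Note that the snippet above produces 12 rows by 24 columns, i.e. 288 entries, while a later check in this notebook mentions 96 wells. If the plate is a standard 96-well layout (rows A to H, columns 1 to 12) and the instrument numbers wells row by row, both of which are assumptions to verify against the original dictionary, a parameterized variant could look like this sketch. The names `N_ROWS` and `N_COLS` are hypothetical.

```python
import string

# Assumed plate geometry: standard 96-well plate (rows A-H, columns 1-12).
N_ROWS, N_COLS = 8, 12

# Well labels in row-major order: A01, A02, ..., A12, B01, ..., H12.
# Row-major numbering is an assumption; check it against the instrument output.
wells = [
    f"{row}{col:02d}"
    for row in string.ascii_uppercase[:N_ROWS]
    for col in range(1, N_COLS + 1)
]

# Map zero-padded well indices ("0001", "0002", ...) to well labels
well_map = {f"{i:04d}": well for i, well in enumerate(wells, start=1)}

# Sanity check: one entry per well
assert len(well_map) == N_ROWS * N_COLS
```

Keeping the geometry in two constants makes it easy to switch to another plate format without editing the mapping by hand.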
# write the well map to a json file
path_to_repo_data = pathlib.Path("../../../data/processed/").resolve()
path_to_repo_data.mkdir(exist_ok=True, parents=True)
with open(path_to_repo_data / "well_map.json", "w") as f:
    json.dump(well_map, f)
# map the well to the well_map
df["Well"] = df["Well"].map(well_map)
df.head()
Same as the comment above: is this JSON file meant to be the platemap file, or what makes it different?
# check that there are
# 5 fovs * 5 channels * 96 wells = 2400 images per plate
Will this be consistent for this project or is there a chance that this will change? Recommend making this more dynamic if possible.
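One way to make this check less brittle is to compute the expected count from named parameters rather than hard-coding 2400. The sketch below uses only the standard library and works on a plain list of plate names (one entry per image). The function `find_incomplete_plates` and its parameters are hypothetical and not part of this PR.

```python
from collections import Counter


def find_incomplete_plates(plates, n_fovs, n_channels, n_wells):
    """Return {plate: image_count} for plates whose count differs from expected.

    `plates` is an iterable with one plate name per image file.
    """
    expected = n_fovs * n_channels * n_wells
    counts = Counter(plates)
    return {plate: n for plate, n in counts.items() if n != expected}


# Toy data: plate_a is complete, plate_b is missing one image
plates = ["plate_a"] * 2400 + ["plate_b"] * 2399
print(find_incomplete_plates(plates, n_fovs=5, n_channels=5, n_wells=96))
# → {'plate_b': 2399}
```

If the acquisition settings change later, only the three arguments need updating, and the check reports which plates deviate instead of failing on a single magic number.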
# In[2]:


# absolute path to the raw data directory only works on this machine
Recommend changing this comment since it looks like it doesn't apply here (these paths below are relative to the current directory).
# In[3]:


dict_platemap = {
This dictionary seems short compared to the other dictionary made for the JSON file in the last notebook. Are not all wells used?
Recommend code comment here to provide any clarity or context.
platemap_df["treatment"] = platemap_df["treatment"].str.replace(
    "CTL (complete medium only)", "Media"
)
Why include this replacement here when you could change it in the dictionary above?
platemap_df[["treatment1", "treatment2"]] = platemap_df["treatment"].str.split(
    " \+ ", expand=True
)
# platemap_df[['treatment1', 'treatment2']] = platemap_df["treatment"].str.split("+",n=1, expand=True)
Recommend removing commented-out code if not used/needed anymore.
Recommend removing this duplicate script since it does not have the correct notebook name (missing the 0. prefix).
This PR preprocesses the data and files to be in the right directories.