Master list consolidation #533
The order below is my ad hoc ranking, though it may not correspond to the likelihood of, or time needed for, fixes. I'm kicking off with just outlines that can be expanded if clarifying details are needed.
Addendum 15/08: While the arguments above still stand (though they are open to discussion, as ever), I just re-read our esteemed S1 paper https://www.ncbi.nlm.nih.gov/pubmed/27800551. This revealed that not only did the drafters of the supplementary data sections do a good job of partitioning the inter-assay SAR, but they also included some honest notes on variance issues. Ipso facto, while I don't remember the details of the referees' comments, we did make it in, despite a certain amount of "ducking and weaving" through the heterogeneous assays. Notwithstanding, being a couple of years further on, we now know it would be more rigorous to re-run all the S4 SAR in one robust, standardised assay (n.b. our large PubChem BioAssay submission would be based on this normalised data). This would also likely clamp down the intra-assay variance. Note that this in no way precludes the contributors from the different assays who have so admirably supported the S4 work from being on the author list, even if only the re-run data ends up being included.
|
It has been brought to my attention (in a friendly way, of course) that in order to get some engagement on the big themes above, I should split the objectives into bite-size chunks. While I don't think the esteemed collaborators around here really need things to be Mickey-Moused, we can try some chunking and see if that gets any traction.
|
Sure. Breaking this down will be the easiest way to go. I'll start where you suggest and get the OSM numbers into the sheet.
|
Agreed. To contain the task I'd recommend focusing on the Series 4 compounds for now. Also, @cdsouthan, just re the distinction between "final" compounds and synthetic intermediates: we have this already (fully, I think, for Series 4; less so for others) in that all biologically evaluated compounds have MMV numbers. Can we just use that as the filter?
|
OK, everything in the sheet now has its OSM number. There were a couple of Sydney compounds, around 20 Edinburgh compounds and around 100 inherited compounds. The inherited compounds were designated 'X', as we're not sure where they were synthesised.
|
Good. While I woke up in the night wondering if adding a synonym was such a good idea after all (seeing as, as a general principle, we want as few as possible), I hope it helps in the end that "we" have our full set of primary-identifier ducks lined up. OK, so mystery compounds seem somewhat paradoxical in open drug discovery (but let's hope the referees overlook that...). Moving swiftly on then, re 2 above: please do a referential integrity check that should include, but not be restricted to: a) absence of gaps for all the molecular specifications; b) no corruptions; c) no duplicates in any columns; d) a check that the sheet behaves itself for common operations, e.g. text <-> CSV <-> Excel <-> LibreOffice (or whatever), and that column sorting and setting up tables also work OK. Also archive a copy, just in case... As ever, @david1597, try to find an OSM friend to do this tedious but crucial cross-checking with you (@mattodd will buy them a beer, and maybe a pizza for you too).
n.b. 01: Adding fresh synonyms raises the back-propagation issue. At some point it would be good to arrange for a named MMV person to have a copy of our optimised sheet, so they are on record as being in possession of these new n2s (name-to-structure) mappings for their compounds.
n.b. 02: I take @mattodd's point if the MMV filter gives a clean cut from isolated intermediates, but why not run them in the assay anyway? We can then simply push the entire S4 structure set to PubChem at some point.
|
The integrity checks will likely come as we write up the experimental section for the Series 4 paper - @edwintse and I should hopefully get through these in the near future.
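A couple of the checks requested above (no gaps, no duplicates in any column) lend themselves to a quick script run over a CSV export of the Master List. The sketch below is illustrative only: the column names (OSM, MMV, SMILES) and the toy rows are hypothetical placeholders, not the sheet's actual headers or data.

```python
import csv
import io

# Hypothetical mini Master List export; real column names will differ.
ML_CSV = """OSM,MMV,SMILES
OSM-S-106,MMV639565,CC1=NN2C=C(C=NC2=N1)C3=CC=C(C=C3)OCC(F)(F)F
OSM-S-107,MMV672931,COC1=CC=C(C=C1)CN
OSM-S-108,,CC(=O)NC1=CC=CC=C1
"""

def integrity_report(csv_text):
    """Return (missing, duplicates): empty cells and values repeated
    within a column - two of the referential-integrity checks above."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing, duplicates = [], []
    columns = rows[0].keys() if rows else []
    for col in columns:
        seen = set()
        for i, row in enumerate(rows, start=2):  # data begins at sheet row 2
            value = (row[col] or "").strip()
            if not value:
                missing.append((i, col))
            elif value in seen:
                duplicates.append((i, col, value))
            else:
                seen.add(value)
    return missing, duplicates

missing, duplicates = integrity_report(ML_CSV)
print(missing)      # empty cells, as (sheet row, column) pairs
print(duplicates)   # repeated values, as (sheet row, column, value)
```

Running this on the toy data flags the blank MMV cell in the third compound row; the round-trip checks (CSV <-> Excel <-> LibreOffice) would still need to be done by hand or with further tooling.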
Updating, checking and optimising the Master List (ML)
https://docs.google.com/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/edit#gid=510297618
This is a general task but crucial for the upcoming Series 4 paper in particular. Ideally this ML should be migrated/transformed into a small open database, but that's a task for the future. I hope the suggestions below do not come across as pedantically over-prescriptive (a.k.a. a counsel of perfection), but they are based on the many quirks/foibles/gotchas I have had fun (mostly, but with some exasperation too) ferreting out, divining and grappling with over the years, in both databases and papers (see https://cdsouthan.blogspot.se/). Many of us will need to pitch in here, but I have assigned it to our esteemed first author since it is crucial not only for the nascent paper but also for any subsequent ones.
JFTR, I do not want to take responsibility for actually editing the sheet. This is better done by those directly engaged in making the structures and generating the data (even as inputs from collaborators). I also suggest team members do such editing in pairs, to cross-check inputs and changes (a cup-of-coffee job?), and that senior authors keep abreast of how things are going on the ML front.
Aspects of the ML have come up in previous posts concerning Google indexing (#511) and direct visualisation (#515).
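On the suggested future migration of the ML into a small open database: a minimal sketch with stdlib SQLite is below. The table and column names, constraints and rows are all assumptions for illustration, but they show how a schema can enforce some of the integrity rules (unique identifiers, no gaps in structures) that currently have to be checked by hand in the sheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent copy
conn.execute("""
    CREATE TABLE master_list (
        osm_id  TEXT PRIMARY KEY,      -- enforces 'no duplicates' for OSM numbers
        mmv_id  TEXT UNIQUE,           -- NULL allowed, e.g. for intermediates
        smiles  TEXT NOT NULL,         -- enforces 'no gaps' for the structure
        origin  TEXT CHECK (origin IN ('Sydney', 'Edinburgh', 'X'))
    )
""")
rows = [
    # Hypothetical example rows, not real ML entries.
    ("OSM-S-106", "MMV639565", "CC1=NN2C=C(C=NC2=N1)C3=CC=CC=C3", "Sydney"),
    ("OSM-S-999", None, "CC(=O)NC1=CC=CC=C1", "X"),  # inherited, no MMV number
]
conn.executemany("INSERT INTO master_list VALUES (?, ?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM master_list").fetchone()[0])
```

With constraints like these, a duplicate OSM number or a missing structure is rejected at insert time rather than discovered later during a manual cross-check.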