Hello :-) and first of all, sorry for my late reply; I was quite busy over the last few days!
Regarding your issue: I didn't use the direct query for mass simply because I wasn't aware of it xD (I also remember finding UniProt's API documentation quite confusing; if you know of, e.g., a list of queryable parameters for this API, feel free to let me know :-)). You're right that the direct query for mass is simpler. I also checked your pull request with a test case and found no significant differences from the current method other than the rounding of protein masses, so I just merged it into the master branch.
I agree that the UniProt API is a bit hard to learn. I had to spend some days myself to comprehend how it works. Anyways, the list of queryable parameters is given at this site: https://www.uniprot.org/help/uniprotkb_column_names.
I am a bit puzzled by the method used to determine protein mass from UniProt (https://github.com/ARB-Lab/autopacmen/blob/69a158003d5bab3f597ec5da727515d250f35a43/autopacmen/submodules/get_protein_mass_mapping.py#L133). First, UniProt is queried for the amino acid sequence, and then the molecular mass is calculated from that sequence. However, UniProt can be queried directly for the mass, for example (https://www.uniprot.org/uniprot/?query=HXKA_YEAST%20OR%20G6PI_YEAST&format=tab&columns=id,mass). Why isn't this simpler approach used in AutoPacmen? Beware though, UniProt outputs the mass with a comma as the thousands separator, so you have to write something like
float(mass.replace(',', ''))
to parse the result.
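To make the parsing step concrete, here is a minimal sketch of how a whole tab-separated response could be turned into a mapping from entry ID to mass. It assumes the response has one header line followed by `id<TAB>mass` rows; the function name `parse_mass_table` and the sample text are made up for illustration and are not part of AutoPacmen or real UniProt output.

```python
def parse_mass_table(tsv_text):
    """Map each entry ID to its mass as a float.

    Assumption: the first line is a header, and every following
    non-empty line holds an ID and a mass separated by a tab.
    """
    masses = {}
    lines = tsv_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        if not line.strip():
            continue
        entry_id, mass = line.split("\t")[:2]
        # Remove the thousands separator before converting to float.
        masses[entry_id] = float(mass.replace(",", ""))
    return masses


# Made-up sample in the assumed layout; the IDs and values are placeholders.
sample = "Entry\tMass\nEXAMPLE_1\t53,738\nEXAMPLE_2\t987\n"
print(parse_mass_table(sample))
```

The text returned by the query URL above could be passed to this function in place of `sample`, provided its layout matches the assumption.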