Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed version parsing #24

Merged
merged 3 commits into from
Jan 23, 2021
Merged

Fixed version parsing #24

merged 3 commits into from
Jan 23, 2021

Conversation

donnyyy777
Copy link
Contributor

Accepts version numbers that do not necessarily contain 3 numbers (x.x.x) - can read versions that contain 1 or 2 numbers (x) or (x.x)

Accepts version numbers that do not necessarily contain 3 numbers (x.x.x) - can read versions that contain 1 or 2 numbers (x) or (x.x)
@donnyyy777
Copy link
Contributor Author

Hi,

I was trying to use the MzTab parser on an MzTab file, but found out that it threw errors because of a call to search for a version number in the form of x.x.x when the file I was working with had a version number in the form x.x (2 numbers instead of 3). I edited that part such that it wouldn't throw errors if the version number didn't necessarily contain 3 numbers. I made sure to maintain the check that the variant was either "M" or "P".

Copy link
Contributor

@mobiusklein mobiusklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a step ensuring that self.num_version is always a 3-tuple? This way we can always index it in the event someone wants to reason about features.

The regex for version parsing does have nice properties still as it ensures that the first components are still numerals. You could get the same effect with a slightly more complex regex, but this isn't strictly necessary.

version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups()
if variant is None:
version_parsed, _, variant = str(self.version).partition("-")
if variant is None or (variant != "M" and variant != "P"):
variant = "P"
self.num_version = [int(v) for v in version_parsed.split(".")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.num_version = [int(v) for v in version_parsed.split(".")]
self.num_version = [int(v) for v in version_parsed.split(".")]
# Ensure self.num_version is 3-tuple
while len(self.num_version) < 3:
self.num_version.append(0)

@@ -744,8 +744,8 @@ def _parse(self):
self.small_molecule_evidence_table.add(tokens[1:])

def _determine_schema_version(self):
version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the regex for defending against non-integer matches using non-capturing optional groups:
(?P<schema_version>\d+(?:.\d+(?:.\d+)?)?)(?:-(?P<schema_variant>[MP]))?

Reverted back to a tweaked regex search for stricter parsing and added a check to ensure self.num_version is always a 3 tuple regardless of original version number.
Copy link
Contributor Author

@donnyyy777 donnyyy777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes you suggested. I agree that using regex is better in this case to parse the version number since it offers a stricter search. I also added the check to make sure that self.num_version is a 3 tuple.

@mobiusklein
Copy link
Contributor

Looks good to me. I don't see any more issues. Was the file you were trying to parse an mzTab 1.0 file?

Copy link
Contributor Author

@donnyyy777 donnyyy777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mzTab file with version 1.0, that's exactly right

pyteomics/mztab.py Outdated Show resolved Hide resolved
@levitsky levitsky merged commit d6ce797 into levitsky:master Jan 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants