-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed version parsing #24
Conversation
Accepts version numbers that do not necessarily contain 3 numbers (x.x.x) - can read versions that contain 1 or 2 numbers (x) or (x.x)
Hi, I was trying to use the MzTab parser on an MzTab file, but found out that it threw errors because of a call to search for a version number in the form of x.x.x when the file I was working with had a version number in the form x.x (2 numbers instead of 3). I edited that part such that it wouldn't throw errors if the version number didn't necessarily contain 3 numbers. I made sure to maintain the check that the variant was either "M" or "P". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a step ensuring that self.num_version
is always a 3-tuple? This way we can always index it in the event someone wants to reason about features.
The regex for version parsing does have nice properties still as it ensures that the first components are still numerals. You could get the same effect with a slightly more complex regex, but this isn't strictly necessary.
version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups() | ||
if variant is None: | ||
version_parsed, _, variant = str(self.version).partition("-") | ||
if variant is None or (variant != "M" and variant != "P"): | ||
variant = "P" | ||
self.num_version = [int(v) for v in version_parsed.split(".")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.num_version = [int(v) for v in version_parsed.split(".")] | |
self.num_version = [int(v) for v in version_parsed.split(".")] | |
# Ensure self.num_version is 3-tuple | |
while len(self.num_version) < 3: | |
self.num_version.append(0) |
@@ -744,8 +744,8 @@ def _parse(self): | |||
self.small_molecule_evidence_table.add(tokens[1:]) | |||
|
|||
def _determine_schema_version(self): | |||
version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the regex for defending against non-integer matches using non-capturing optional groups:
(?P<schema_version>\d+(?:.\d+(?:.\d+)?)?)(?:-(?P<schema_variant>[MP]))?
Reverted back to a tweaked regex search for stricter parsing and added a check to ensure self.num_version is always a 3 tuple regardless of original version number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the changes you suggested. I agree that using regex is better in this case to parse the version number since it offers a stricter search. I also added the check to make sure that self.num_version is a 3 tuple.
Looks good to me. I don't see any more issues. Was the file you were trying to parse an mzTab 1.0 file? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mzTab file with version 1.0, that's exactly right
Accepts version numbers that do not necessarily contain 3 numbers (x.x.x) - can read versions that contain 1 or 2 numbers (x) or (x.x)