Fixed version parsing #24

donnyyy777 · 2021-01-22T04:35:31Z

Accepts version numbers that do not necessarily contain three components (x.x.x); versions with one or two components (x or x.x) can now be read.

donnyyy777 · 2021-01-22T04:45:57Z

Hi,

I was trying to use the MzTab parser on an mzTab file, but it threw errors: the parser searches for a version number of the form x.x.x, and the file I was working with had a version number of the form x.x (two numbers instead of three). I edited that part so that it no longer throws errors when the version number has fewer than three numbers. I made sure to keep the check that the variant is either "M" or "P".
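The failure described here can be reproduced outside the parser. This is a minimal sketch, assuming the pattern from the removed line of the diff in pyteomics/mztab.py; the helper name `parse_strict` is hypothetical and exists only for illustration. `re.search` returns `None` when nothing matches, so calling `.groups()` on the result directly raises `AttributeError`.

```python
import re

# Pattern copied from the removed line of the diff (dots left unescaped).
STRICT_PATTERN = r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?"


def parse_strict(version):
    """Return (version, variant), or None when the pattern finds no match.

    Hypothetical helper: the original code called .groups() without this guard.
    """
    match = re.search(STRICT_PATTERN, version)
    return match.groups() if match else None


print(parse_strict("2.0.0-M"))  # three components: the pattern matches
print(parse_strict("1.0"))      # two components: no match, so None
```

An unguarded `re.search(...).groups()` call on "1.0" would fail at the `.groups()` step, which is consistent with the errors reported above.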

mobiusklein

Could you add a step ensuring that self.num_version is always a 3-tuple? This way we can always index it in the event someone wants to reason about features.

The regex for version parsing still has a nice property: it ensures that the version components are numerals. You could get the same effect with a slightly more complex regex, but this isn't strictly necessary.

mobiusklein · 2021-01-22T14:31:28Z

pyteomics/mztab.py

-        version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups()
-        if variant is None:
+        version_parsed, _, variant = str(self.version).partition("-")
+        if variant is None or (variant != "M" and variant != "P"):
            variant = "P"
        self.num_version = [int(v) for v in version_parsed.split(".")]
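For readers unfamiliar with `str.partition`, the sketch below shows what the added lines of this diff do. The function name `split_version` is hypothetical. One detail worth noting: `partition` returns empty strings, never `None`, for the parts it does not find, so the fallback to "P" is triggered by the `!=` comparisons rather than by the `is None` test.

```python
def split_version(version):
    """Split a version string such as '2.0.0-M' into (numbers, variant).

    Hypothetical helper mirroring the partition-based lines in the diff.
    """
    version_parsed, _, variant = str(version).partition("-")
    # partition yields "" for a missing separator and tail, so an absent
    # or unrecognised variant falls back to "P" here.
    if variant not in ("M", "P"):
        variant = "P"
    return version_parsed, variant


print("1.0".partition("-"))      # ('1.0', '', '')
print(split_version("1.0"))      # ('1.0', 'P')
print(split_version("2.0.0-M"))  # ('2.0.0', 'M')
```

This approach does not check that the numeric part contains only digits and dots; a malformed component would surface later as a `ValueError` from `int()`.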


Suggested change

-        self.num_version = [int(v) for v in version_parsed.split(".")]
+        self.num_version = [int(v) for v in version_parsed.split(".")]
+        # Ensure self.num_version is 3-tuple
+        while len(self.num_version) < 3:
+            self.num_version.append(0)
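The padding step in this suggestion can be sketched as a standalone function. The name `pad_version` and the `length` parameter are assumptions made for illustration and are not part of pyteomics. Note that the result is a list of length three; wrap it in `tuple()` if an immutable tuple is wanted.

```python
def pad_version(components, length=3):
    """Return a copy of `components` right-padded with zeros to `length`.

    Sequences that already have `length` or more items are returned unchanged.
    """
    padded = list(components)
    # A non-positive repeat count yields an empty list, so nothing is added
    # when the input is already long enough.
    padded.extend([0] * (length - len(padded)))
    return padded


print(pad_version([1, 0]))  # version "1.0" becomes [1, 0, 0]
print(pad_version([2]))     # version "2" becomes [2, 0, 0]
```

With the padding in place, code that checks features can index all three positions, for example `pad_version([1, 0])[2]`, without first testing the length.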

mobiusklein · 2021-01-22T14:34:40Z

pyteomics/mztab.py

@@ -744,8 +744,8 @@ def _parse(self):
                self.small_molecule_evidence_table.add(tokens[1:])

    def _determine_schema_version(self):
-        version_parsed, variant = re.search(r"(?P<schema_version>\d+.\d+.\d+)(?:-(?P<schema_variant>[MP]))?", self.version).groups()


We can keep the regex for defending against non-integer matches using non-capturing optional groups:
(?P<schema_version>\d+(?:.\d+(?:.\d+)?)?)(?:-(?P<schema_variant>[MP]))?
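A sketch of how this pattern could be combined with the padding step follows. Two things here are assumptions rather than the merged code: the dots are escaped as `\.` (in the pattern as written above, each unescaped `.` matches any character), and a failed match raises `ValueError`. The function name `determine_schema_version` is a hypothetical standalone stand-in for the method in the diff; the default variant "P" follows the existing code.

```python
import re

# Nested optional non-capturing groups allow x, x.x, or x.x.x.
# Escaping the dots is an assumption of this sketch.
VERSION_PATTERN = re.compile(
    r"(?P<schema_version>\d+(?:\.\d+(?:\.\d+)?)?)(?:-(?P<schema_variant>[MP]))?"
)


def determine_schema_version(version):
    """Return ([major, minor, patch], variant) for an mzTab version string."""
    match = VERSION_PATTERN.search(str(version))
    if match is None:
        raise ValueError("unrecognized mzTab version: %r" % (version,))
    version_parsed = match.group("schema_version")
    # Default to the "P" variant when none is given, as in the diff.
    variant = match.group("schema_variant") or "P"
    num_version = [int(v) for v in version_parsed.split(".")]
    # Pad to three components so the result can always be indexed.
    num_version.extend([0] * (3 - len(num_version)))
    return num_version, variant


print(determine_schema_version("1.0"))      # two components, default variant
print(determine_schema_version("2.0.0-M"))  # three components, variant "M"
```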

Reverted back to a tweaked regex search for stricter parsing and added a check to ensure self.num_version is always a 3 tuple regardless of original version number.

donnyyy777

I made the changes you suggested. I agree that a regex is the better way to parse the version number in this case, since it offers a stricter search. I also added the check to make sure that self.num_version is a 3-tuple.

mobiusklein · 2021-01-22T19:58:03Z

Looks good to me. I don't see any more issues. Was the file you were trying to parse an mzTab 1.0 file?

donnyyy777

Yes, that's exactly right: an mzTab file with version 1.0.

pyteomics/mztab.py

Fixed version parsing

8d3ac6c

Accepts version numbers that do not necessarily contain 3 numbers (x.x.x) - can read versions that contain 1 or 2 numbers (x) or (x.x)

mobiusklein suggested changes Jan 22, 2021


Updated version parsing

695e0e3

Reverted back to a tweaked regex search for stricter parsing and added a check to ensure self.num_version is always a 3 tuple regardless of original version number.

donnyyy777 commented Jan 22, 2021


levitsky reviewed Jan 23, 2021


pyteomics/mztab.py (outdated)

Fix mztab version string pattern

977b253

levitsky merged commit d6ce797 into levitsky:master Jan 23, 2021


