Various issues while processing PeptideShaker mzIdentML results #165

Closed
ofleitas opened this issue Aug 1, 2024 · 8 comments · Fixed by #182

ofleitas commented Aug 1, 2024

Hello

I am trying to run ms2rescore but get the following error:

Reading PSMs from file...
Reading PSMs from PSM file (1/1): `C:/Users/ofm83/OneDrive/Documents/Megaphobema_perterklaasi/output_PeptideShaker/Megaphobema_perterklaasi.mzid`...
undefined entity: line 35, column 2
Traceback (most recent call last):
  File "ms2rescore\gui\function2ctk.py", line 301, in run
    self.fn(*self.fn_args, **self.fn_kwargs)
  File "ms2rescore\gui\app.py", line 637, in function
    rescore(configuration=config)
  File "ms2rescore\core.py", line 40, in rescore
    psm_list = parse_psms(config, psm_list)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ms2rescore\parse_psms.py", line 28, in parse_psms
    psm_list = _read_psms(config, psm_list)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ms2rescore\parse_psms.py", line 90, in _read_psms
    id_file_psm_list = psm_utils.io.read_file(
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\__init__.py", line 158, in read_file
    reader = reader_cls(filename, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\mzid.py", line 143, in __init__
    self._source = self._infer_source()
                   ^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\mzid.py", line 177, in _infer_source
    mzid_xml = ET.parse(self.filename)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "xml\etree\ElementTree.py", line 1218, in parse
  File "xml\etree\ElementTree.py", line 580, in parse
xml.etree.ElementTree.ParseError: undefined entity: line 35, column 2

What can I do?

Member

RalfG commented Aug 2, 2024

Hi @ofleitas,

Thanks for reaching out!

It seems that something in your mzIdentML file cannot be read; the file might be corrupt in some way. The program crashes while reading line 35 of the file. To investigate, you can open the mzIdentML file in any text editor (Notepad, Notepad++, VS Code), as long as the file is not too large.
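For larger files, a small script can print the offending line instead of opening the whole file in an editor. The following is a minimal sketch, assuming the standard-library `xml.etree.ElementTree` parser shown in the traceback; the helper name `locate_parse_error` is hypothetical and not part of ms2rescore or psm_utils.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(path):
    """Return (line_no, column, text) for the first XML parse error, or None.

    Hypothetical helper for illustration; not part of ms2rescore or psm_utils.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) reported by the parser.
        line_no, column = err.position
        # Read leniently so that a bad byte does not hide the offending line.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for current, text in enumerate(handle, start=1):
                if current == line_no:
                    return line_no, column, text.rstrip("\n")
        return line_no, column, ""
    return None
```

Calling `locate_parse_error` on the `.mzid` file returns the line number, column, and text of the line that the parser rejected, or `None` if the file parses cleanly.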

If you are able and willing, you can also send us the file. I'd be happy to take a look.

Best,
Ralf

Author

ofleitas commented Aug 2, 2024

Hello RalfG

I solved the problem associated with line 35; it seems it was caused by a special character. But now I am getting this error:

Adding DeepLC-derived features to PSMs.
Running DeepLC for PSMs from run (1/1): 20220322_ID_6552...
Multiple modifications per site not supported in Peptide Record format.
Traceback (most recent call last):
  File "ms2rescore\gui\function2ctk.py", line 301, in run
    self.fn(*self.fn_args, **self.fn_kwargs)
  File "ms2rescore\gui\app.py", line 637, in function
    rescore(configuration=config)
  File "ms2rescore\core.py", line 76, in rescore
    fgen.add_features(psm_list)
  File "ms2rescore\feature_generators\deeplc.py", line 163, in add_features
    seq_df=self._psm_list_to_deeplc_peprec(psm_list_calibration)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ms2rescore\feature_generators\deeplc.py", line 210, in _psm_list_to_deeplc_peprec
    peprec = peptide_record.to_dataframe(psm_list)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\peptide_record.py", line 505, in to_dataframe
    return pd.DataFrame([PeptideRecordWriter._psm_to_entry(psm) for psm in psm_list])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\peptide_record.py", line 505, in <listcomp>
    return pd.DataFrame([PeptideRecordWriter._psm_to_entry(psm) for psm in psm_list])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\peptide_record.py", line 285, in _psm_to_entry
    sequence, modifications, charge = proforma_to_peprec(psm.peptidoform)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\peptide_record.py", line 443, in proforma_to_peprec
    ms2pip_mods.append(_mod_to_ms2pip(mod, i + 1))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psm_utils\io\peptide_record.py", line 433, in _mod_to_ms2pip
    raise InvalidPeprecModificationError(
psm_utils.io.peptide_record.InvalidPeprecModificationError: Multiple modifications per site not supported in Peptide Record format.

Member

RalfG commented Aug 3, 2024

Hi @ofleitas,

Regarding the first issue: this was most likely an encoding problem. For future reference, a possible fix is to open the mzIdentML file in an editor such as Windows Notepad and save it again with "UTF-8" encoding specified.
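The same re-save can be scripted when the file is too large for an editor. This is a minimal sketch, assuming the file was written in a single-byte Windows encoding; the function name `resave_as_utf8` and the default `source_encoding="cp1252"` are assumptions for illustration, and the actual source encoding must be checked for each file. The sketch does not touch the `encoding` attribute in the XML declaration, which may need to be updated by hand.

```python
from pathlib import Path


def resave_as_utf8(src, dst, source_encoding="cp1252"):
    """Decode `src` with `source_encoding` and write the text to `dst` as UTF-8.

    Hypothetical helper; the default source encoding is only a guess.
    """
    raw = Path(src).read_bytes()
    # May raise UnicodeDecodeError for bytes that are invalid in that encoding.
    text = raw.decode(source_encoding)
    # Working on bytes keeps the original line endings unchanged.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Writing to a new file rather than overwriting the original keeps the PeptideShaker output intact in case the encoding guess turns out to be wrong.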

For the "Multiple modifications per site" error: I believe this issue was fixed in one of the latest releases. Could you check if the problem persists with the latest release?

Best,
Ralf

RalfG changed the title from "undefined entity: line 35, column 2" to "undefined entity while reading mzIdentML file" Aug 3, 2024
RalfG self-assigned this Aug 3, 2024
Author

ofleitas commented Aug 3, 2024

I installed the latest release and it solved the multiple modifications per site error. However, now I am getting this error:

Error occurred:
index -3 is out of bounds for axis 0 with size 1

RalfG added the "question" label and removed the "help wanted" label Aug 6, 2024
Member

RalfG commented Aug 7, 2024

Glad the second issue was also solved by updating.

Can you provide some more information on the error? Could you paste the full log? Thanks!

Author

ofleitas commented Aug 7, 2024 via email

Member

RalfG commented Aug 14, 2024

Hi @ofleitas,

Thanks for sharing the log. It seems that the issue occurs when calculating PEP values with qvality (through Triqler, through Mokapot). However, I have not seen this problem before. If you are at liberty to share the input files that lead to this error, that would be very helpful. If I'm not mistaken, a *ms2rescore.psms.tsv file was already written before the error occurred? This file should suffice to help me understand the problem.

Thanks!

Member

RalfG commented Aug 21, 2024

The IndexError occurred due to input scores (before rescoring) that were all either 0 or 100 (PeptideShaker scores on this specific sample), from which PEPs cannot be calculated. The issue is addressed in #182 by catching the error and logging a descriptive warning. This fix will be part of the v3.1.2 release.
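The general pattern described above (detect degenerate scores, catch the error, log a warning) could look roughly like the sketch below. It is not the code from #182: the names `scores_look_degenerate` and `pep_or_none`, the `min_distinct=3` threshold, and the `estimate_pep` callable are all hypothetical stand-ins, and no real Mokapot or Triqler function is called.

```python
import logging

logger = logging.getLogger(__name__)


def scores_look_degenerate(scores, min_distinct=3):
    """Return True when the scores take fewer than `min_distinct` distinct values.

    Hypothetical pre-check; the threshold is an assumption, chosen so that
    scores that are all either 0 or 100 are flagged.
    """
    return len(set(scores)) < min_distinct


def pep_or_none(estimate_pep, scores):
    """Call `estimate_pep(scores)`; return None and log a warning on IndexError.

    `estimate_pep` is a placeholder for whatever PEP estimator is in use.
    """
    try:
        return estimate_pep(scores)
    except IndexError as err:
        logger.warning(
            "Could not calculate PEPs from %d scores (%s). "
            "The input scores may take too few distinct values.",
            len(scores),
            err,
        )
        return None
```

With this approach a run on such a sample can finish without PEP values instead of crashing, and the log points at the score distribution as the likely cause.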

RalfG closed this as completed Aug 21, 2024
RalfG changed the title from "undefined entity while reading mzIdentML file" to "Various issues while processing PeptideShaker mzIdentML results" Aug 21, 2024