Pylint alerts corrections as part of an intervention experiment #207

Open
evidencebp opened this issue Nov 13, 2024 · 8 comments

Comments

@evidencebp
Contributor

I'd like to conduct a software engineering experiment on the benefit of removing Pylint alerts.
The experiment is described here.
In the experiment, Pylint is run with a specific set of alerts, and files are selected for intervention and control.
After the interventions are done, one can wait and examine the results.

I'm asking for your approval for conducting an intervention in your repository.

See examples of interventions in stanford-oval/storm, gabfl/vault, and coreruleset/coreruleset.

You can see the planned interventions.

May I do the interventions?

@blakesweeney
Member

Thanks for being interested in improving our code. We aren't in principle against this experiment, but there is a pretty big issue. Our tests are not in good shape now and are likely broken. As such, I'm not sure if you will be able to do this. I'm happy to do some mentoring to help you get the tests in shape, though. Let me know if you are still interested and we can see how we will go from there.

@evidencebp
Contributor Author

Thank you for the suggestion @blakesweeney

The plan includes 86 interventions in 76 files, which is indeed quite large.
15 are readability alerts (e.g., line-too-long), which usually end up adding a line break and are of low risk.
unnecessary-pass alerts tend to be false alarms, as when introducing a new exception subclass, and lead to no modifications.
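
A minimal sketch of the unnecessary-pass case, with hypothetical class names that are not taken from the repository. Pylint reports the alert when `pass` follows a docstring, because the docstring alone is already a valid class body. Both variants below behave the same, so whether the `pass` stays is mostly a matter of style.

```python
class PipelineError(Exception):
    """Hypothetical exception subclass; Pylint flags the pass below."""

    pass


class PipelineErrorNoPass(Exception):
    """Hypothetical equivalent; the docstring alone is a valid class body."""


def raises_cleanly(exc_type):
    """Raise and catch exc_type, returning True if the message survives."""
    try:
        raise exc_type("boom")
    except exc_type as err:
        return str(err) == "boom"
```

Calling `raises_cleanly` on either class returns `True`, which is why leaving such alerts unmodified carries no functional cost.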

There are 13 structure alerts ('too-many-return-statements', 'too-many-branches', 'too-many-statements', 'too-many-nested-blocks').
These tend to involve extracting methods (which many IDEs support): more work to modify and review, and more risk, yet more expected benefit.
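
To illustrate what extracting methods means here, the following is a toy sketch with hypothetical function names. It is far too small to trigger the structure alerts itself, and it is not code from the repository. The nested conditionals are split into small helpers and the main function keeps only the top-level decisions.

```python
def classify_before(length, has_gene, is_coding):
    """Nested version: the shape that structure alerts tend to point at."""
    if length > 0:
        if has_gene:
            if is_coding:
                return "coding-gene"
            else:
                return "noncoding-gene"
        else:
            if length > 200:
                return "long-orphan"
            else:
                return "short-orphan"
    return "empty"


def _gene_kind(is_coding):
    """Extracted helper for the branch that handles sequences with a gene."""
    return "coding-gene" if is_coding else "noncoding-gene"


def _orphan_kind(length):
    """Extracted helper for the branch that handles sequences without a gene."""
    return "long-orphan" if length > 200 else "short-orphan"


def classify_after(length, has_gene, is_coding):
    """Flattened version: same results, less nesting in one function."""
    if length <= 0:
        return "empty"
    if has_gene:
        return _gene_kind(is_coding)
    return _orphan_kind(length)
```

The review effort comes from checking that both versions agree on every input, which is where working tests matter most.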

What do you think about going by risk level?
I can start with the readability alerts and continue as long as we think that the risk is not too high.

@blakesweeney
Member

I think trying the small changes first is a fine idea. I will say we generally format with black, which has different line length cutoffs than pylint (I think). But feel free to open some issues with the small cases and we can review them. Thanks!

@evidencebp
Contributor Author

As we discussed, I first created a PR with the minor, simple alerts:
line-too-long, unnecessary-pass, and broad-exception-caught.
I hope that the review will be easy.

I'd like to consult you regarding a few alerts, not in the PR.
Many files (e.g., rnacentral_pipeline\cli\ribocentre.py, rnacentral_pipeline\cli\search_export.py) have unnecessary-pass alerts.
They use the click.group decorator so it is obviously intended.
Out of curiosity, what is done there?

On the other hand, in the file rnacentral_pipeline\databases\generic\v1.py the body of the function gene_info is just pass.
I never see it called, and the only references in the project are to a variable with this name.
Can it be removed?

As for broad-exception-caught, when one catches Exception it might hide unexpected exceptions (e.g., due to future changes).
Therefore, it is recommended to catch more specific exceptions.
I list below a few cases in which I could not figure out which exception might be raised.
@blakesweeney, can you advise me?

rnacentral_pipeline\databases\europepmc\stream.py
The function fallback calls fetch.lookup (in line 41). 

rnacentral_pipeline\databases\ensembl\vertebrates\parser.py
 The function as_entry calls embl.sequence in the try section (line 54). 

bin\litscan-retracted-articles.py
The main function catches (Exception, psycopg2.DatabaseError) in line 80.

rnacentral_pipeline\databases\crw\helpers.py
The function as_entry constructs an Entry object (line 79). 

rnacentral_pipeline\databases\rfam\parser.py
Only helpers.sequence is called there. It seems that the possible exception is AttributeError in sequence_id.
Is it so?
Can Exception be replaced with it?
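
To make the broad-exception-caught concern concrete, here is a minimal sketch. Every name in it (`Record`, `BrokenRecord`, `sequence_id`, `parse_broad`, `parse_narrow`) is a hypothetical stand-in and not the repository's code. The assumption is that the only expected failure is an AttributeError from a record without an `id`.

```python
class Record:
    """Hypothetical record with an id attribute."""

    def __init__(self, id_):
        self.id = id_


class BrokenRecord:
    """Hypothetical record whose id lookup fails in an unexpected way."""

    @property
    def id(self):
        raise KeyError("unexpected failure")


def sequence_id(record):
    """Stand-in helper: raises AttributeError when record has no id."""
    return record.id.upper()


def parse_broad(records):
    """Catching Exception skips every failing record, including real bugs."""
    found = []
    for record in records:
        try:
            found.append(sequence_id(record))
        except Exception:  # Pylint reports broad-exception-caught here
            continue
    return found


def parse_narrow(records):
    """Catching AttributeError skips only the expected failure."""
    found = []
    for record in records:
        try:
            found.append(sequence_id(record))
        except AttributeError:
            continue
    return found
```

With these definitions, `parse_broad` silently drops a `BrokenRecord`, while `parse_narrow` lets its KeyError propagate, so an unexpected failure becomes visible instead of looking like a skipped record.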

@blakesweeney
Member

blakesweeney commented Nov 19, 2024 via email

@evidencebp
Contributor Author

I modified rnacentral_pipeline\databases\europepmc\stream.py
Please note that I could not find where TooManyPublications is defined (it is raised in fetch).

I also modified bin\litscan-retracted-articles.py

I did not understand the guidance regarding the next two. Can you clarify?

rnacentral_pipeline\databases\crw\helpers.py
The function as_entry constructs an Entry object (line 79).

rnacentral_pipeline\databases\ensembl\vertebrates\parser.py
The function as_entry calls embl.sequence in the try section (line 54).

@blakesweeney
Member

blakesweeney commented Nov 21, 2024 via email

@evidencebp
Contributor Author

Sure, so I will not modify them.
Thank you so much for your help!
