v1.0.0 – Revive and bugfixes #122

Merged
piconti merged 47 commits into master from revive_and_bugfixes
Dec 19, 2023

Conversation

piconti
Member

@piconti piconti commented Dec 14, 2023

This large PR covers various updates aiming to revive this package and perform some smaller bugfixes.
These changes mainly aim to prepare the text-importer package for the modifications required as part of Impresso II, which include both the integration of new importers and the correction of various issues identified in the canonical data.

The modifications include:

  • General dependencies update: updating the versions of the packages listed in requirements.txt, and loosening version restrictions (using >= instead of ==) to reduce the number of possible inconsistencies.
  • Packaging using pyproject.toml instead of setup.py.
  • Replace the use of some deprecated libraries.
  • Performing some small bugfixes and optimizations:
    • Leveraging the apply_select_func() function in generic_importer.py, which was previously never called.
    • Preventing some I/O errors that would randomly appear, by replacing codecs.open() with Python's native io.open() function where possible, and adding a retry mechanism when the error came from os.listdir().
    • Preventing avoidable errors from being raised in mets.parse_mets_amdsec() when an element is not found in the XML file.
    • Removing avoidable calls to issue.xml or page.xml when parsing, especially in for loops, as these could create computing overhead. Note: this approach could be extended to more cases as part of a separate issue.
    • Explicitly freeing some dask objects after use to reduce memory use. This could also be explored further when scaling.
  • Update the tests, add a test for BNF-EN. Note: tests will be further improved in upcoming PRs.
  • Update the docstrings to match the changes, switching to Google-style docstrings, adding type hints where missing, and limiting line length to 80 characters.
  • Update the Read the Docs documentation accordingly, adding the .readthedocs.yaml file.
    • Note: there might still be modifications necessary to the docs folder that will be known once this PR is open.
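
For illustration, the I/O hardening mentioned above (retrying os.listdir() and reading files with the built-in open()) could look roughly like the sketch below. This is not the code from this PR: the helper names list_dir_with_retry and read_text, and the attempts and delay parameters, are hypothetical and only show the general idea.

```python
import os
import time
from typing import List


def list_dir_with_retry(
    path: str, attempts: int = 3, delay: float = 0.5
) -> List[str]:
    """Return the sorted entries of `path`, retrying on OSError.

    Hypothetical helper: `attempts` and `delay` are illustrative defaults.
    """
    for attempt in range(1, attempts + 1):
        try:
            return sorted(os.listdir(path))
        except OSError:
            # Re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be >= 1")


def read_text(path: str, encoding: str = "utf-8") -> str:
    """Read a text file with the built-in open().

    In Python 3 the built-in open() is the same function as io.open(),
    so no codecs.open() call is needed to pick an encoding.
    """
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

Note that this sketch retries on any OSError, including a missing directory; a real implementation may want to retry only on the specific errors observed in practice.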

Closes #121.
As a consequence of the various changes, the version was bumped to 1.0.0.

Pauline Conti and others added 30 commits September 27, 2023 14:26
@piconti piconti self-assigned this Dec 14, 2023
@piconti piconti linked an issue Dec 14, 2023 that may be closed by this pull request
5 tasks
@piconti piconti merged commit 3a5134c into master Dec 19, 2023
1 check passed
@piconti piconti deleted the revive_and_bugfixes branch December 19, 2023 13:44
Development

Successfully merging this pull request may close these issues.

Update Text-importer dependencies and documenation, and fix small associated bugs