Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopt python bootstrap conventions #191

Merged
merged 19 commits into from
Aug 18, 2023
Merged

Adopt python bootstrap conventions #191

merged 19 commits into from
Aug 18, 2023

Conversation

benoit74
Copy link
Collaborator

@benoit74 benoit74 commented Aug 18, 2023

Fix #190

Changes:

  • use hatch instead of setuptools
  • fix requirements versions
  • add invoke tasks
  • many small adjustments to align with (new) Python / ruff conventions
  • few hints (# noqa or # type:ignore) to disable some issues (ones needing a investigation / action are all tracked in a specific issue)
  • support only Python 3.11 (especially dropped support for Python 2)

This change is Reviewable

@benoit74 benoit74 self-assigned this Aug 18, 2023
@benoit74 benoit74 changed the title Adopt python bootstrap Adopt python bootstrap conventions Aug 18, 2023
@benoit74 benoit74 marked this pull request as draft August 18, 2023 08:08
Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+reviewer:@rgaudin

Reviewable status: 0 of 86 files reviewed, 18 unresolved discussions (waiting on @rgaudin)


MANIFEST.in line 0 at r1 (raw file):
What is this file used for / where is it used?


src/gutenberg2zim/checkdeps.py line 1 at r1 (raw file):

#!/usr/bin/env python

Is it ok to remove these headers or do we need to maintain them? Looks like they are unused by the team now, so from my PoV it is useless to maintain them


src/gutenberg2zim/checkdeps.py line 33 at r1 (raw file):

    all_good = True
    has_zimwriter = True
    for binary, msg in all_bins.items():

renamed to not shadow built-in


src/gutenberg2zim/database.py line 40 at r1 (raw file):

    class Meta:
        database = db
        fixtures = (

tuple is not mutable while list is, and it is recommended to not have mutable class attributes (and we do not modify them anyway)


src/gutenberg2zim/download.py line 39 at r1 (raw file):

def handle_zipped_epub(zippath, book, dst_dir: pathlib.Path):

Just to help pyright understand what is going on


src/gutenberg2zim/download.py line 109 at r1 (raw file):

def download_book(

Type hints for pyright, again


src/gutenberg2zim/download.py line 205 at r1 (raw file):

        if bfs.count() > 1:
            try:
                bf = bfs.filter(bfs.images).get()

This one was a real bug !


src/gutenberg2zim/download.py line 255 at r1 (raw file):

                    continue
                # save etag
                book.html_etag = etag  # type: ignore

All these one are linked to peewee issue with pyright, maybe/probably solved in a more recent version


src/gutenberg2zim/download.py line 361 at r1 (raw file):

def download_all_books(
    download_cache: str,

Type hints to help pyright


src/gutenberg2zim/entrypoint.py line 16 at r1 (raw file):

from gutenberg2zim.zim import build_zimfile

help_info = (

do not shadow


src/gutenberg2zim/entrypoint.py line 73 at r1 (raw file):

def main():
    arguments = docopt(help_info, version=VERSION)

no need to pass as parameter


src/gutenberg2zim/entrypoint.py line 76 at r1 (raw file):

    # optimizer version to use
    optimizer_version = {"html": "v1", "epub": "v1", "cover": "v1"}

all lower case


src/gutenberg2zim/export.py line 190 at r1 (raw file):

def export_all_books(
    project_id,
    download_cache=None,

useless default values, always passed + misleading, because in many case we do not test that parameters might be None, and even assume that it is not None


src/gutenberg2zim/export.py line 240 at r1 (raw file):

    export_to_json_helpers(
        books=books,
        languages=languages,

Unused params in function, removed


src/gutenberg2zim/export.py line 443 at r1 (raw file):

                [
                    1
                    for e in body.children  # type: ignore

Many BS type issues, all ignored


src/gutenberg2zim/export.py line 768 at r1 (raw file):

    def handle_companion_file(
        fname,
        book: Book,

Reordering to express that book has no default (it must be passed) and cannot be None


src/gutenberg2zim/shared.py line 77 at r1 (raw file):

                should_compress=should_compress,
                delete_fpath=delete_fpath,
                callback=callback,

Unused + typing issue, so removed for now


src/gutenberg2zim/utils.py line 78 at r1 (raw file):

def get_project_id(languages=[], formats=[], only_books=[]):

Mutable defaults are not recommended + useless here

@benoit74 benoit74 marked this pull request as ready for review August 18, 2023 08:37
Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, I forgot to really fix requirements version. Will do it after your review, sorry.

Reviewable status: 0 of 86 files reviewed, 18 unresolved discussions (waiting on @rgaudin)

Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I also forgot to fix Github workflows. Will do it later as well.

Reviewable status: 0 of 86 files reviewed, 18 unresolved discussions (waiting on @rgaudin)

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems you've not used git to move the module to src/. We're loosing all per-file history. Please revert, use git mv and reapply your changes

Reviewed 84 of 86 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 31 unresolved discussions (waiting on @benoit74)


MANIFEST.in line at r1 (raw file):

Previously, benoit74 wrote…

What is this file used for / where is it used?

This is part of the setuptools/distutils distribution. Rendered obsolete by our new bootstrap


pyproject.toml line 22 at r2 (raw file):

    "Jinja2==3.1.2",
    "peewee==3.15.4",
    "path.py==12.5.0",

I saw you opened a ticket about pathlib but did not mention this. Should probably be mentioned there


pyproject.toml line 28 at r2 (raw file):

    "chardet==5.1.0",
    "apsw==3.40.0.0",
    "kiwixstorage>=0.5,<1.0",

Why the range?


pyproject.toml line 29 at r2 (raw file):

    "apsw==3.40.0.0",
    "kiwixstorage>=0.5,<1.0",
    "requests>=2.23,<3.0",

Same question


pyproject.toml line 31 at r2 (raw file):

    "requests>=2.23,<3.0",
    "pif==0.8.2",
    "zimscraperlib>=2.1,<2.2",

And again


pyproject.toml line 32 at r2 (raw file):

    "pif==0.8.2",
    "zimscraperlib>=2.1,<2.2",
    "schedule>=1.1.0,<1.2",

😉


pyproject.toml line 46 at r2 (raw file):

]
check = [
  "pyright==1.1.318",

Pretty sure we're at 323 ATM


pyproject.toml line 55 at r2 (raw file):

    "pre-commit==3.3.3",
    "debugpy==1.6.7",
# ipdb==0.13.11

Leave them in l it's dev afterall. I tend to install both in most projects so I'd support its addition to bootstrap


pyproject.toml line 93 at r2 (raw file):

report-cov = "inv report-cov"
coverage = "inv coverage --args '{args}'"

No html script? Is this up to date?


src/gutenberg2zim/__main__.py line 7 at r2 (raw file):

def main():
    # allows running it from source using python gutenberg2zim
    sys.path = [str(pathlib.Path(__file__).parent.parent.resolve()), *sys.path]

We should drop this hack and run it as the other projects


src/gutenberg2zim/checkdeps.py line 1 at r1 (raw file):

Previously, benoit74 wrote…

Is it ok to remove these headers or do we need to maintain them? Looks like they are unused by the team now, so from my PoV it is useless to maintain them

I agree.


src/gutenberg2zim/database.py line 192 at r2 (raw file):

    downloaded_from = CharField(max_length=300, null=True)

    def __unicode__(self):

Since you're dropping py2 support, you should rename those __unicode__ to __str__


src/gutenberg2zim/download.py line 205 at r1 (raw file):

Previously, benoit74 wrote…

This one was a real bug !

👍


src/gutenberg2zim/download.py line 81 at r2 (raw file):

        src = os.path.join(tmpd, fname)
        if os.path.exists(src):
            fname_path = Path(fname).basename()

fname_path doesn't really makes sense. I understand it's used not to designate a path but the object type (from path.py)


src/gutenberg2zim/entrypoint.py line 135 at r2 (raw file):

    try:
        books = list(books.split(","))

split() already returns a list


src/gutenberg2zim/urls.py line 233 at r2 (raw file):

def setup_urls(force):
    file_with_url = os.path.join("tmp", f"file_on_{UrlBuilder.SERVER_NAME}")

Shouldn't this use the TMP constant?

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r3, all commit messages.
Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @benoit74)


.github/workflows/Publish.yaml line 29 at r3 (raw file):

      - name: Upload to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1.8

The comments on bootstrap are important IMO. Especially since this requires manual intervention to enable Trusted Publishing for all projects (as we switch them to the bottstrap).
I just added it

@benoit74 benoit74 requested a review from rgaudin August 18, 2023 11:29
Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for a new review

Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @rgaudin)


pyproject.toml line 22 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

I saw you opened a ticket about pathlib but did not mention this. Should probably be mentioned there

Done.


pyproject.toml line 28 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Why the range?

Done.


pyproject.toml line 29 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Same question

Done.


pyproject.toml line 31 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

And again

Done.


pyproject.toml line 32 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

😉

Done.


pyproject.toml line 46 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Pretty sure we're at 323 ATM

Done.


pyproject.toml line 55 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Leave them in l it's dev afterall. I tend to install both in most projects so I'd support its addition to bootstrap

Done


pyproject.toml line 93 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

No html script? Is this up to date?

Done
It has been added in r3 ;o)


.github/workflows/Publish.yaml line 29 at r3 (raw file):

Previously, rgaudin (rgaudin) wrote…

The comments on bootstrap are important IMO. Especially since this requires manual intervention to enable Trusted Publishing for all projects (as we switch them to the bottstrap).
I just added it

What do you mean? I should keep the comments from the bootstrap here? You explicitly asked me to remove them since we were using Trusted Publishing on another project. And yes, I know that it meant that you had to perform a manual intervention. I don't mind to add the comment back.


src/gutenberg2zim/__main__.py line 7 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

We should drop this hack and run it as the other projects

Done
This hack seems useless indeed


src/gutenberg2zim/database.py line 192 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Since you're dropping py2 support, you should rename those __unicode__ to __str__

Done.


src/gutenberg2zim/download.py line 81 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

fname_path doesn't really makes sense. I understand it's used not to designate a path but the object type (from path.py)

Done.


src/gutenberg2zim/urls.py line 233 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

Shouldn't this use the TMP constant?

Done.

Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 85 of 92 files reviewed, 19 unresolved discussions (waiting on @rgaudin)


src/gutenberg2zim/entrypoint.py line 135 at r2 (raw file):

Previously, rgaudin (rgaudin) wrote…

split() already returns a list

Done.

@codecov
Copy link

codecov bot commented Aug 18, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@1ea3cc1). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 45e5604 differs from pull request most recent head 0cc670e. Consider uploading reports for the commit 0cc670e to get more accurate results

Additional details and impacted files
@@          Coverage Diff           @@
##             main    #191   +/-   ##
======================================
  Coverage        ?   0.76%           
======================================
  Files           ?      15           
  Lines           ?    1559           
  Branches        ?     336           
======================================
  Hits            ?      12           
  Misses          ?    1547           
  Partials        ?       0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good ; there's the workflow thing and then I think it's OK but you'll have to fix the git history

Reviewed 9 of 11 files at r4, all commit messages.
Reviewable status: 92 of 94 files reviewed, 7 unresolved discussions (waiting on @benoit74)


.pre-commit-config.yaml line 18 at r4 (raw file):

  - id: ruff
- repo: https://github.com/RobertCraigie/pyright-python
  rev: v1.1.315

Is this in line with pyproject?

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 11 files at r4.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @benoit74)

@benoit74 benoit74 requested a review from rgaudin August 18, 2023 12:57
Copy link
Collaborator Author

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you want to fix in the git history?

I used git mv, but indeed I realized that it makes no difference : https://stackoverflow.com/questions/2314652/is-it-possible-to-move-rename-files-in-git-and-maintain-their-history

But I agree that I could squash most commits

What is the workflow thing?

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @benoit74 and @rgaudin)


.pre-commit-config.yaml line 18 at r4 (raw file):

Previously, rgaudin (rgaudin) wrote…

Is this in line with pyproject?

Done.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow thing was the pyright version.

Regarding the git mv, I realized my alias did not have the --folow param! Disregard the message

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @benoit74)

@benoit74 benoit74 merged commit 927af14 into main Aug 18, 2023
5 checks passed
@benoit74 benoit74 deleted the adopt_python_bootstrap branch August 18, 2023 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adopt new Python bootstrap conventions
2 participants