Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated the Codebase to use pathlib instead of using os.path and path.py #208

Closed
wants to merge 6 commits into from

Conversation

MUCCHU
Copy link

@MUCCHU MUCCHU commented Dec 20, 2023

Fix #195

Make sure consistency is maintained by using pathlib.Path to handle all file operations.

After the changes I ran the docker container which worked perfectly fine.

Copy link

codecov bot commented Dec 21, 2023

Codecov Report

Attention: 89 lines in your changes are missing coverage. Please review.

Comparison is base (2fd98ed) 0.76% compared to head (31eed63) 0.77%.

Files Patch % Lines
src/gutenberg2zim/export.py 0.00% 31 Missing ⚠️
src/gutenberg2zim/download.py 0.00% 25 Missing ⚠️
src/gutenberg2zim/urls.py 0.00% 20 Missing ⚠️
src/gutenberg2zim/rdf.py 0.00% 4 Missing ⚠️
src/gutenberg2zim/utils.py 0.00% 3 Missing ⚠️
src/gutenberg2zim/entrypoint.py 0.00% 2 Missing ⚠️
src/gutenberg2zim/s3.py 0.00% 2 Missing ⚠️
src/gutenberg2zim/zim.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #208   +/-   ##
=====================================
  Coverage   0.76%   0.77%           
=====================================
  Files         15      15           
  Lines       1559    1556    -3     
  Branches     336     336           
=====================================
  Hits          12      12           
+ Misses      1547    1544    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a very good first PR.

Which docker command did you ran? (it is pretty hard to run a full scrape and be sure that you exercise all code branches, at least it takes time)

I made some inline comments.

In addition to them, I would like:

  • please choose between the / operator or the joinpath function to combine paths, for consistency. Here you use both, and I find it does not help to read the code. Except if there are any good reasons to sometime use one and sometime prefer the other, but this is not obvious for me
  • please add an entry to the ChangeLog in the Unreleased section (you will probably need a new "Changed" subsection as well, there is none for now)

Some other remarks (this is not something to worry too much about, this is just our way of working):

  • please always add the "Fix: #xxx" text in your first comment, so that the issue is automatically closed when PR is merged
  • please assign a reviewer once PR is ready (when in doubt, ask who is the right person, but I'm the reviewer of all openZIM Python scrapers) ; this is the trigger for my action, no need to notify anywhere else
  • PR is not ready if CI is failing ; if it is not clear for you why CI is failing, ask for help
    • this is more a general remark, because here it was hard to understand what was going on, there was a nasty issue in main branch, so I fixed it myself
  • much more details on all this in https://github.com/openzim/overview/wiki/openZIM-workflow

@benoit74
Copy link
Collaborator

Nota: black formatting is not OK as well ; you should probably:

  • create a virtual environment with all dev dependencies with hatch shell (if not already done)
  • activate pre-commit (pre-commit install in base repo folder) to detect such issues at commit time
  • you can fix most of these issues with invoke fixall (in base repo folder again)

…ct repeatedly. Changed snippets where src, dst, opff, download_cache are converted to Path objects. Removed unused import 'os' from entrypoint.py
…te paths maintaining consistency. Removed unused import 'os' from urls.py and rdf.py. Changed elif condition that checks if parser.title is empty string coz ruff gave error
@MUCCHU MUCCHU requested a review from benoit74 December 21, 2023 18:47
@MUCCHU
Copy link
Author

MUCCHU commented Dec 21, 2023

Thank you for the feedback. I made all the changes you requested and also ensured that the formatting is perfect by enabling pre-commit. Kindly review the PR.

@benoit74
Copy link
Collaborator

I really like what you did, good job.
Just give me some time to start an end-to-end test and I will come back to you.

@benoit74
Copy link
Collaborator

Still did not had time to start a test run, sorry about that.

Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that a full run (e.g. python src/gutenberg2zim) passes before requesting again a review.

Ziming a single book does not work:

python src/gutenberg2zim --books 84

@benoit74
Copy link
Collaborator

Closed for inactivity, feel free to reopen if you have new changes to submit.

@benoit74 benoit74 closed this Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove usage of os.path and path.py, prefer pathlib
2 participants