Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging & parameterization improvements #15

Merged
merged 10 commits into from
Aug 15, 2024

Conversation

thashepherd
Copy link
Contributor

@thashepherd thashepherd commented Aug 15, 2024

The sheer verbosity of scrapy's default logging was cramping my dev style, so I changed it. This PR also includes a few other tweaks.

  • Default log_level for built-in scrapy modules to WARN while retaining it at INFO for our own code (this was a PITA)
  • start_date and open_in_browser may be passed on the command line: python main.py --start_date '08/01/2023'
  • docker-compose will read mailtrap (and other) env vars from .env.dev. In other words, if you create .env.dev in your repo root with
MAILTRAP_API_TOKEN=<your_api_token>
MAILTRAP_SENDER_ADDRESS=mailtrap@codefordayton.org
MAILTRAP_TO_ADDRESS=<your_email>
MAILTRAP_CC_ADDRESS=
MAILTRAP_BCC_ADDRESS=

and then run docker-compose up --build, those variables will be loaded in. .env.dev is listed in .gitignore so there's no risk of accidentially committing it.

  • Not supplying MAILTRAP_CC_ADDRESS or MAILTRAP_BCC_ADDRESS no longer causes an error
  • start_date and permit_types are passed to Jinja, and the template's been tweaked to use them: image

Output from a run with default arguments:

[+] Running 1/0
 ✔ Container demolition_checker-demolition-checker-1  Recreated                                                                                                                                                                                         0.0s 
Attaching to demolition-checker-1
demolition-checker-1  | 2024-08-15 19:36:24 [DemolitionSpider] INFO: Initializing spider with permit_type Building/Wrecking/Commercial/NA and start_date 08/14/2024
demolition-checker-1  | 2024-08-15 19:36:26 [DemolitionSpider] INFO: Extracted 2 records
demolition-checker-1  | 2024-08-15 19:36:26 [DemolitionSpider] INFO: 
demolition-checker-1  |         Building/Wrecking/Commercial/NA:                                                                                                                                                                                             
demolition-checker-1  |         WRK2024C-00017 - BLADECUTTERS - 1745 WOODMAN, DAYTON OH                                                                                                                                                                      
demolition-checker-1  | 2024-08-15 19:36:26 [DemolitionSpider] INFO:                                                                                                                                                                                         
demolition-checker-1  |         Building/Wrecking/Commercial/NA:
demolition-checker-1  |         WRK2024C-00016 -  - 1745 WOODMAN DR, DAYTON OH                                                                                                                                                                               
demolition-checker-1  | 2024-08-15 19:36:26 [__main__] INFO: Spider closed: DemolitionSpider, reason: finished                                                                                                                                               
demolition-checker-1  | 2024-08-15 19:36:26 [__main__] INFO: Collected 2 records, total: 2                                                                                                                                                                   
demolition-checker-1  | 2024-08-15 19:36:26 [DemolitionSpider] INFO: Initializing spider with permit_type Building/Wrecking/Residential/NA and start_date 08/14/2024                                                                                         
demolition-checker-1  | 2024-08-15 19:36:27 [DemolitionSpider] INFO: No records found
demolition-checker-1  | 2024-08-15 19:36:27 [__main__] INFO: Spider closed: DemolitionSpider, reason: finished
demolition-checker-1  | 2024-08-15 19:36:27 [__main__] INFO: Collected 0 records, total: 2                                                                                                                                                                   
demolition-checker-1  | 2024-08-15 19:36:27 [__main__] INFO: All records: 2                                                                                                                                                                                  
demolition-checker-1 exited with code 0

Note that this change does NOT fix #11.

If you're tired of seeing tiny formatting changes in reviews like this one:

pip install -r requirements-dev.txt

...will install pre-commit which'll run ruff and otherwise enforce a consistent standard.

Comment on lines -1 to -39
attrs==24.2.0
Automat==22.10.0
certifi==2024.7.4
cffi==1.17.0
charset-normalizer==3.3.2
constantly==23.10.4
cryptography==43.0.0
cssselect==1.2.0
defusedxml==0.7.1
filelock==3.15.4
hyperlink==21.0.0
idna==3.7
incremental==24.7.2
itemadapter==0.9.0
itemloaders==1.3.1
jmespath==1.0.1
lxml==5.2.2
packaging==24.1
parsel==1.9.1
Protego==0.3.1
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
PyDispatcher==2.0.7
pyOpenSSL==24.2.1
queuelib==1.7.0
requests==2.32.3
requests-file==2.1.0
Jinja2==3.1.4
mailtrap==2.0.1
Scrapy==2.11.2
service-identity==24.1.0
six==1.16.0
tldextract==5.1.2
Twisted==24.3.0
typing_extensions==4.12.2
urllib3==2.2.2
Twisted==24.7.0
w3lib==2.2.1
zope.interface==7.0.1
mailtrap==2.0.1
jinja2==3.0.3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think this looks funny - I used https://github.com/fpgmaas/deptry to separate our direct dependencies (Scrapy, Twisted) from indirect dependencies (charset-normalizer, certifi) that were pulled in automatically.

One (of many) of the great things about using a dependency manager like Poetry (or Hatch or Rye) is that it separates your direct dependencies in pyproject.toml from indirect dependencies, which are stored in a separate poetry.lock. While converting this project to poetry would be quick, I don't want to single-handedly inflict a change on your personal development process by doing so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm game for switching to poetry. pip is just muscle memory for me at this point - I rarely even consider that package management has improved beyond the requirements.txt file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to conduct that switch in a future PR if you want to see what that looks like. I don't know if it's worth sacrificing you & the other contributors' muscle memory for that on a project like this, though. Up to you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me!

"--start_date",
required=False,
default=(date.today() - timedelta(days=1)).strftime("%m/%d/%Y"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to weeks = 1 in order to cause the recreational spider to hit #11

@DavidEBest
Copy link
Contributor

:shipit:

@DavidEBest DavidEBest merged commit b5a6842 into codefordayton:main Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle single result case
2 participants