Logging & parameterization improvements #15
…start date and open_in_browser from command line. Pass list of permit types to crawler. Set built-in scrapy module logs to WARN. Improve error handling around mailtrap env vars.
Integrate changes from base
attrs==24.2.0
Automat==22.10.0
certifi==2024.7.4
cffi==1.17.0
charset-normalizer==3.3.2
constantly==23.10.4
cryptography==43.0.0
cssselect==1.2.0
defusedxml==0.7.1
filelock==3.15.4
hyperlink==21.0.0
idna==3.7
incremental==24.7.2
itemadapter==0.9.0
itemloaders==1.3.1
jmespath==1.0.1
lxml==5.2.2
packaging==24.1
parsel==1.9.1
Protego==0.3.1
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
PyDispatcher==2.0.7
pyOpenSSL==24.2.1
queuelib==1.7.0
requests==2.32.3
requests-file==2.1.0
Jinja2==3.1.4
mailtrap==2.0.1
Scrapy==2.11.2
service-identity==24.1.0
six==1.16.0
tldextract==5.1.2
Twisted==24.3.0
typing_extensions==4.12.2
urllib3==2.2.2
Twisted==24.7.0
w3lib==2.2.1
zope.interface==7.0.1
mailtrap==2.0.1
jinja2==3.0.3
If you think this looks funny: I used https://github.com/fpgmaas/deptry to separate our direct dependencies (Scrapy, Twisted) from indirect dependencies (charset-normalizer, certifi) that were pulled in automatically.
One of the many great things about using a dependency manager like Poetry (or Hatch, or Rye) is that it separates your direct dependencies, declared in pyproject.toml, from the fully resolved set including indirect dependencies, which is pinned in a separate poetry.lock. While converting this project to Poetry would be quick, I don't want to single-handedly inflict a change on your personal development process by doing so.
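A rough illustration of the direct/indirect split. This is not deptry's implementation, just a minimal stand-in: it collects the top-level modules a source file imports (via the standard-library ast module) and partitions name==version pins accordingly. The helper names and sample inputs are hypothetical, and it assumes a package's import name matches its distribution name, which is not always true.

```python
from __future__ import annotations

import ast
import re


def top_level_imports(source: str) -> set[str]:
    """Lower-cased top-level module names imported by `source` (absolute imports only)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0].lower() for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0].lower())
    return names


def split_pins(pins: list[str], source: str) -> tuple[list[str], list[str]]:
    """Partition `name==version` pins into (direct, indirect).

    ASSUMPTION: a package's import name equals its distribution name, ignoring
    case and treating "-" as "_". That holds for Scrapy or mailtrap but not for
    e.g. pyOpenSSL (imported as OpenSSL); real tools like deptry account for that.
    """
    imported = top_level_imports(source)
    direct: list[str] = []
    indirect: list[str] = []
    for pin in pins:
        # Keep only the distribution name, dropping "==version" and friends.
        name = re.split(r"[=<>!~\s]", pin, maxsplit=1)[0]
        key = name.lower().replace("-", "_")
        (direct if key in imported else indirect).append(pin)
    return direct, indirect
```

Fed source that does `import scrapy` and `import mailtrap`, it puts Scrapy and mailtrap in the direct bucket and leaves certifi and charset-normalizer as indirect.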
I'm game for switching to Poetry. pip is just muscle memory for me at this point; I rarely even consider that package management has improved beyond the requirements.txt file.
I'm happy to make that switch in a future PR if you want to see what it looks like. I don't know if it's worth sacrificing your and the other contributors' muscle memory for that on a project like this, though. Up to you!
That works for me!
"--start_date",
required=False,
default=(date.today() - timedelta(days=1)).strftime("%m/%d/%Y"),
Change this to timedelta(weeks=1) so that the recreational spider hits #11.
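A minimal sketch of the suggested default, assuming the CLI is built with argparse (the required=/default= keywords in the snippet are consistent with that, but the full parser isn't shown). build_parser and parse_start_date are hypothetical names, and treating --open_in_browser as an on/off store_true flag is an assumption.

```python
import argparse
from datetime import date, datetime, timedelta
from typing import Optional

DATE_FORMAT = "%m/%d/%Y"  # MM/DD/YYYY, same format as the strftime call in the snippet


def build_parser(today: Optional[date] = None) -> argparse.ArgumentParser:
    """Build the CLI parser; `today` is injectable so the default is testable."""
    today = today or date.today()
    parser = argparse.ArgumentParser(description="Permit crawler (sketch)")
    parser.add_argument(
        "--start_date",
        required=False,
        # Review suggestion: default to one week back instead of one day.
        default=(today - timedelta(weeks=1)).strftime(DATE_FORMAT),
        help="start date as MM/DD/YYYY (default: one week ago)",
    )
    parser.add_argument(
        "--open_in_browser",
        action="store_true",  # ASSUMPTION: a simple on/off flag
        help="open the rendered output in a browser",
    )
    return parser


def parse_start_date(value: str) -> date:
    """Turn the MM/DD/YYYY string into a date, raising ValueError if malformed."""
    return datetime.strptime(value, DATE_FORMAT).date()
```

With this, `python main.py --start_date '08/01/2023' --open_in_browser` overrides both values, and running with no arguments starts seven days back.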
The sheer verbosity of Scrapy's default logging was cramping my dev style, so I set Scrapy's built-in module loggers to WARN. This PR also includes a few other tweaks.
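One way to quiet Scrapy's built-in loggers with nothing but the standard library, offered as a sketch rather than the exact change in this PR. Scrapy's modules log under the "scrapy" logger hierarchy (e.g. scrapy.core.engine), so raising the level on the parent silences INFO/DEBUG from every child that hasn't set its own level. quiet_loggers and the "permits" logger name are hypothetical.

```python
import logging
from typing import Iterable


def quiet_loggers(names: Iterable[str], level: int = logging.WARNING) -> None:
    """Set `level` on each named logger; children without their own level inherit it."""
    for name in names:
        logging.getLogger(name).setLevel(level)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    quiet_loggers(["scrapy"])
    engine = logging.getLogger("scrapy.core.engine")
    engine.info("dropped: below WARNING")
    engine.warning("still shown")
    # Hypothetical project logger: unaffected, so INFO still gets through.
    logging.getLogger("permits").info("project logs still shown at INFO")
```

Scrapy also has its own LOG_LEVEL setting, and its logging setup (configure_logging, which CrawlerProcess runs) may reset the "scrapy" logger's level, so a call like this likely needs to run after that setup; worth verifying against the Scrapy logging docs.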
- start_date and open_in_browser may be passed on the command line: python main.py --start_date '08/01/2023'
- .env.dev: if you create .env.dev in your repo root with the variables you want to set and then run docker-compose up --build, those variables will be loaded in. .env.dev is listed in .gitignore, so there's no risk of accidentally committing it.
- Leaving MAILTRAP_CC_ADDRESS or MAILTRAP_BCC_ADDRESS unset no longer causes an error.
- start_date and permit_types are passed to Jinja, and the template has been tweaked to use them.

Output from a run with default arguments:
Note that this change does NOT fix #11.
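A sketch of optional-vs-required environment variable handling in the spirit of the MAILTRAP_* change, standard library only. MAILTRAP_CC_ADDRESS and MAILTRAP_BCC_ADDRESS come from this PR; the helper names and the {"cc": [...], "bcc": [...]} shape are assumptions, and how the result is handed to the mailtrap client isn't shown here.

```python
import os
from typing import Dict, List, Mapping, Optional


def optional_env(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value of `name` with whitespace stripped, or None if unset or blank."""
    value = env.get(name, "").strip()
    return value or None


def required_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return `name`'s value, or fail with a message naming the missing variable."""
    value = optional_env(name, env)
    if value is None:
        raise RuntimeError(f"{name} is not set; add it to .env.dev or the environment")
    return value


def cc_bcc_addresses(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Collect optional CC/BCC addresses, omitting any that are not configured."""
    found = {
        "cc": optional_env("MAILTRAP_CC_ADDRESS", env),
        "bcc": optional_env("MAILTRAP_BCC_ADDRESS", env),
    }
    return {key: [address] for key, address in found.items() if address}
```

For example, cc_bcc_addresses({"MAILTRAP_CC_ADDRESS": "ops@example.com"}) returns {"cc": ["ops@example.com"]}, and an empty environment yields {} instead of an error.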
If you're tired of seeing tiny formatting changes in reviews like this one:
...will install pre-commit, which will run ruff and otherwise enforce a consistent standard.