Added shared drive crawler #141
Idea: log every file the crawler sees to a log file, even if it is not downloaded. |
To add to this idea, since we talked about it in person, the goal is to keep track of what the crawler is seeing, so we can further improve the default match / ignore regexes later. |
The PR is ready for another review. I added a new logger that is not from the root logger, so we could log in another file without messing with STDOUT. This new file keeps track of every file we crawled. |
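A minimal sketch of the dedicated-logger approach described above, assuming Python's standard `logging` module. The logger name `crawler.files` and the helper `make_file_logger` are hypothetical and not taken from the PR.

```python
import logging


def make_file_logger(path: str, name: str = "crawler.files") -> logging.Logger:
    """Return a logger that writes only to `path`, bypassing the root handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Stop records from propagating to the root logger (and thus to STDOUT).
    logger.propagate = False
    if not logger.handlers:
        # Attach the file handler only once, even if this helper is called again.
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `make_file_logger("crawled_files.log").info("%s", remote_path)` for every path seen would then record each crawled file without touching the console output.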
Oops, I commented on the commit instead of the PR: besides the exception() log TODO, LGTM! |
Sorry, but I'm not too sure I understand what's wrong, @Res260. Could you please elaborate a bit? |
I think I saw you added a log.exception() somewhere else in the code, so I
thought it would be easy to implement instead of leaving a TODO in the code.
|
You're right, I totally missed this one, my bad! Fixed in latest commit. |
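For context, the `log.exception()` pattern under discussion could look like the sketch below. The logger name `crawler` and the wrapper `safe_download` are hypothetical; only `Logger.exception` itself is standard-library behavior.

```python
import logging

log = logging.getLogger("crawler")


def safe_download(download, path: str) -> bool:
    """Call `download(path)` and log the full traceback instead of crashing."""
    try:
        download(path)
        return True
    except Exception:
        # Logger.exception logs at ERROR level and appends the current traceback.
        log.exception("Failed to download %s", path)
        return False
```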
Lgtm |
Fixed a small problem with the parser where the last line of each file got truncated, and simplified the match file since |
TODO: Add the PR changes to changelog.md |
Tested, very dank. LGTM
I added the automatic shared drive crawler I had in mind, as briefly discussed in #137.
This was done after #140, so it was easier to implement since I already knew the download flow and its quirks.
I had to create a crawler_config folder since I didn't like the idea of mixing class definitions and plaintext files.
I tried to put more documentation in this class since the core logic may be hard to follow. I've tested it a lot and it works amazingly well, but please do test it.
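As a rough illustration of how plain-text match and ignore files could drive the crawler, here is a sketch that assumes one regular expression per line, with blank lines and `#` comments skipped. The file format and the helpers `parse_patterns` and `should_download` are assumptions for illustration, not the PR's actual implementation.

```python
import re
from typing import Iterable, List


def parse_patterns(text: str) -> List[re.Pattern]:
    """Compile one regex per non-empty, non-comment line."""
    patterns = []
    # splitlines() keeps the last line intact even without a trailing newline.
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def should_download(path: str,
                    match: Iterable[re.Pattern],
                    ignore: Iterable[re.Pattern]) -> bool:
    """Download a path only if it hits a match pattern and no ignore pattern."""
    if any(p.search(path) for p in ignore):
        return False
    return any(p.search(path) for p in match)
```

Checking the ignore list first means an ignored directory always wins over a matching file name, which keeps the default patterns easy to reason about.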