Enhance Scraper #64
- custom 'id' for each document
- update the document in ES based on its custom 'id', changing only the provided fields and keeping the existing fields as they are
- convert code from JavaScript to Python
- remove redundant code
- update custom url for domain field
- change cron job to run manually for inactive mailing lists
- migrate from Node.js to Python
- migrate from Node.js to Python
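The custom-'id' update described in the commits above could look roughly like the sketch below. This is not the code from this PR: `make_doc_id`, `build_update_action`, and the hash-based id scheme are hypothetical names and choices made for illustration. The action dict follows the shape that, as far as I know, `elasticsearch.helpers.bulk` accepts for partial updates (`_op_type: "update"` with a `doc` body), so only the provided fields are written and the other fields on the stored document stay untouched.

```python
import hashlib


def make_doc_id(domain: str, url: str) -> str:
    """Hypothetical id scheme: a stable hash of the domain and the document URL."""
    return hashlib.sha256(f"{domain}|{url}".encode("utf-8")).hexdigest()[:32]


def build_update_action(index: str, doc_id: str, fields: dict) -> dict:
    """Build a partial-update action for one document.

    Only the keys in `fields` are sent; fields already stored on the
    document are left as they are. `doc_as_upsert` creates the document
    if no document with this id exists yet.
    """
    return {
        "_op_type": "update",
        "_index": index,
        "_id": doc_id,
        "doc": dict(fields),
        "doc_as_upsert": True,
    }


if __name__ == "__main__":
    # Assumed example values, not taken from the scraper.
    doc_id = make_doc_id("https://btctranscripts.com", "https://btctranscripts.com/some-talk")
    action = build_update_action("example-index", doc_id, {"title": "Some talk"})
    print(action["_id"], sorted(action["doc"]))
```

A list of such actions can then be passed to the bulk helper in one call, which keeps the scraper from overwriting fields it did not scrape.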
I reviewed the new logic for Bitcoin Transcripts, and it looks good overall! I left a couple of comments inline.
Unfortunately, I still can't test this locally due to the "ModuleNotFoundError: No module named 'common'" error I mentioned in our DMs.
Feature idea:
It would be extremely useful if all the scripts had a test mode to execute the logic without involving the Elasticsearch index. Perhaps exporting the docs in JSON could make it easier to test without the Elasticsearch dependency. What do you think?
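To make the idea concrete, here is a minimal sketch of what such a test mode might look like. The names `run_scraper`, `export_docs`, the `test_mode` flag, and the default output path are all hypothetical and do not exist in the repo; `index_fn` stands in for whatever function currently writes a document to Elasticsearch.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def export_docs(docs: Iterable[dict], out_path: str) -> int:
    """Write one JSON document per line and return how many were written."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count


def run_scraper(
    docs: Iterable[dict],
    index_fn: Callable[[dict], None],
    test_mode: bool = False,
    out_path: str = "out/docs.jsonl",
) -> int:
    """Send docs to Elasticsearch via index_fn, or to a JSON file in test mode."""
    if test_mode:
        # Test mode never calls index_fn, so no Elasticsearch connection is needed.
        return export_docs(docs, out_path)
    count = 0
    for doc in docs:
        index_fn(doc)
        count += 1
    return count
```

With something like this, each script could be run locally with `test_mode=True` and the resulting file inspected or diffed, without any Elasticsearch credentials.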
- optimizing the script
- migrate from Node.js to Python
This looks good to me and I think it can be merged iff it has been tested by @urvishp80.
As I said before, I have difficulty testing this locally. @Emmanuel-Develops has also raised on Discord that the tight coupling with the ES index makes local testing difficult. It's okay to merge this, and I will open an issue about running the scraper locally. What do you think, @urvishp80?
Also, additional changes need to be made to the Bitcoin Transcripts scraper to allow filtering for AI-generated transcripts. As this PR contains a lot of changes and refactors, I think it's better to do that later, alongside some other changes that I have in mind (I'll open an issue for these as well).
Nice additions!
closes #67
closes #68
Bitcoin Transcripts:
This update accounts for edits to transcripts and handles the issue "Handling Edits and AI-Generated Transcripts in Bitcoin Transcripts Repository".
Includes:
Mailing Lists:
Bitcoin Talk:
Bitcoin Optech: