Fix Bitcoin Book Scraper #84

Open
elraphty opened this issue Oct 22, 2024 · 1 comment

Comments

@elraphty
Contributor

Currently the Bitcoin Book scraper does not work: the URLs it targets no longer exist, so Beautiful Soup cannot scrape the chapters of the book.

  • Write a fix so that the Bitcoin Book scraper works with the new version of the book on GitHub
  • Update the README with the changes
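
One possible direction for the fix, sketched under assumptions: instead of parsing rendered HTML pages, read the chapter sources directly from the book's GitHub repository through `raw.githubusercontent.com`. The repository coordinates, branch name, and chapter file name used below are illustrative guesses rather than values confirmed in this issue, and the helper names (`raw_url`, `chapter_title`, `fetch_chapter`) are hypothetical and not part of the existing scraper.

```python
import re
import urllib.request
from typing import Optional
from urllib.parse import quote

RAW_BASE = "https://raw.githubusercontent.com"


def raw_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw-content URL for one file in a GitHub repository."""
    return f"{RAW_BASE}/{quote(owner)}/{quote(repo)}/{quote(branch)}/{quote(path)}"


def chapter_title(asciidoc_text: str) -> Optional[str]:
    """Return the first level-0 or level-1 AsciiDoc heading ('= ' or '== ').

    Assumes the chapters are AsciiDoc sources; returns None if no heading is found.
    """
    for line in asciidoc_text.splitlines():
        match = re.match(r"^(={1,2})\s+(.+?)\s*$", line)
        if match:
            return match.group(2)
    return None


def fetch_chapter(url: str, timeout: float = 30.0) -> str:
    """Download one chapter as text (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical coordinates: adjust to the real repository layout.
url = raw_url("bitcoinbook", "bitcoinbook", "develop", "ch01_intro.adoc")
print(url)
```

Working from the source files avoids depending on the rendered page structure, which is the part that tends to break when a site or edition changes. The chapter list would still need to be maintained, or discovered through the repository listing.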
@kouloumos
Contributor

Good point. As you can see in sources, we last scraped the Bitcoin Book (Mastering Bitcoin) a year ago. Normally, this wouldn’t be an issue since books aren’t frequently updated, but a new edition has been released since then.

The chapters were indexed as separate documents during the last scrape (#46), which aligns with the chunking concept we've been discussing. While we successfully broke the content into chunks, there isn’t yet a system in place to connect these chunks meaningfully. If we decide to adopt the chunking strategy moving forward, this would be a good opportunity to implement it effectively for the book as well.
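
To make "connecting the chunks" concrete, here is a minimal sketch of one way it could look: each chunk records its parent document plus the IDs of its neighbours, so a consumer can walk back to the chapter or to adjacent chunks. The `Chunk` fields, the ID scheme, and the `max_chars` paragraph-packing rule are all assumptions for illustration; the project has not settled on a schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str          # ID of the chapter this chunk came from
    index: int              # position of the chunk within the chapter
    body: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def chunk_chapter(parent_id: str, text: str, max_chars: int = 1000) -> List[Chunk]:
    """Pack paragraphs into chunks of at most ~max_chars and link neighbours."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    bodies: List[str] = []
    current = ""
    for paragraph in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(paragraph) + 2 > max_chars:
            bodies.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        bodies.append(current)

    chunks = [Chunk(f"{parent_id}-{i}", parent_id, i, body) for i, body in enumerate(bodies)]
    for i, chunk in enumerate(chunks):
        if i > 0:
            chunk.prev_id = chunks[i - 1].chunk_id
        if i < len(chunks) - 1:
            chunk.next_id = chunks[i + 1].chunk_id
    return chunks


chunks = chunk_chapter("ch01", "aaaa\n\nbbbb\n\ncccc", max_chars=10)
print([(c.chunk_id, c.prev_id, c.next_id) for c in chunks])
```

A single paragraph longer than `max_chars` is kept whole in this sketch rather than split, which keeps the example short but would need revisiting for real content.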
