readability drops some <h5> content #65

Open
clockfly opened this issue Jul 7, 2024 · 0 comments
clockfly commented Jul 7, 2024

Link:

https://webscraping.ai/faq/colly/how-do-i-handle-redirects-in-colly

After parsing, the output ends like this:

...

adjusting Colly's redirect handling settings, you can ensure that your web scraper behaves exactly as needed when encountering redirects during the scraping process.

Related Questions
!!!!!!!! Some <h5> text is missing HERE !!!!!!!!

Missing part (it seems <h5> tags are not kept after readability parsing):

<h2 class="mt-5">Related Questions</h2>

      <div class="card mb-3">
        <div class="card-body">
          <h5 class="card-title mb-0"><a href="/faq/colly/how-can-i-integrate-colly-with-a-database-to-store-scraped-data">How can I integrate Colly with a database to store scraped data?</a></h5>
        </div>
      </div>
      <div class="card mb-3">
        <div class="card-body">
          <h5 class="card-title mb-0"><a href="/faq/colly/is-it-possible-to-scrape-images-or-files-with-colly">Is it possible to scrape images or files with Colly?</a></h5>
        </div>
      </div>
      <div class="card mb-3">
        <div class="card-body">
          <h5 class="card-title mb-0"><a href="/faq/colly/how-do-i-use-colly-s-callback-functions-effectively">How do I use Colly&#39;s callback functions effectively?</a></h5>
        </div>
      </div>
</div>

Expected behavior

<h5> elements and their text (here, the "Related Questions" card titles) should be kept in the readability output.
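The report doesn't say which readability implementation is involved, so the sketch below doesn't call it. It is a small stdlib-only check, assuming you have the original page HTML and readability's extracted text as two strings, that lists which <h5> titles from the source were dropped. The names `H5Collector`, `h5_texts`, `missing_h5` and `SNIPPET` are hypothetical helpers for illustration, not part of any library.

```python
from html.parser import HTMLParser


class H5Collector(HTMLParser):
    """Hypothetical helper: collects the text inside every <h5> element,
    including text of nested tags such as <a>."""

    def __init__(self):
        # convert_charrefs defaults to True, so "&#39;" arrives as "'"
        super().__init__()
        self._in_h5 = False
        self._buf = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self._in_h5 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h5" and self._in_h5:
            self._in_h5 = False
            self.headings.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._in_h5:
            self._buf.append(data)


def h5_texts(html):
    """Return the text of each <h5> in document order."""
    parser = H5Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


def missing_h5(original_html, extracted_text):
    """Headings present in the source page but absent from the extracted text."""
    return [h for h in h5_texts(original_html) if h not in extracted_text]


# Trimmed copy of the "Related Questions" markup quoted in this issue.
SNIPPET = """
<h2 class="mt-5">Related Questions</h2>
<div class="card mb-3"><div class="card-body">
  <h5 class="card-title mb-0"><a href="/faq/colly/is-it-possible-to-scrape-images-or-files-with-colly">Is it possible to scrape images or files with Colly?</a></h5>
</div></div>
<div class="card mb-3"><div class="card-body">
  <h5 class="card-title mb-0"><a href="/faq/colly/how-do-i-use-colly-s-callback-functions-effectively">How do I use Colly&#39;s callback functions effectively?</a></h5>
</div></div>
"""

# Extracted text as described above: the <h2> survives, the <h5> titles do not.
print(missing_h5(SNIPPET, "Related Questions"))
```

Matching is a plain substring test, so it assumes the extractor preserves each title's wording; whitespace normalization would be needed if the output reflows text.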
