Only scan for links on html/textual data #106

bensheldon · 2024-11-04T21:36:13Z

Thanks for this gem 🙇🏻

I noticed that the crawler will attempt to scan everything for links, including images:

Line 72 in 0d809fc

Utils.scan_for_links(response.body) do |path|

I noticed this because, for some strange reason, I have a PNG on my blog that scans positively for a blank URI, which then raises: bad URI(is not URI?): nil (URI::InvalidURIError)

This is the image: https://island94.org/uploads/2009-09-13-My-featured-dead-cockroach/wired-cockroach.png
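
For illustration, here is a minimal sketch of the kind of guard this would need, so that bodies are only scanned when the response looks like HTML or other text. The `scannable?` helper and the `SCANNABLE_EXTRAS` constant are hypothetical names, not part of the gem, and the sketch assumes the crawler has access to the response's Content-Type header as a string.

```ruby
# Hypothetical helper (not part of the gem): decide whether a response body
# is worth scanning for links, based on its Content-Type header value.

# Non-"text/*" media types that are still HTML-like (an assumption).
SCANNABLE_EXTRAS = %w[application/xhtml+xml].freeze

def scannable?(content_type)
  return false if content_type.nil?

  # Drop parameters such as "; charset=utf-8" and normalise case.
  media_type = content_type.split(";").first.to_s.strip.downcase

  media_type.start_with?("text/") || SCANNABLE_EXTRAS.include?(media_type)
end

puts scannable?("text/html; charset=utf-8") # true
puts scannable?("image/png")                # false
```

With a check like this in front of the scan, a PNG response would be skipped before any of its bytes could be misread as a link.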


benpickles · 2024-11-05T23:05:28Z

Ah yes I see…

Having some sort of mapping between MIME type and how to handle the response’s contents sounds super sensible - I guess I haven’t fully encountered the need before. If it was more capable it could also cover other cases like #95 - though that part feels more than one step away.
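
As a rough sketch of what such a mapping might look like (an assumption about the shape, not the gem's actual design): a hash from MIME type to a handler, with unknown types falling back to "no links". The `HANDLERS` table, the `links_for` method, and the regexes are all hypothetical stand-ins for whatever the gem really uses to extract links.

```ruby
# Hypothetical mapping from MIME type to a link-extracting handler.
# The regexes are simplistic stand-ins, not the gem's real scanner.
HANDLERS = {
  "text/html" => ->(body) { body.scan(/href="([^"]*)"/).flatten },
  "text/css"  => ->(body) { body.scan(/url\(\s*['"]?([^'")]+)['"]?\s*\)/).flatten }
}.freeze

# Fallback for any type without a handler (images, binaries, ...).
NO_LINKS = ->(_body) { [] }

def links_for(content_type, body)
  # Strip parameters like "; charset=utf-8" before the lookup.
  media_type = content_type.to_s.split(";").first.to_s.strip.downcase
  HANDLERS.fetch(media_type, NO_LINKS).call(body)
end

p links_for("text/html; charset=utf-8", '<a href="/about">About</a>')
p links_for("image/png", "binary data")
```

Keeping the handlers in a table means a new content type could be supported by adding one entry, without touching the crawl loop.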

Thanks for creating the issue, it makes me so happy to know that someone else is using it!

bensheldon linked a pull request Dec 26, 2024 that will close this issue

Skip a elements without an href attribute #107

Open
