Hey y'all!
I've integrated unlighthouse with a Scrapy crawler I already use: instead of relying on unlighthouse's own crawling feature, I feed it a list of URLs from Scrapy, anywhere from 200 to 10,000 of them. My approach follows the advice in the docs section on Manually Providing URLs.
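For context, my setup looks roughly like this, using the `urls` option from the Manually Providing URLs docs (the site and paths below are placeholders, not my real ones):

```typescript
// unlighthouse.config.ts -- sketch only; site and paths are placeholders
export default {
  site: 'https://example.com',
  // URLs exported from the Scrapy crawl; providing these skips
  // unlighthouse's own crawler entirely
  urls: [
    '/products/widget-a',
    '/products/widget-b',
    '/blog/some-post',
  ],
}
```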
My first question: am I still leveraging unlighthouse's efficiency when I bypass the crawling feature?
Since I'm passing exact URLs, I'm not sure route sampling still applies. Would it be better to convert my URL list into relative path groups with regex rules and then let unlighthouse's crawler handle the sampling?
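To make the trade-off concrete, I could also downsample the Scrapy list myself before handing it over, grouping URLs that share a path pattern and keeping a few per group. This is just a sketch of the idea; `routeGroup` and `sampleUrls` are hypothetical helpers I'd write, not unlighthouse APIs, and treating numeric segments as dynamic is an assumption about my URL structure:

```typescript
// Approximate route sampling on a pre-crawled URL list:
// URLs whose paths differ only in numeric segments (e.g. /blog/123 vs
// /blog/456) are treated as the same route group.

// Normalize a URL's path into a group key, e.g. "/blog/123" -> "/blog/:id"
function routeGroup(url: string): string {
  const path = new URL(url).pathname;
  return path
    .split('/')
    .map((seg) => (/^\d+$/.test(seg) ? ':id' : seg))
    .join('/');
}

// Keep at most `samplesPerGroup` URLs per route group
function sampleUrls(urls: string[], samplesPerGroup = 5): string[] {
  const groups = new Map<string, string[]>();
  for (const url of urls) {
    const key = routeGroup(url);
    const bucket = groups.get(key) ?? [];
    if (bucket.length < samplesPerGroup) bucket.push(url);
    groups.set(key, bucket);
  }
  return [...groups.values()].flat();
}
```

That would shrink the 10,000-URL case considerably while still covering every distinct page template, at the cost of assuming pages in a group score similarly.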
The main goal is to replace an existing Python script I wrote that queries the Lighthouse API directly; it works, but it isn't performant at all.
I know this isn't really the intended use case, so any pointers or insights you can provide would be greatly appreciated!
My second question: is it possible to configure the csvExpanded reporter to include even more information per URL? The crawl results in the UI seem to capture more information than the CSV does.
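To frame what I'm after: since Lighthouse's standard JSON result format exposes every audit under `audits[id]`, one fallback I'm considering is post-processing per-URL JSON reports into a wider CSV myself. A rough sketch, assuming I have the Lighthouse JSON files on disk; the file path and the specific audits pulled out are just examples, not unlighthouse options:

```typescript
import { readFileSync } from 'node:fs';

// Pull a few extra columns out of a standard Lighthouse JSON report.
// audits[id].numericValue is part of the Lighthouse result format.
function extraColumns(reportPath: string): Record<string, unknown> {
  const report = JSON.parse(readFileSync(reportPath, 'utf8'));
  return {
    // field name changed across Lighthouse versions, so try both
    url: report.finalDisplayedUrl ?? report.finalUrl,
    ttfb: report.audits['server-response-time']?.numericValue,
    tbt: report.audits['total-blocking-time']?.numericValue,
    domSize: report.audits['dom-size']?.numericValue,
  };
}
```

If csvExpanded can already be extended with columns like these, that would obviously be simpler than rolling my own.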