Problem description

When crawling the target site https://19.offcn.com/, some links are not crawled. Running with headless mode turned off and the Chrome DevTools open, I can see that requests which fail are being resubmitted over and over; they block the remaining requests, the whole page times out before it finishes rendering, and links end up missed.

Console when the page is opened normally (screenshot):

Console during the crawlergo headless-mode crawl (screenshot):
Reproduction steps

Version

551acb2b75403985493b56414d797ce5a1da480f
1.39.122 Chromium: 102.0.5005.115 (Official Build) (arm64)

Command executed

./crawlergo -m 2 -c **** --no-headless https://19.offcn.com/
Expected behavior

The page finishes rendering within the timeout and all of its links, including the ones in the <div> below, are discovered.

Actual behavior

The following <div> is never rendered during the crawl, so the links inside it are missed:

```html
<div class="zg_personal already_login" style="display: none">
  <p class="zg_personalP"><strong><img src=""/></strong><i></i></p>
  <div class="zg_person_list" style="display: none;">
    <em> </em>
    <a href="/mycourse/index/">我的课程</a>
    <a href="/svipcourse/">学员专享</a>
    <a href="/orders/myorders/">我的订单</a>
    <a href="/mycoupon/index/">我的优惠券</a>
    <a href="/user/index/">账号设置</a>
    <a href="/foreuser/outlogin/">退出登录</a>
  </div>
</div>
```

When the page is opened normally it loads quickly; with crawlergo it takes far too long. Is this a bug? All proxies have been disabled.
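To confirm the retry loop independently of crawlergo, a minimal chromedp sketch along the following lines can count how often each URL is requested and log failed loads while the page sits open. This is an illustration only, not part of crawlergo; the 20-second observation window and the threshold of 3 repeats are arbitrary choices.

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()
	ctx, cancel = context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	var mu sync.Mutex
	urlByID := map[network.RequestID]string{} // request id -> URL
	hits := map[string]int{}                  // URL -> number of times requested

	// Watch the same network events the DevTools console shows.
	chromedp.ListenTarget(ctx, func(ev interface{}) {
		switch e := ev.(type) {
		case *network.EventRequestWillBeSent:
			mu.Lock()
			urlByID[e.RequestID] = e.Request.URL
			hits[e.Request.URL]++
			mu.Unlock()
		case *network.EventLoadingFailed:
			mu.Lock()
			u := urlByID[e.RequestID]
			mu.Unlock()
			log.Printf("loading failed (%s): %s", e.ErrorText, u)
		}
	})

	if err := chromedp.Run(ctx,
		network.Enable(),
		chromedp.Navigate("https://19.offcn.com/"),
		chromedp.Sleep(20*time.Second), // give the page time to (re)issue requests
	); err != nil {
		log.Println("run:", err)
	}

	mu.Lock()
	defer mu.Unlock()
	for u, n := range hits {
		if n > 3 { // URLs requested over and over point at the retry loop
			log.Printf("requested %d times: %s", n, u)
		}
	}
}
```

The same thing can of course be read off the DevTools Network tab, which is what the screenshots above show; the sketch just makes the repeat counts explicit.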
By default crawlergo blocks image requests to cut down on static-resource traffic; it now looks like this blocking is what makes this page misbehave.
Would something like this work? It works in my testing, but I don't know whether it has other side effects:
```go
func (tab *Tab) Start() {
	// ...
	if err := chromedp.Run(*tab.Ctx,
		RunWithTimeOut(tab.Ctx, tab.config.DomContentLoadedTimeout, chromedp.Tasks{
			// ...
			// block static resources here instead
			network.SetBlockedURLS(config.StaticSuffix),
			// ...
			// then perform the navigation
			chromedp.Navigate(tab.NavigateReq.URL.String()),
		}),
	); err != nil {
		// ...
	}
	// ...
}
```
```go
func (tab *Tab) InterceptRequest(v *fetch.EventRequestPaused) {
	// ...
	// Delete the logic below (it blocks every static resource by failing the request):
	// https://github.com/Qianlitp/crawlergo/issues/106
	// if config.StaticSuffixSet.Contains(url.FileExt()) {
	// 	_ = fetch.FailRequest(v.RequestID, network.ErrorReasonBlockedByClient).Do(ctx)
	// 	req.Source = config.FromStaticRes
	// 	tab.AddResultRequest(req)
	// 	return
	// }
	// ...
}
```
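For reference, the same idea can be tried outside crawlergo with a self-contained chromedp script. One assumption to verify: Network.setBlockedURLs takes URL patterns, so the bare suffixes held in config.StaticSuffix may need to be turned into wildcard patterns such as "*.png" rather than passed through unchanged; the list below is illustrative, not crawlergo's actual configuration.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()
	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	// Block static resources at the network layer instead of failing each
	// paused request from the Fetch interception handler. The suffix list
	// and the "*.<suffix>" pattern form are assumptions, not crawlergo's
	// actual config.StaticSuffix values.
	patterns := []string{"*.png", "*.jpg", "*.jpeg", "*.gif", "*.svg", "*.ico", "*.woff", "*.woff2"}

	var title string
	if err := chromedp.Run(ctx,
		network.Enable(),
		network.SetBlockedURLS(patterns),
		chromedp.Navigate("https://19.offcn.com/"),
		chromedp.Title(&title),
	); err != nil {
		log.Fatal(err)
	}
	log.Println("loaded:", title)
}
```

The practical difference is where the request is refused: Network.setBlockedURLs makes the browser drop matching requests itself, before they ever reach the Fetch interception handler, whereas the current code pauses every request and then fails the static ones one by one. Why the Fetch-level failure pushes this particular site into a retry loop while the network-level block apparently does not is not clear from the thread, so the change would be worth testing against other targets as well.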