How crawler novel-scraping tasks and front-end/admin page response times affect each other! #60

Open
skydiy opened this issue Aug 14, 2020 · 6 comments

Comments

@skydiy

skydiy commented Aug 14, 2020

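I've found that when the background scraping tasks (the novel update tasks) run frequently, page response times on both the front end and the admin back end become very unstable!
My current idea is to split the crawler out into its own project or process, which should improve overall stability. I'm not sure whether the author has already considered this problem!
This is on a development machine with a desktop environment running Windows 10. The process console shows very busy log output from the novel update tasks, and under these conditions I've seen page access latency peak at more than 6s!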

@skydiy
Author

skydiy commented Aug 14, 2020

Here are the approximate access statistics!

| requestUrl | method | times | used | max used | min used | avg used |
| --- | --- | --- | --- | --- | --- | --- |
| /public/home/default/css/style.css | GET | 6 | 15.00ms | 7.00ms | 0 | 2.50ms |
| /public/home/default/img/logo-2.png | GET | 6 | 4.01ms | 1.00ms | 0 | 667.62us |
| /public/plugin/layui/css/layui.css | GET | 1 | 4.00ms | 4.00ms | 4.00ms | 4.00ms |
| /public/plugin/layui/lay/modules/layer.js | GET | 1 | 0 | 0 | 0 | 0 |
| /public/home/default/img/sprite.png | GET | 6 | 7.00ms | 4.00ms | 0 | 1.17ms |
| /public/home/default/img/bookshelf-book-bg.png | GET | 6 | 3.00ms | 2.00ms | 0 | 500.27us |
| /public/css/admin/x-admin.css | GET | 1 | 999.50us | 999.50us | 999.50us | 999.50us |
| /public/plugin/layui/css/modules/layer/default/layer.css | GET | 1 | 999.80us | 999.80us | 999.80us | 999.80us |
| /public/plugin/layui/lay/modules/element.js | GET | 1 | 0 | 0 | 0 | 0 |
| /book/detail | GET | 6 | 6.46s | 5.05s | 93.12ms | 1.08s |
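
As a quick sanity check on the `/book/detail` row, the sketch below recomputes the average from the total and the request count, and shows how much of the total one slow request accounts for. It assumes the `used` column is the cumulative time across all requests (the other rows are consistent with that reading); the helper name `to_seconds` is made up for this example.

```python
def to_seconds(text):
    """Convert a duration such as '6.46s', '93.12ms', '667.62us' or '0' to seconds."""
    # Check the two-letter suffixes before the bare 's'.
    for suffix, scale in (("us", 1e-6), ("ms", 1e-3), ("s", 1.0)):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    return float(text)  # a bare "0" carries no unit


# Values copied from the /book/detail row of the table.
times = 6
total = to_seconds("6.46s")
slowest = to_seconds("5.05s")

average = total / times                          # close to the reported 1.08s
rest_average = (total - slowest) / (times - 1)   # mean of the other five requests
slowest_share = slowest / total                  # fraction of total time in one request

if __name__ == "__main__":
    print(f"average: {average:.2f}s")
    print(f"average without the slowest request: {rest_average:.3f}s")
    print(f"share of total spent in the slowest request: {slowest_share:.0%}")
```

Under that reading, a single request accounts for most of the total time, which fits the description of occasional multi-second stalls rather than uniformly slow pages.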

@vckai
Owner

vckai commented Aug 22, 2020

We could consider splitting it out and having it insert data through an API.

@skydiy
Author

skydiy commented Aug 30, 2020

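I traced this further afterwards, including the database, and found that it comes down to disk I/O performance. That said, running the crawler separately would still be somewhat better than keeping everything together! With everything coupled like this, all the database reads and writes are concentrated in one place!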

@BeanWei

BeanWei commented Jan 15, 2021

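Would this approach work:
1: Run the scraping tasks during periods when site activity is low
2: Store the scraped data in redis / mongo, then sync it to mysql in batches once scraping is done.

Splitting the crawler out doesn't solve the root problem, does it? Not unless you go distributed and split the database.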
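
Point 1 above can be sketched as a simple time-window gate that the scheduler checks before starting a scraping run. This is only an illustration: the 01:00 to 06:00 window and the function names are assumptions, and a real window would have to come from the site's own traffic data.

```python
from datetime import datetime, time

# Hypothetical quiet window; not taken from the project.
OFF_PEAK_START = time(1, 0)
OFF_PEAK_END = time(6, 0)


def in_off_peak(now, start=OFF_PEAK_START, end=OFF_PEAK_END):
    """Return True if `now` falls in [start, end), including windows that wrap past midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window such as 23:00 to 05:00 spans midnight.
    return current >= start or current < end


def should_run_crawl(now=None):
    """Gate a scraping run on the off-peak window; defaults to the current local time."""
    return in_off_peak(now or datetime.now())


if __name__ == "__main__":
    print("start scraping now?", should_run_crawl())
```

A fixed window is the simplest option; gating on measured load (for example recent request latency) would adapt better, but needs the metrics to be available to the scheduler.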

@skydiy
Author

skydiy commented Jan 17, 2021

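True. The main issue is excessive I/O usage, and your approach would work. mongo would probably run into I/O problems as well, so redis seems like the better fit!!! It works in memory, after all, and batched reads and writes are very fast!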

@skydiy
Author

skydiy commented Jan 17, 2021

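With redis in place, ordinary data queries go to MySQL while scraping writes go to redis first, which shields MySQL from the heavy write load that scraping generates.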
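
The buffering idea in this comment could look roughly like the sketch below. To stay self-contained it uses stand-ins rather than the real services: a `deque` plays the role of a Redis list, `sqlite3` plays the role of MySQL, and the `chapter` table and all class and function names are hypothetical. The crawler only appends to the buffer; a separate flush step writes to the database in batches, one transaction per batch.

```python
import sqlite3
from collections import deque


class ChapterBuffer:
    """In-memory stand-in for a Redis list that the crawler pushes into."""

    def __init__(self):
        self._queue = deque()

    def push(self, book_id, title, content):
        self._queue.append((book_id, title, content))

    def pop_batch(self, size):
        batch = []
        while self._queue and len(batch) < size:
            batch.append(self._queue.popleft())
        return batch

    def requeue(self, batch):
        # Put a failed batch back at the front, preserving its original order.
        self._queue.extendleft(reversed(batch))

    def __len__(self):
        return len(self._queue)


def open_store():
    """sqlite3 stands in for MySQL here; the table layout is made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chapter (book_id INTEGER, title TEXT, content TEXT)")
    return conn


def flush(buffer, conn, batch_size=100):
    """Drain the buffer into the database, one transaction per batch."""
    written = 0
    while True:
        batch = buffer.pop_batch(batch_size)
        if not batch:
            return written
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany("INSERT INTO chapter VALUES (?, ?, ?)", batch)
        except sqlite3.Error:
            buffer.requeue(batch)  # keep the data for a later retry
            raise
        written += len(batch)


if __name__ == "__main__":
    buffer, conn = ChapterBuffer(), open_store()
    for i in range(250):
        buffer.push(1, f"Chapter {i}", "...")
    print("rows written:", flush(buffer, conn))
```

Batching turns many small writes into a few larger transactions, which is where the reduction in database write pressure would come from. With a real Redis instance the buffer would also survive a crawler restart only if persistence is enabled, so that trade-off is worth checking before relying on it.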


3 participants