FastRhino (电犀牛) R66S sometimes fails to detect the second RTL8125 NIC when running this firmware #1061

Closed
jysqice opened this issue Feb 14, 2023 · 41 comments
Labels
documentation Improvements or additions to documentation support This need is supported

Comments

@jysqice

jysqice commented Feb 14, 2023

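FastRhino R66S, 2 GB version
Linux armbian 6.1.10-flippy-81+

The problem is as described in the title. When it occurs, neither dmesg nor lspci shows any information for the NIC at 0002:21:00.0. After several thousand reboot tests, I found that it tends to occur when the CPU temperature is below 40 °C and never occurs when the CPU is above 50 °C. The "factory firmware" does not have this problem.

My guess: the CPU's thermal control lets it run faster at low temperature, so the PCIe device scan finishes before the PCIe device has responded, which causes the problem.
If this guess is right, either lengthening the PCIe scan delay or adjusting the CPU's thermal control should fix it.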
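
A minimal sketch of how the "NIC missing" condition could be checked without parsing lspci output, assuming the standard Linux sysfs layout where every enumerated PCI device has a directory under /sys/bus/pci/devices. The function name pci_device_present and the SYSFS_PCI override are hypothetical and exist only for illustration; 0002:21:00.0 is the address quoted in the report.

```bash
#!/bin/bash
# Hypothetical helper, not part of the firmware: succeeds if the kernel has
# enumerated a PCI device at the given address.
# SYSFS_PCI can be overridden so the function can be tried without hardware.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

pci_device_present() {
    addr="$1"
    [ -e "$SYSFS_PCI/$addr" ]
}

# Example call (address taken from the report):
#   if pci_device_present 0002:21:00.0; then echo present; else echo missing; fi
```

The check only tells you whether enumeration succeeded at boot; it says nothing about why the device failed to answer.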

@ophub
Owner

ophub commented Feb 14, 2023

By the second NIC, do you mean the port labelled LAN or the one labelled WAN on the case?

@jysqice
Author

jysqice commented Feb 14, 2023

WAN

@ophub
Owner

ophub commented Feb 14, 2023

I'll test it.

@jysqice
Author

jysqice commented Feb 14, 2023

To reproduce the problem, point a fan at the device.

@jysqice
Author

jysqice commented Feb 14, 2023

2023-02-13 10:07:52 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:08:46 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:09:34 armbian booted with 1 RTL8125 and cpu temp is 38333 (error)
2023-02-13 10:10:19 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:11:11 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:12:40 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:14:15 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:15:00 armbian booted with 1 RTL8125 and cpu temp is 36111 (error)
2023-02-13 10:15:48 armbian booted with 1 RTL8125 and cpu temp is 38333 (error)
2023-02-13 10:16:43 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:17:38 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:18:23 armbian booted with 1 RTL8125 and cpu temp is 35000 (error)
2023-02-13 10:19:14 armbian booted with 1 RTL8125 and cpu temp is 38888 (error)
2023-02-13 10:20:01 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:20:48 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:21:38 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:22:28 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:24:00 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:24:47 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:25:42 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:26:30 armbian booted with 1 RTL8125 and cpu temp is 38888 (error)
2023-02-13 10:28:05 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:28:54 armbian booted with 2 RTL8125 and cpu temp is 37222
2023-02-13 10:29:48 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:30:33 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:31:23 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:32:08 armbian booted with 1 RTL8125 and cpu temp is 36111 (error)
2023-02-13 10:32:43 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:33:38 armbian booted with 1 RTL8125 and cpu temp is 40625 (error)
2023-02-13 10:34:22 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:35:11 armbian booted with 2 RTL8125 and cpu temp is 34375
2023-02-13 10:36:06 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:36:54 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:37:41 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:39:10 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:39:59 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:40:47 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:41:37 armbian booted with 1 RTL8125 and cpu temp is 40000 (error)
2023-02-13 10:42:25 armbian booted with 2 RTL8125 and cpu temp is 35000
2023-02-13 10:43:14 armbian booted with 2 RTL8125 and cpu temp is 36111
2023-02-13 10:44:01 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:44:49 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:45:42 armbian booted with 2 RTL8125 and cpu temp is 33125
2023-02-13 10:46:33 armbian booted with 2 RTL8125 and cpu temp is 35000
2023-02-13 10:47:26 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:48:14 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:49:03 armbian booted with 2 RTL8125 and cpu temp is 37777
2023-02-13 10:49:59 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:51:32 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:52:19 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:53:09 armbian booted with 1 RTL8125 and cpu temp is 39444 (error)
2023-02-13 10:54:00 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:54:47 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:55:42 armbian booted with 1 RTL8125 and cpu temp is 39444 (error)
2023-02-13 10:56:34 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:57:22 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:58:14 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:59:01 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 11:00:30 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)

This log was generated by my script.
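
To see how the failures spread across temperature, a log in this format could be tallied roughly as follows. This is only a sketch: summarize_boot_log is a hypothetical helper name, and it assumes the line format shown in the log (the value after "is" is the temperature in millidegrees Celsius, as reported by /sys/class/thermal/thermal_zone0/temp, and failed boots end with "(error)").

```bash
#!/bin/bash
# Hypothetical helper: for each whole-degree temperature bucket, print
# "<degrees> <total boots> <failed boots>", sorted by temperature.
summarize_boot_log() {
    awk '
        {
            # The temperature is the field right after the word "is".
            for (i = 1; i <= NF; i++) if ($i == "is") t = $(i + 1)
            deg = int(t / 1000)          # millidegrees -> whole degrees C
            total[deg]++
            if ($NF == "(error)") bad[deg]++
        }
        END {
            for (d in total) printf "%d %d %d\n", d, total[d], bad[d] + 0
        }
    ' "$1" | sort -n
}

# Example call:
#   summarize_boot_log /root/boot.log
```

Fed only the first three lines of the log above, this prints `37 1 1` and `38 2 1`.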

@jysqice
Author

jysqice commented Feb 14, 2023

2023-02-13 11:15:23 armbian booted with 2 RTL8125 and cpu temp is 48333
2023-02-13 11:16:51 armbian booted with 2 RTL8125 and cpu temp is 45555
2023-02-13 11:17:41 armbian booted with 2 RTL8125 and cpu temp is 52500
2023-02-13 11:18:25 armbian booted with 2 RTL8125 and cpu temp is 52500
2023-02-13 11:19:15 armbian booted with 2 RTL8125 and cpu temp is 53125
2023-02-13 11:20:02 armbian booted with 2 RTL8125 and cpu temp is 53750
2023-02-13 11:23:03 armbian booted with 2 RTL8125 and cpu temp is 55555
2023-02-13 11:23:50 armbian booted with 2 RTL8125 and cpu temp is 53750
2023-02-13 11:24:37 armbian booted with 2 RTL8125 and cpu temp is 55555
2023-02-13 11:25:28 armbian booted with 2 RTL8125 and cpu temp is 59444
2023-02-13 11:27:01 armbian booted with 2 RTL8125 and cpu temp is 57777
2023-02-13 11:29:41 armbian booted with 2 RTL8125 and cpu temp is 46111
2023-02-13 11:30:32 armbian booted with 2 RTL8125 and cpu temp is 45555
2023-02-13 11:31:22 armbian booted with 2 RTL8125 and cpu temp is 41250

Without the fan it is basically fine.

@jysqice
Author

jysqice commented Feb 14, 2023

#!/bin/bash
# Boot-loop test: log the number of RTL8125 NICs and the CPU temperature, then reboot.
# The loop stops once /proc/scsi/usb-storage exists, i.e. when a USB stick is plugged in.
HOSTNAME=$(hostname)
DATE=$(date '+%Y-%m-%d %H:%M:%S')
NICNUM=$(lspci | grep -c RTL8125)
TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)   # millidegrees Celsius
if [ ! -d /proc/scsi/usb-storage ]; then
    if [ "$NICNUM" = "2" ]; then
        echo "$DATE $HOSTNAME booted with $NICNUM RTL8125 and cpu temp is $TEMP" >> /root/boot.log
    else
        echo "$DATE $HOSTNAME booted with $NICNUM RTL8125 and cpu temp is $TEMP (error)" >> /root/boot.log
    fi
    reboot
fi
exit 0

This is the automated test script; it goes in rc.local. To stop the test, just plug in a USB stick.

@ophub
Owner

ophub commented Feb 14, 2023

Snip20230214_8

33 °C. I'll leave a big fan blowing on it overnight and see.

@jysqice
Author

jysqice commented Feb 14, 2023

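The test needs repeated reboots. Once the system has booted, the number of NICs no longer changes: it neither drops nor grows. If a NIC is missing, a rescan does not bring it back; if none is missing, both stay present however cold it gets.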
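
The thread does not show which rescan command was used. On Linux, a PCI bus rescan is normally requested by writing 1 to /sys/bus/pci/rescan, roughly as sketched below. The function name pci_rescan and its optional root argument are hypothetical; the argument is only there so the function can be tried against a scratch directory. On a real system this needs root.

```bash
#!/bin/bash
# Hypothetical wrapper around the sysfs rescan trigger.
# $1: sysfs PCI root (defaults to the real /sys/bus/pci).
pci_rescan() {
    root="${1:-/sys/bus/pci}"
    # Writing 1 asks the kernel to re-enumerate all PCI buses.
    echo 1 > "$root/rescan"
}

# Example call (as root), followed by a check for the NIC:
#   pci_rescan && lspci | grep RTL8125
```

As reported above, a rescan did not recover the missing NIC in this case, so this is a diagnostic step rather than a workaround.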

@kuaner

kuaner commented Feb 14, 2023

I'm not sure whether it is temperature-related, but I see this too: OpenWrt firmware, R68S, eth2 has disappeared and does not show up in ip link.

@kuaner

kuaner commented Feb 14, 2023

[Screenshot 2023-02-14 21 10 23]

[Screenshot 2023-02-14 21 10 58]

@kuaner

kuaner commented Feb 14, 2023

I'm on the OpenWrt firmware with the CPU governor set to schedutil, and I still have this problem: eth2 has disappeared.

@jysqice
Author

jysqice commented Feb 14, 2023

ayufan-rock64/linux-mainline-kernel@b5ce971
A fix for what may be a similar problem.

@kuaner

kuaner commented Feb 14, 2023

[Screenshot 2023-02-14 21 25 31]

Rolling back to 79 works; 80 and 81 both have the problem.

@ophub
Owner

ophub commented Feb 15, 2023

Snip20230215_1

The R66S sat on a fan for 16 hours and stayed in the 30s °C the whole time. The WAN port's network is working normally now.

@jysqice
Author

jysqice commented Feb 15, 2023

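Sorry, I wasn't clear. The key point is detection at boot: if both NICs are detected at boot, neither drops later, and vice versa. So the test is not leaving the device on overnight but rebooting it repeatedly overnight. I rebooted several thousand times in total before filing this issue.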

@ophub
Owner

ophub commented Feb 15, 2023

Good grief, don't test it that brutally. All that rebooting will make the R66S dizzy.

@jysqice
Author

jysqice commented Feb 15, 2023

ayufan-rock64/linux-mainline-kernel@b5ce971
Could you try adding this delay setting?

@ophub
Owner

ophub commented Feb 15, 2023

https://github.com/unifreq/linux-6.1.y

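Check whether flippy's kernel source already includes it. If not, add it yourself and test that it builds and runs properly, then submit a PR to him once it's tested.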

@ihipop

ihipop commented Feb 15, 2023

Good grief, don't test it that brutally. All that rebooting will make the R66S dizzy.

@ophub

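I have the same problem: occasionally a reboot comes up with one NIC missing (but once it has been detected at boot, it never drops, however cold it gets). I tested with @jysqice's reboot script and found that the problem shows up frequently at low temperature (not actually that low, just with a fan blowing on it).

I tested the official FastRhino firmware and it does not have this problem.

After a boot with the NIC missing, I tried a PCIe reset, which did not help.
During the reset, dmesg shows messages like this:
[image]
Whenever it goes wrong, it is always eth1 that is missing.
Only a reboot fixes it. Very annoying.

But as long as the NIC shows up at power-on, it is quite stable in use and does not drop, however low the temperature.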

@kuaner

kuaner commented Feb 15, 2023

In my case rebooting doesn't fix it either. Since I switched back to the 79 kernel yesterday, it has been working normally.

@ihipop

ihipop commented Feb 15, 2023

I'm on the OpenWrt firmware with the CPU governor set to schedutil, and I still have this problem: eth2 has disappeared.

You have an R68S; I have an R66S.

@kuaner

kuaner commented Feb 15, 2023

Yes, the missing-NIC problem exists on both. But I also have an R66S that hasn't hit it so far, running the latest kernel.

@ophub
Owner

ophub commented Feb 15, 2023

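flippy probably knows which patches changed in the kernels after 79 that could make the NIC go missing. I've reported the problem to him; let's wait for him to look into the cause.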

@ihipop

ihipop commented Feb 15, 2023

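Yes, the missing-NIC problem exists on both. But I also have an R66S that hasn't hit it so far, running the latest kernel.

When the machine is warm, a reboot or power-on almost never loses the NIC.
The failures all happen when the machine is not warm.
It doesn't always fail when cool; the odds are just much higher.
So blowing a cooling kit at it raises the chance of hitting the problem.

A fan plus that reboot script will catch it.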

@ihipop

ihipop commented Feb 15, 2023

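I compared against the official FastRhino firmware. Their DTS seems to also contain settings that adjust the voltage at low temperature. I don't know whether that is related.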

@ophub
Owner

ophub commented Feb 15, 2023

Please post a link to the thermal-control settings you found.

@ihipop

ihipop commented Feb 15, 2023

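Please post a link to the thermal-control settings you found.

  • This is only what the DTS decompiled from the official DTB describes.
  • I don't know where to find the kernel source for the official firmware, so I don't know how the official kernel handles this DTS setting, or whether it is related to the missing NIC.
  • The official firmware never loses the NIC on reboot or power-on, however cold it is blown (tested several hundred times with that script).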

rockchip4825-dts.tar.gz
image

@ophub
Owner

ophub commented Feb 15, 2023

unifreq/linux-6.1.y@1c9dcaf

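Following @kuaner's feedback, flippy has taken the rockchip-snps-pcie3 NIC patch that was added in 79 (6.0.y) and added it to the 6.1.y source, then repackaged the 6.1.11 kernel. I've mirrored it to the kernel repository at https://github.com/ophub/kernel/tree/main/pub/stable. Would everyone above who has the problem please test it.

If your Armbian/OpenWrt system is already on kernel 6.1.11, first update to 6.1.10 and then update to 6.1.11 again, because a kernel cannot be updated to one with the same name.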

On Armbian, first sync the latest scripts:

armbian-sync

Then update the kernel:

armbian-update -k 6.1.10
After the automatic reboot:
armbian-update -k 6.1.11
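
Whether the detour through 6.1.10 is needed depends on the kernel that is currently running. A minimal sketch of that check follows, assuming the release string has the form seen earlier in the thread (6.1.10-flippy-81+), so that everything before the first hyphen is the version. The function name needs_intermediate_kernel is hypothetical and is not part of armbian-update.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the running kernel already carries the
# target version, in which case a same-name update would not be possible.
# $1: running release string, e.g. the output of `uname -r`
# $2: target version, e.g. 6.1.11
needs_intermediate_kernel() {
    running="${1%%-*}"      # "6.1.10-flippy-81+" -> "6.1.10"
    target="$2"
    [ "$running" = "$target" ]
}

# Example call:
#   if needs_intermediate_kernel "$(uname -r)" 6.1.11; then
#       echo "update to 6.1.10 first, then to 6.1.11"
#   fi
```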

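On OpenWrt, first update the 宝盒 (luci-app-amlogic) plugin, which syncs the latest scripts.

OpenWrt users who are already on 6.1.11 should manually upload the 6.1.10 kernel to the p4 partition and update manually, then update back to 6.1.11 after rebooting.

If you are patient, you can also wait for 6.1.12 to be released; flippy may build the .12 kernel today.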

@ophub ophub added the question Further information is requested label Feb 15, 2023
@jysqice
Author

jysqice commented Feb 15, 2023

Problem solved. It looks like the pcie3 firmware did the trick.

@ophub
Owner

ophub commented Feb 15, 2023

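6.1.12 has been published too; update to it and keep testing.
This kernel also brings a pleasant surprise for Amlogic: an eMMC fix. As an LTS kernel, I hope 6.1 keeps getting more stable.
6.2 is coming soon as well. If flippy is too busy and forgets this patch by then, remind him; it looks like exactly the right fix.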

@ophub ophub added documentation Improvements or additions to documentation support This need is supported and removed question Further information is requested labels Feb 15, 2023
@ihipop

ihipop commented Feb 15, 2023

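6.1.12 has been published too; update to it and keep testing. This kernel also brings a pleasant surprise for Amlogic: an eMMC fix. As an LTS kernel, I hope 6.1 keeps getting more stable. 6.2 is coming soon as well. If flippy is too busy and forgets this patch by then, remind him; it looks like exactly the right fix.

What problem did the Amlogic eMMC fix address?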

@kuaner

kuaner commented Feb 15, 2023

A few times, updating the firmware through the Amlogic 宝盒 plugin failed and I had to reflash over a cable. Is that also related to this Amlogic eMMC fix?

@ophub
Owner

ophub commented Feb 15, 2023

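A few times, updating the firmware through the Amlogic 宝盒 plugin failed and I had to reflash over a cable. Is that also related to this Amlogic eMMC fix?

Did you update by online download or by manual upload? If you upload the firmware through the web UI, upload the compressed archive. If you have already extracted it, upload it with scp instead, because at over 1 GB a browser upload ends up incomplete.

Whether on Armbian or on OpenWrt, the first thing I always do before any update is update the scripts.

On Armbian, armbian-sync updates the local system's scripts to the latest version.
On OpenWrt, updating the 宝盒 plugin updates all the scripts to the latest version.

As long as the scripts are up to date, known problems have already been fixed. Over the past year I have done several hundred updates across nearly four kernel series. Every week, when new versions are released, I run a round of updates on both Armbian and OpenWrt to check that the files published to the GitHub repository are complete, so that nobody downloads an incomplete file; I verify each one individually. I also install the firmware from time to time, reflashing at least once every two weeks to check that recent firmware is fine, and after installing OpenWrt I always follow up with an OpenWrt firmware update.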

@kuaner

kuaner commented Feb 15, 2023

It was a manually uploaded archive. I saw someone in the Telegram group report the same thing.

@kuaner

kuaner commented Feb 15, 2023

Maybe it was also because the NIC went missing after flashing. I'll test some more.

@ophub
Owner

ophub commented Feb 15, 2023

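Most OpenWrt update failures come from mixed-up mount points: mounting /dev/loop2p2 fails. The fix is to reboot and try again.

When p3/p2 is not mounted, it is usually because you changed the mount points yourself. On a few devices the partition itself is faulty, which you can repair manually. For example, if you are updating from p2 and it reports that mounting /dev/loop2p2 failed, the update needs to use p3 at that point, so either p3 is not mounted or the partition is faulty.

The simple fix is to mount it manually first: mount /dev/mmcblk2p3 /mnt/mmcblk2p3 . If that mounts, carry on with the update.
If it still fails at the mount step, format the partition: mkfs.btrfs /dev/mmcblk2p3 -f
Then reboot and update the firmware again, and it is sure to succeed.

Before repairing the mount or reformatting the partition as above, confirm which partition to act on: if your current system is in p2, work on p3; if you are in p3, work on p2. lsblk shows where the root directory / is mounted.

Of course, if the firmware you uploaded is incomplete, nothing will make it succeed. That is why the online download update includes sha256sum verification; with a manual upload you have to confirm yourself that the file is complete.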
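
The "if you are in p2, work on p3, and the other way round" rule can be sketched as a small helper. The function name other_root_partition is hypothetical; pass it the device that lsblk shows mounted at /. It deliberately refuses any device name that does not end in p2 or p3, since formatting the wrong partition destroys the running system.

```bash
#!/bin/bash
# Hypothetical helper: given the partition holding the running system,
# print the other system partition (p2 <-> p3). Fails for anything else.
other_root_partition() {
    case "$1" in
        *p2) echo "${1%p2}p3" ;;
        *p3) echo "${1%p3}p2" ;;
        *)   return 1 ;;
    esac
}

# Example call, with the device taken from lsblk output:
#   other_root_partition /dev/mmcblk2p2
```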
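
For a manual upload, the completeness check could look roughly like this. It is a sketch that assumes you have the expected SHA-256 digest from wherever the file was published; the function name verify_sha256 is hypothetical, while sha256sum is the standard coreutils tool.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if the file's SHA-256 digest matches
# the expected hex string.
# $1: path to the uploaded file   $2: expected digest
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example call (path and digest are placeholders):
#   verify_sha256 /path/to/uploaded-file "<expected digest>" || echo "file is incomplete"
```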

@kuaner

kuaner commented Feb 15, 2023

Confirmed after flashing the R68S: the missing 2.5G NIC problem is fixed.

@ophub
Owner

ophub commented Feb 15, 2023

OK, thanks for the feedback.

@ihipop

ihipop commented Feb 16, 2023

This issue can be closed now.

@ophub ophub closed this as completed Feb 16, 2023