mars::stn message queue thread gets blocked on Android #214

Closed
songzhangzhang opened this issue Apr 18, 2017 · 7 comments

Comments

@songzhangzhang

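We are building an IM feature on top of Mars. After calling StnLogic.reset();, the newly created NetCore instance runs
xinfo2(TSF"net info:%_", GetDetailNetInfo()); on the message queue thread, and GetDetailNetInfo() blocks that thread. It hangs in the recvmsg call at line 104 of ifaddrs.c, inside netlink_recv.

The top of the call stack looks like this: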
netlink_recv ifaddrs.c:104
getNetlinkResponse ifaddrs.c:137
getResultList ifaddrs.c:213
getifaddrs ifaddrs.c:626
getifaddrs_ipv4_filter(std::vector<ifaddrinfo_ip_t, std::allocator<ifaddrinfo_ip_t> >&, unsigned int) getifaddrs.cc:141
GetDetailNetInfo() netinfo_util.cc:99
mars::stn::NetCore::NetCore()::$_0::operator()() const net_core.cc:156
...

Before blocking, getNetlinkResponse had already read data several times. The last read returned 20 bytes, but none of the reads set p_done to 1.

Device: Hisense F31, Android 6.0.1, Vision 3 system

@elviswoo
Contributor

Is this on a mobile network or Wi-Fi? Is the screen locked?

@songzhangzhang
Author

On Wi-Fi, and the screen is not locked.

@songzhangzhang
Author

Just before blocking, the last message received by netlink_recv(p_socket, l_buffer, l_size) was 20 bytes long. The nlmsg_type of that nlmsghdr is NLMSG_DONE, but its nlmsg_pid differs from the current pid, so execution never reaches *p_done = 1;

@elviswoo
Contributor

Does GetDetailNetInfo() block every time on your device model?

@songzhangzhang
Author

songzhangzhang commented Apr 18, 2017

The following statement in the getNetlinkResponse function in comm/jni/ifaddrs.c has a bug:
if((pid_t)l_hdr->nlmsg_pid != l_pid || (int)l_hdr->nlmsg_seq != p_socket)
{
    continue;
}

The netlink documentation says that nlmsg_pid does not have a 1:1 correspondence with the process ID.

nlmsg_seq and nlmsg_pid are used to track messages. nlmsg_pid shows the origin of the message. Note that there isn't a 1:1 relationship between nlmsg_pid and the PID of the process if the message originated from a netlink socket. See the ADDRESS FORMATS section for further information.
https://linux.die.net/man/7/netlink

The value of nlmsg_pid is actually the nl_pid that the kernel automatically assigns to the socket at bind time.

nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. If a process owns several netlink sockets, then nl_pid can only be equal to the process ID for at most one socket. There are two ways to assign nl_pid to a netlink socket. If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates.
https://linux.die.net/man/7/netlink

So nlmsg_pid differs from one netlink socket to another, and we cannot compare it against the process ID.
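
One possible way to avoid the comparison against the process ID, sketched here under assumptions and not taken from the actual Mars fix: read the socket's own address back with getsockname(2) and compare nlmsg_pid against that nl_pid. The helper names netlink_port_id and is_reply_for are hypothetical; only getsockname, struct sockaddr_nl and struct nlmsghdr are real APIs.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch the nl_pid the kernel assigned to this
 * netlink socket at bind time. Returns 0 on success, -1 on failure. */
static int netlink_port_id(int fd, uint32_t *port_id)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    if (len != sizeof(addr) || addr.nl_family != AF_NETLINK)
        return -1;

    *port_id = addr.nl_pid;
    return 0;
}

/* Hypothetical filter: a message belongs to our request when it is
 * addressed to this socket's nl_pid (not getpid()) and carries the
 * sequence number we sent. */
static int is_reply_for(const struct nlmsghdr *hdr, uint32_t port_id, uint32_t seq)
{
    return hdr->nlmsg_pid == port_id && hdr->nlmsg_seq == seq;
}
```

With something like this, the receive loop would call netlink_port_id once after bind and skip a message only when is_reply_for returns 0, so an NLMSG_DONE addressed to this socket is no longer dropped when nl_pid is not the process ID.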

@garryyan
Collaborator

#216

@songzhangzhang
Author

songzhangzhang commented Apr 30, 2017

@elviswoo GetDetailNetInfo() blocks every time under one specific condition: code in some other .so library has also created a netlink socket, bound it to the default address, and left it open. Calling GetDetailNetInfo() in that state blocks.
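
The condition can be illustrated with a small sketch. It assumes both sockets use the same netlink protocol (NETLINK_ROUTE here) and that "default address" means binding with nl_pid set to 0. The function names are hypothetical. The first socket stands in for the one the other library leaves open, the second for the one opened later by getifaddrs.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket and bind it with
 * nl_pid = 0 so the kernel picks the address. Returns the fd, or -1. */
static int open_route_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read back the kernel-assigned nl_pid. */
static int route_socket_port_id(int fd, uint32_t *out)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    *out = addr.nl_pid;
    return 0;
}

/* Returns 1 if two sockets held open at the same time got different
 * nl_pid values, 0 if they got the same value, -1 on any error. */
static int second_socket_gets_new_port_id(void)
{
    uint32_t first_id = 0, second_id = 0;
    int result = -1;
    int first = open_route_socket();   /* the other library's socket, never closed */
    int second = open_route_socket();  /* the socket opened afterwards */

    if (first >= 0 && second >= 0 &&
        route_socket_port_id(first, &first_id) == 0 &&
        route_socket_port_id(second, &second_id) == 0)
        result = (first_id != second_id) ? 1 : 0;

    if (first >= 0)
        close(first);
    if (second >= 0)
        close(second);
    return result;
}
```

Since at most one of the two sockets can carry the process ID as its nl_pid, replies on the other one fail the nlmsg_pid != l_pid check, which matches the hang described above.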
