Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node Stall Ready check does not perform search in stall queue as intended #28

Open
BLuedtke opened this issue Mar 13, 2023 · 2 comments · May be fixed by #38
Open

Node Stall Ready check does not perform search in stall queue as intended #28

BLuedtke opened this issue Mar 13, 2023 · 2 comments · May be fixed by #38
Assignees
Labels

Comments

@BLuedtke
Copy link
Collaborator

BLuedtke commented Mar 13, 2023

When sending a message via a node, first bidib determines if the node is ready to send.
This involves, among other things, checking whether the node is stalled, or, whether a supernode of the node is stalled. The stall check is performed in a function here.

If the node or a supernode is stalled, the linked function returns false. A node is stalled if the member "stall" is true and the address_stack (input param to linked fct) is NOT contained in the member stall_affected_nodes_queue.
Thus, the linked fct attempts to find the addr_stack in the queue stall_affected_nodes_queue by using g_queue_find, see line 114. As the addr_stack is an array of 4 uint8_t's, a custom comparison function is used, as the comparison shall be element-wise, not a pointer comparison. This comparator is called bidib_node_stall_queue_entry_equals (see line 99ff.).
The call to g_queue_find looks as follows: !g_queue_find(state->stall_affected_nodes_queue, bidib_node_stall_queue_entry_equals).
Note that the element to find is not actually given here! This code just always returns true, the comparator is never called. g_queue_find is supposed to be called with the queue and the element to find (see here). To use a custom comparator, we need to use g_queue_find_custom (see here). It shall be called like this: !g_queue_find_custom(state->stall_affected_nodes_queue, addr_stack, (GCompareFunc)bidib_node_stall_queue_entry_equals).

The unit tests related to this part (bidib_send_tests) pass as the "stall" member is apparently sufficient to indicate whether the node is stalled or not. However, there's probably a reason for the existence of the stall_affected_nodes_queue, so this comparison call should be fixed.

@BLuedtke BLuedtke added the bug label Mar 13, 2023
@BLuedtke BLuedtke self-assigned this Mar 13, 2023
@BLuedtke
Copy link
Collaborator Author

BLuedtke commented Mar 13, 2023

Admittedly, I'm a bit confused as to why the check for presence of addr_stack in stall_affected_nodes_queue is negated.
With what happens in the if-body, aka the addr_stack being added to said queue, a second call to bidib_node_stall_ready will return true, even though the node is still stalled - since the address is now present in the queue.

Only one thing prevents bidib from (incorrectly) sending a message over a stalled node: All invocations of bidib_node_stall_ready also check that the node's message queue is empty. However, in function bidib_node_try_send, a message is added to this queue when the node is stalled (i.e., on the first call to bidib_node_stall_ready where it'll return false as expected). In the next attempt to send a message (via bidib_node_try_send), the node is recognized as "not ready" purely because the message queue is not empty.

I can only guess the original intention. One possibility is that this should've been two checks: One check for the "stall" member of the node_state. Then, if the node is stalled, but the address of the stalled node or supernode is not yet present in the stall_affected_nodes_queue, add the address to the queue.

@eyip002
Copy link
Member

eyip002 commented Mar 14, 2023

Reference to MSG_STALL in the bidib protocol: http://bidib.org/protokoll/bidib_general_messages.html


This line looks suspicious, should probably be message[data_index]:

bidib_node_update_stall(addr_stack, message[message[0]]);


The idea of stall_affected_nodes_queue was probably to implement a way to check whether a super-node is has been stalled when trying to send a command to one of its sub-nodes.

When a MSG_STALL is received, bidib_node_update_stall(addr_stack, stall_status) is called. The addr_stack contains the address of the node that sent MSG_STALL and also the addresses of its super-nodes. Because addr_stack is a fixed array of size 4, the unused elements above the top of the stack are represented by 0x00. If a node determines that it needs to stall to avoid overwhelming its sub-nodes, then we should pause any sending of commands to the node and its sub-nodes.

Take the function discussed above in #28 (comment):

static bool bidib_node_stall_ready(const uint8_t *const addr_stack) {
uint8_t addr_cpy[4];
memcpy(addr_cpy, addr_stack, 4);
while (addr_cpy[0] != 0x00) {
t_bidib_node_state *state = g_hash_table_lookup(node_state_table, addr_cpy);
if (state != NULL && state->stall &&
!g_queue_find(state->stall_affected_nodes_queue,
bidib_node_stall_queue_entry_equals)) {
t_bidib_stall_queue_entry *stall_entry = malloc(
sizeof(t_bidib_stall_queue_entry));
memcpy(stall_entry->addr, addr_stack, 4);
g_queue_push_tail(state->stall_affected_nodes_queue, stall_entry);
return false;
}
for (int i = 2; i >= 0; i--) {
if (addr_cpy[i] != 0x00) {
addr_cpy[i] = 0x00;
break;
}
}
}
return true;
}

When a stalled node is found, the original addr_stack is pushed into the node's stall_affected_nodes_queue:

g_queue_push_tail(state->stall_affected_nodes_queue, stall_entry);

Otherwise, we reach the for-loop that zeros out the top of the addr_cpy . The copy of the address stack now addresses a super-node closer to the root node (addr_cpy[0] is the root node):

for (int i = 2; i >= 0; i--) {
if (addr_cpy[i] != 0x00) {
addr_cpy[i] = 0x00;
break;
}
}

The outer while-loop repeats until a stalled node is found or all nodes in addr_stack have been considered.

As a result, the stall_affected_nodes_queue of a node may contain an address to itself and/or addresses to its sub-nodes. When a node is no longer stalled, we should try and resume the sending of commands to its sub-nodes and itself:

elem = g_queue_pop_head(state->stall_affected_nodes_queue);
t_bidib_node_state *waiting_node_state = g_hash_table_lookup(
node_state_table, elem->addr);
if (waiting_node_state != NULL) {
bidib_node_try_queued_messages(waiting_node_state);
}

@BLuedtke BLuedtke linked a pull request May 6, 2024 that will close this issue
@BLuedtke BLuedtke linked a pull request May 6, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants