Node Stall Ready check does not perform search in stall queue as intended #28

BLuedtke · 2023-03-13T14:51:11Z

When sending a message via a node, bidib first determines whether the node is ready to send.
This involves, among other things, checking whether the node or one of its supernodes is stalled. The stall check is performed in the function bidib_node_stall_ready.

If the node or a supernode is stalled, the linked function returns false. A node is stalled if the member "stall" is true and the address_stack (input param to linked fct) is NOT contained in the member stall_affected_nodes_queue.
Thus, the linked fct attempts to find the addr_stack in the queue stall_affected_nodes_queue by using g_queue_find, see line 114. As the addr_stack is an array of 4 uint8_t's, a custom comparison function is used, as the comparison shall be element-wise, not a pointer comparison. This comparator is called bidib_node_stall_queue_entry_equals (see line 99ff.).
The call to g_queue_find looks as follows: !g_queue_find(state->stall_affected_nodes_queue, bidib_node_stall_queue_entry_equals).
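Note that the element to find is not actually given here! The comparator is passed in the position of the element to find, so g_queue_find compares each entry's data pointer against the address of the function, never finds a match, and returns NULL. The negated expression is therefore always true, and the comparator is never called. g_queue_find is supposed to be called with the queue and the element to find (see here). To use a custom comparator, we need to use g_queue_find_custom (see here), whose comparator must return 0 for a match. It shall be called like this: !g_queue_find_custom(state->stall_affected_nodes_queue, addr_stack, (GCompareFunc)bidib_node_stall_queue_entry_equals).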
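As a minimal sketch of what a comparator usable with g_queue_find_custom could look like: GLib calls it with the queue element's data as the first argument and the searched-for data as the second, and treats a return value of 0 as a match. The struct and function names below are hypothetical stand-ins (only the addr member of t_bidib_stall_queue_entry is visible in this thread), and the return convention of the existing bidib_node_stall_queue_entry_equals is not shown here, so it may need adjusting.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for t_bidib_stall_queue_entry; only the addr
 * member is known from this thread. */
typedef struct {
    uint8_t addr[4];
} stall_entry_sketch;

/* Comparator in the shape GCompareFunc expects: the first argument is the
 * queue element's data, the second is the data passed to
 * g_queue_find_custom. Returns 0 when all four address bytes match. */
static int stall_entry_cmp_sketch(const void *entry, const void *addr_stack) {
    const stall_entry_sketch *e = entry;
    return memcmp(e->addr, addr_stack, sizeof e->addr);
}
```

With GLib available, this would be passed as the third argument of g_queue_find_custom, with addr_stack as the second.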

The unit tests related to this part (bidib_send_tests) pass as the "stall" member is apparently sufficient to indicate whether the node is stalled or not. However, there's probably a reason for the existence of the stall_affected_nodes_queue, so this comparison call should be fixed.


BLuedtke · 2023-03-13T15:20:46Z

Admittedly, I'm a bit confused as to why the check for the presence of addr_stack in stall_affected_nodes_queue is negated.
Given what happens in the if-body, i.e. the addr_stack being added to said queue, a second call to bidib_node_stall_ready will (once the search actually works) return true, even though the node is still stalled - since the address is now present in the queue.

Only one thing prevents bidib from (incorrectly) sending a message over a stalled node: All invocations of bidib_node_stall_ready also check that the node's message queue is empty. However, in function bidib_node_try_send, a message is added to this queue when the node is stalled (i.e., on the first call to bidib_node_stall_ready where it'll return false as expected). In the next attempt to send a message (via bidib_node_try_send), the node is recognized as "not ready" purely because the message queue is not empty.

I can only guess the original intention. One possibility is that this should've been two checks: One check for the "stall" member of the node_state. Then, if the node is stalled, but the address of the stalled node or supernode is not yet present in the stall_affected_nodes_queue, add the address to the queue.
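The two-check idea could be sketched roughly as follows. This is only an illustration of the guess above, not the library's code: node_state_sketch and its fixed-size array are hypothetical simplifications standing in for t_bidib_node_state and the GQueue, and the sketch looks at a single node state without walking up to supernodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_WAITING 8

/* Hypothetical, simplified node state: a fixed array stands in for the
 * GQueue stall_affected_nodes_queue. */
typedef struct {
    bool stall;
    uint8_t waiting[SKETCH_MAX_WAITING][4];
    size_t waiting_count;
} node_state_sketch;

/* Element-wise search for addr among the recorded waiting addresses. */
static bool waiting_contains(const node_state_sketch *s, const uint8_t *addr) {
    for (size_t i = 0; i < s->waiting_count; i++) {
        if (memcmp(s->waiting[i], addr, 4) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns true if the node may send, false while it is stalled. */
static bool stall_ready_sketch(node_state_sketch *s, const uint8_t *addr) {
    /* Check 1: the stall flag alone decides readiness. */
    if (!s->stall) {
        return true;
    }
    /* Check 2: record the address only if it is not queued yet. */
    if (!waiting_contains(s, addr) && s->waiting_count < SKETCH_MAX_WAITING) {
        memcpy(s->waiting[s->waiting_count++], addr, 4);
    }
    return false;
}
```

With this split, repeated calls for the same address keep returning false while the node is stalled and do not add duplicate queue entries.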

eyip002 · 2023-03-14T22:59:10Z

Reference to MSG_STALL in the bidib protocol: http://bidib.org/protokoll/bidib_general_messages.html

This line looks suspicious, should probably be message[data_index]:

libbidib/src/transmission/bidib_transmission_receive.c

Line 353 in 9f39427

bidib_node_update_stall(addr_stack, message[message[0]]);

The idea of stall_affected_nodes_queue was probably to implement a way to check whether a super-node has been stalled when trying to send a command to one of its sub-nodes.

When a MSG_STALL is received, bidib_node_update_stall(addr_stack, stall_status) is called. The addr_stack contains the address of the node that sent MSG_STALL and also the addresses of its super-nodes. Because addr_stack is a fixed array of size 4, the unused elements above the top of the stack are represented by 0x00. If a node determines that it needs to stall to avoid overwhelming its sub-nodes, then we should pause any sending of commands to the node and its sub-nodes.

Take the function discussed above in #28 (comment):

libbidib/src/transmission/bidib_transmission_node_states.c

Lines 108 to 130 in 9f39427

static bool bidib_node_stall_ready(const uint8_t *const addr_stack) {
    uint8_t addr_cpy[4];
    memcpy(addr_cpy, addr_stack, 4);
    while (addr_cpy[0] != 0x00) {
        t_bidib_node_state *state = g_hash_table_lookup(node_state_table, addr_cpy);
        if (state != NULL && state->stall &&
            !g_queue_find(state->stall_affected_nodes_queue,
                          bidib_node_stall_queue_entry_equals)) {
            t_bidib_stall_queue_entry *stall_entry = malloc(
                    sizeof(t_bidib_stall_queue_entry));
            memcpy(stall_entry->addr, addr_stack, 4);
            g_queue_push_tail(state->stall_affected_nodes_queue, stall_entry);
            return false;
        }
        for (int i = 2; i >= 0; i--) {
            if (addr_cpy[i] != 0x00) {
                addr_cpy[i] = 0x00;
                break;
            }
        }
    }
    return true;
}

When a stalled node is found, the original addr_stack is pushed into the node's stall_affected_nodes_queue:

libbidib/src/transmission/bidib_transmission_node_states.c

Line 119 in 9f39427

g_queue_push_tail(state->stall_affected_nodes_queue, stall_entry);

Otherwise, we reach the for-loop that zeros out the top of addr_cpy, i.e. its last non-zero element. The copy of the address stack now addresses a super-node one level closer to the root node (addr_cpy[0] is the root node):

libbidib/src/transmission/bidib_transmission_node_states.c

Lines 122 to 127 in 9f39427

for (int i = 2; i >= 0; i--) {
    if (addr_cpy[i] != 0x00) {
        addr_cpy[i] = 0x00;
        break;
    }
}

The outer while-loop repeats until a stalled node is found or all nodes in addr_stack have been considered.
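To make the traversal order concrete, here is a small standalone sketch that records every address the while-loop would look up, from the node itself up through its super-nodes. The function names are hypothetical; the stripping loop mirrors the quoted one, including the fact that it only inspects indices 2 down to 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the last non-zero element among indices 2..0, as the quoted loop does. */
static void addr_strip_top_sketch(uint8_t addr[4]) {
    for (int i = 2; i >= 0; i--) {
        if (addr[i] != 0x00) {
            addr[i] = 0x00;
            break;
        }
    }
}

/* Copies each address that would be looked up into out, in order, and
 * returns how many were written. At most 3 entries are produced. */
static size_t addr_walk_sketch(const uint8_t addr_stack[4], uint8_t out[3][4]) {
    uint8_t cpy[4];
    size_t n = 0;
    memcpy(cpy, addr_stack, 4);
    while (cpy[0] != 0x00 && n < 3) {
        memcpy(out[n++], cpy, 4);
        addr_strip_top_sketch(cpy);
    }
    return n;
}
```

For an address stack of {1, 2, 3, 0}, the lookups happen for {1, 2, 3, 0}, then {1, 2, 0, 0}, then {1, 0, 0, 0}.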

As a result, the stall_affected_nodes_queue of a node may contain an address to itself and/or addresses to its sub-nodes. When a node is no longer stalled, we should try and resume the sending of commands to its sub-nodes and itself:

libbidib/src/transmission/bidib_transmission_node_states.c

Lines 232 to 237 in 9f39427

elem = g_queue_pop_head(state->stall_affected_nodes_queue);
t_bidib_node_state *waiting_node_state = g_hash_table_lookup(
        node_state_table, elem->addr);
if (waiting_node_state != NULL) {
    bidib_node_try_queued_messages(waiting_node_state);
}

BLuedtke added the bug label Mar 13, 2023

BLuedtke self-assigned this Mar 13, 2023

BLuedtke linked a pull request May 6, 2024 that will close this issue

Merge fix for node stall ready search #38

Draft
