You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am using DAPLink on ATSAM3U2C HIC configured with DAPv2 in a CI setup with PyOCD as the host interface and have observed sticky failures of the DAPLink firmware that manifest as USB timeouts, e.g.:
Exception in thread CMSIS-DAP receive (0000000050203220304d5157303031313031303397969903):
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/cmsis_dap_probe.py", line 463, in read_ap_repeat_callback
values = result()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 1011, in reg_read_repeat_cb
self.run()
File "/opt/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 164, in rx_task
read_data = self.ep_in.read(self.packet_size, 10 * 1000)
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/core.py", line 423, in read
res = transfer.get_result()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 154, in get_result
self.daplink.flush()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
return self.device.read(self, size_or_buffer, timeout)
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/core.py", line 1029, in read
return func(self, *args, **kwargs)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 790, in flush
self._read_packet()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
return func(self, *args, **kwargs)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 1054, in _read_packet
raw_data = self._interface.read()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 234, in read
ret = fn(
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 846, in bulk_read
raise DAPAccessIntf.DeviceError("Device %s read thread exited unexpectedly" % self.serial_number)
pyocd.probe.pydapaccess.dap_access_api.DAPAccessIntf.DeviceError: Device 0000000050203220304d5157303031313031303397969903 read thread exited unexpectedly
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/nlk/implant/icdi/pyocd.py", line 397, in memory_read
return self.probe_session.target.read_memory_block8(addr, num_bytes)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/core/soc_target.py", line 212, in read_memory_block8
return self.__read(self.lib.libusb_bulk_transfer,
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 954, in __read
return self.selected_core_or_raise.read_memory_block8(addr, size)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/cortex_m.py", line 476, in read_memory_block8
_check(retval)
File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 602, in _check
raise USBTimeoutError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBTimeoutError: [Errno 60] Operation timed out
data = self.ap.read_memory_block8(addr, size)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/core/memory_interface.py", line 181, in read_memory_block8
data32 = self.read_memory_block32(addr, size // 4)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
return func(self, *args, **kwargs)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/ap.py", line 1171, in _read_memory_block32
resp += self._read_block32_page(addr, n//4)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/ap.py", line 1129, in _read_block32_page
resp = self.dp.read_ap_multiple(self.address.address + self._reg_offset + MEM_AP_DRW, size)
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/dap.py", line 934, in read_ap_multiple
return read_ap_multiple_cb()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/dap.py", line 924, in read_ap_multiple_cb
return result_cb()
File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/cmsis_dap_probe.py", line 470, in read_ap_repeat_callback
raise self._convert_exception(exc) from exc
pyocd.core.exceptions.ProbeError: Device 0000000050203220304d5157303031313031303397969903 read thread exited unexpectedly
A failure with the above trace would be observed during a CI run when a probe is already open. Subsequent CI runs using the same DAP would observe a different trace, e.g.:
> pyocd rtt -t <target> -u <unique id>
0010339 I Target type is <target> [board]
0010339 C Error: [__main__]
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/__main__.py", line 161, in run
status = cmd.invoke()
^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/subcommands/rtt_cmd.py", line 116, in invoke
with session:
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/core/session.py", line 391, in __enter__
self.open()
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/core/session.py", line 525, in open
self._probe.open()
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/cmsis_dap_probe.py", line 260, in open
self._link.open()
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 746, in open
self._interface.open()
File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 89, in open
assert self.closed is True
^^^^^^^^^^^^^^^^^^^
AssertionError
Doing some digging into the DAPLink firmware, it appears that this is happening because the IN endpoint is stuck, i.e., the firmware is not putting any data into the transmit FIFO and subsequent IN EP polls from the host time out. This is because the DAP_Cmd_queue has filled, and the current bulk implementation gates writing the IN EP data (DAP command response) on the queue being not full:
source/usb/bulk/usbd_bulk.c:
void USBD_BULK_EP_BULKOUT_Event(U32 event) Brian Esquilona • Mon May 6 15:02:44 2019 • Bulk endpoints for CMSIS v2 support
{
U16 bytes_rece;
uint8_t * rbuf;
bytes_rece = USBD_ReadEP(usbd_bulk_ep_bulkout, ptrDataIn, USBD_Bulk_BulkBufSize - DataInReceLen);
ptrDataIn += bytes_rece;
DataInReceLen += bytes_rece;
if ((DataInReceLen >= USBD_Bulk_BulkBufSize) ||
(bytes_rece < usbd_bulk_maxpacketsize[USBD_HighSpeed])) {
if (DAP_queue_execute_buf(&DAP_Cmd_queue, USBD_Bulk_BulkOutBuf, DataInReceLen, &rbuf)) { // This is the offending branch; USBD_BULK_EP_BULKIN_Event(0) is only invoked if `DAP_queue_execute_buf` returns true...
//Trigger the BULKIn for the reply
if (USB_ResponseIdle) {
USBD_BULK_EP_BULKIN_Event(0);
USB_ResponseIdle = 0;
}
}
//revert the input pointers
DataInReceLen = 0;
ptrDataIn = USBD_Bulk_BulkOutBuf;
}
}
source/daplink/cmsis-dap/DAP_queue.c:
BOOL DAP_queue_execute_buf(DAP_queue * queue, const uint8_t *reqbuf, int len, uint8_t ** retbuf)
{
uint32_t rsize;
if (queue->free_count > 0) { // <- if the queue is full, we always return `__FALSE`, and no progress is made.
if (len > DAP_PACKET_SIZE) {
len = DAP_PACKET_SIZE;
}
queue->free_count--;
memcpy(queue->USB_Request[queue->recv_idx], reqbuf, len);
rsize = DAP_ExecuteCommand(reqbuf, queue->USB_Request[queue->recv_idx]);
queue->resp_size[queue->recv_idx] = rsize & 0xFFFF; //get the response size
*retbuf = queue->USB_Request[queue->recv_idx];
queue->recv_idx = (queue->recv_idx + 1) % DAP_PACKET_COUNT;
queue->send_count++;
return (__TRUE);
}
return (__FALSE);
}
The same gating is performed in the HID implementation, as well.
I'm still not sure what the reproduction criteria are for causing this queue to fill; if PyOCD is always processing transactions one at a time, I would expect the queue never to have more than 1 element in it. Perhaps this has something to do with USB_ResponseIdle--I'm not sure what the semantics of this flag are, but it seems to be managed incorrectly in the bulk EP implementation, since it only ever gets written to 1 by the USBD_BULK_EP_BULKIN_Event implementation and gets immediately cleared to 0 in the calling context from USBD_BULK_EP_BULKOUT_Event. It seems like USBD_BULK_EP_BULKIN_Event can also be invoked from the main event loop when an interrupt is observed on the corresponding endpoint, but the only configured interrupt source for that is TX complete. As such, I'm guessing that we're observing two EP OUT events before the EP IN TX complete, resulting in a missed call to USBD_BULK_EP_BULKIN_Event when USB_ResponseIdle is clear in the second pass through USBD_BULK_EP_BULKOUT_Event. This would result in an off-by-one, where a response is queued in DAP_Cmd_queue that doesn't get egressed until the next command is received. After 3 such bubbles, the queue fills completely and we get stuck forever due to the conditional logic I highlighted above.
To address this bug, I propose unconditionally invoking USBD_BULK_EP_BULKIN_Event from USBD_BULK_EP_BULKOUT_Event. We'll still get chaining of packet egress since USBD_BULK_EP_BULKIN_Event gets invoked in the case of IN EP TX complete, so when the last write finishes, we'll enqueue another write to the FIFO. In the case that the queue is empty, we do nothing in either case.
The text was updated successfully, but these errors were encountered:
I am using DAPLink on ATSAM3U2C HIC configured with DAPv2 in a CI setup with PyOCD as the host interface and have observed sticky failures of the DAPLink firmware that manifest as USB timeouts, e.g.:
A failure with the above trace would be observed during a CI run when a probe is already open. Subsequent CI runs using the same DAP would observe a different trace, e.g.:
Doing some digging into the DAPLink firmware, it appears that this is happening because the IN endpoint is stuck, i.e., the firmware is not putting any data into the transmit FIFO and subsequent IN EP polls from the host time out. This is because the
DAP_Cmd_queue
has filled, and the current bulk implementation gates writing the IN EP data (DAP command response) on the queue being not full:source/usb/bulk/usbd_bulk.c:
source/daplink/cmsis-dap/DAP_queue.c:
The same gating is performed in the HID implementation, as well.
I'm still not sure what the reproduction criteria are for causing this queue to fill; if PyOCD is always processing transactions one at a time, I would expect the queue never to have more than 1 element in it. Perhaps this has something to do with
USB_ResponseIdle
--I'm not sure what the semantics of this flag are, but it seems to be managed incorrectly in the bulk EP implementation, since it only ever gets written to 1 by theUSBD_BULK_EP_BULKIN_Event
implementation and gets immediately cleared to 0 in the calling context fromUSBD_BULK_EP_BULKOUT_Event
. It seems likeUSBD_BULK_EP_BULKIN_Event
can also be invoked from the main event loop when an interrupt is observed on the corresponding endpoint, but the only configured interrupt source for that is TX complete. As such, I'm guessing that we're observing two EP OUT events before the EP IN TX complete, resulting in a missed call toUSBD_BULK_EP_BULKIN_Event
whenUSB_ResponseIdle
is clear in the second pass throughUSBD_BULK_EP_BULKOUT_Event
. This would result in an off-by-one, where a response is queued inDAP_Cmd_queue
that doesn't get egressed until the next command is received. After 3 such bubbles, the queue fills completely and we get stuck forever due to the conditional logic I highlighted above.To address this bug, I propose unconditionally invoking
USBD_BULK_EP_BULKIN_Event
fromUSBD_BULK_EP_BULKOUT_Event
. We'll still get chaining of packet egress sinceUSBD_BULK_EP_BULKIN_Event
gets invoked in the case of IN EP TX complete, so when the last write finishes, we'll enqueue another write to the FIFO. In the case that the queue is empty, we do nothing in either case.The text was updated successfully, but these errors were encountered: