mbed cellular stack does not establish PPP session #7219
Comments
Adding @MarceloSalazar |
This looks like a problem fixed in PR #6962. Can you cherry-pick that to see if it helps? |
ARM Internal Ref: MBOTRIAGE-702 |
@AriParkkila I've checked out #6962 but unfortunately I still see the same behaviour. You probably spotted it in the log, but one possible clue might be that on this SIM there are some unsolicited message-waiting indications (+UMWI). Could this be affecting the state machine? |
@NeilMacMullen that +UMWI should be handled gracefully with PR #6962. The problem is that after calling According to the log there are lots of debug prints after CONNECT. I wonder where those might be coming... Those do not look exactly like coming from Can you enable TRACE_LEVEL_DEBUG to see more traces, especially prints from |
@AriParkkila I disabled the message waiting and that doesn't make a difference. TRACE_LEVEL_DEBUG is already set in main using
were you expecting to see more debug?
Do you mean the 'line noise'? If so this is raw output from UARTSerial read/write which is tracing I added to confirm that bytes are being sent/received between mcu and modem. |
Ah - I hadn't realised you need to compile in the debug traces.. Here's the relevant portion of trace around the PPP channel connection...
|
I'm assuming this is a PPP negotiation failure - I tried dissecting your log, but the PPP data appears to be somewhat garbled - some sort of Latin-1->UTF-8 conversion I can't reverse, and some other loss. It appeared to be the higher level giving up, not the PPP - "user interrupt" after a 30-second timeout. It's vaguely possible it was just being slow and extending the 30-second timeout would clear it, but I doubt it. I think you'll need to turn on enough lwIP debug to show the PPP negotiation. It looks like there's enough happening that we're having a conversation, but we don't know what it is. I think you'll need to turn on |
@kjbracey-arm Thanks Kevin - that sounds plausible. I'm busy this morning but can dump out the raw PPP bytes later today. Yesterday I got as far as establishing that this is failing when lwip::bringup checks to see if iface_is_up (apologies if these aren't exact names - I'm away from the source), causing ppp_close to be called with USERERR. That seems consistent with the idea that PPP negotiation just isn't getting anywhere. I'm open to the idea that this may just be a cellular network/SIM configuration issue, though as I say, the ublox on-chip stack happily connects and sends data using the same SIM/modem. Anyway, more later.... |
I've attached a log of the raw UART trace (captured from the UARTSerial write/read functions). |
@kjbracey-arm
Can you suggest sensible numbers for these? The default appears to be 1600 (words? bytes?). I've tried a random selection of values between 2000 and 8192 but seem to be alternating between stack overflow and heap exhaustion :( (I'm on an RF52 so available ram is ~55K) Here's what I currently have in my target_overrides section...
|
It may be that you just don't have enough RAM in the system to enable lwIP's PPP trace without messing with it - it uses quite a big line buffer on the stack. Maybe reduce buffer sizes in lwip_utils.c? |
Well, that trace is showing some corruption - we've sent 3 LCP ConfReqs, but got back 2 LCP ConfAcks with most of the middle missing. Trace cut out just after sending the third. And most of the data is the modem sending us LCP ConfReqs and us replying with LCP ConfAcks. Last incoming ConfReq was slightly corrupt, and presumably the modem isn't getting our ConfAcks intact.
Seems to be a general serial comms problem rather than anything specifically PPP. Does sound like #6962, as @AriParkkila suggested, but if you've already tried that, not sure what else to suggest. |
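For anyone dissecting a raw capture like this by hand, the check described above (is this a ConfReq or a ConfAck, and is the middle missing?) can be sketched as follows. This is a minimal sketch, not code from Mbed OS or lwIP: the function names `ppp_unescape` and `lcp_describe` are hypothetical. It assumes standard RFC 1662 framing (0x7E flag, 0x7D escape with XOR 0x20), uncompressed address/control bytes (FF 03), and the RFC 1661 LCP code numbers. It does not verify the FCS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Remove RFC 1662 byte stuffing from a captured frame.
// Hypothetical helper for log dissection; not an Mbed OS or lwIP API.
std::vector<uint8_t> ppp_unescape(const std::vector<uint8_t> &raw)
{
    std::vector<uint8_t> out;
    bool escaped = false;
    for (uint8_t b : raw) {
        if (b == 0x7E) {
            continue;                 // frame delimiter, carries no payload
        }
        if (escaped) {
            out.push_back(static_cast<uint8_t>(b ^ 0x20)); // stuffed byte
            escaped = false;
        } else if (b == 0x7D) {
            escaped = true;           // control escape: next byte is stuffed
        } else {
            out.push_back(b);
        }
    }
    return out;
}

// Name the LCP packet in an un-escaped frame and flag a length mismatch,
// which is what a frame with "most of the middle missing" would show.
std::string lcp_describe(const std::vector<uint8_t> &f)
{
    // FF 03 C0 21 + code, id, 2-byte length + 2-byte FCS = 10 bytes minimum
    if (f.size() < 10) {
        return "too short for an LCP frame";
    }
    if (f[0] != 0xFF || f[1] != 0x03 || f[2] != 0xC0 || f[3] != 0x21) {
        return "not an LCP frame";
    }
    static const char *const names[] = {
        "Configure-Request", "Configure-Ack", "Configure-Nak",
        "Configure-Reject", "Terminate-Request", "Terminate-Ack"
    };
    std::string text;
    if (f[4] >= 1 && f[4] <= 6) {
        text = names[f[4] - 1];
    } else {
        text = "LCP code " + std::to_string(static_cast<unsigned>(f[4]));
    }
    text += " id=" + std::to_string(static_cast<unsigned>(f[5]));

    // The LCP length field covers code, id, length and options.
    std::size_t declared = (static_cast<std::size_t>(f[6]) << 8) | f[7];
    std::size_t present = f.size() - 4 - 2; // minus header and FCS
    if (declared != present) {
        text += " (length mismatch)";
    }
    return text;
}
```

Feeding each 0x7E-delimited chunk of the capture through `lcp_describe(ppp_unescape(chunk))` gives one line per frame, which makes lost bytes easier to spot than reading the hex dump directly.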
We are not locally testing the Sara U201 here - @ARMmbed/team-ublox, have you tested this driver with it? |
We have used the new drivers and the cellular example build on the |
Yes, I was wondering if the trace could be breaking it further. If the APN is invalid, wouldn't that fail more decisively with an authentication failure rather than 30 second timeout though? |
You usually get "connect" and then the connection drops ~10 seconds later. |
Even the first trace we have here has some PPP dumping (raw) out the console. So it's possible that we're seeing a corruption effect due to tracing rather than the original problem. That trace was saying "user interrupt" (ie due to high-level timeout), but maybe that's not the base issue. @NeilMacMullen - can you redo logs without the PPP dumping present at all, so there's no console output during PPP negotiation? And do that with #6962 too. Would like to see the |
The fact that this is on nRF52 may be the determining issue here. What do we think about the current state of the cellular stack and |
@kjbracey-arm Thanks Kevin. WRT the UART tracing, I'm just dropping stuff into RAM then dumping it out after the PPP channel fails, so CPU overhead should be minimal. It's possible some of the missing bytes might be just an artefact of the way I've reconstituted the data - I need to check that. The physical UART has no flow control and is running at 115200, connected via about 15mm of tracking. I think bitwise corruption is unlikely (and we haven't seen it before using the C027Interface) but it may be that there is a pacing issue on inbound bytes. The default UARTSerial Rx buffer size is 256 bytes so I'd expect that to be large enough to do the initial negotiation without problems. I'll try some of your suggestions above. |
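A rough way to put numbers on the pacing question is to work out how long a continuous inbound stream takes to fill a buffer. The sketch below assumes 8N1 framing (10 bit times per byte), which the thread does not state; `fill_time_us` is a hypothetical helper, not an Mbed OS function.

```cpp
#include <cassert>
#include <cstdint>

// Microseconds for a continuous inbound stream to fill `bytes` of buffer.
// Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per byte.
inline uint32_t fill_time_us(uint32_t baud, uint32_t bytes)
{
    assert(baud > 0);
    const uint64_t bit_times = static_cast<uint64_t>(bytes) * 10u;
    return static_cast<uint32_t>(bit_times * 1000000u / baud);
}
```

Under that assumption, `fill_time_us(115200, 256)` is 22222, so the 256-byte buffer only overflows if nothing drains it for roughly 22 ms, while a single byte time is `fill_time_us(115200, 1)`, about 86 microseconds.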
I've just seen your comments on RF52 uart.. Yes, there definitely have been some issues with RF52 serial - I'm not sure what the current state of play is there - @marcuschangarm may have a better idea. |
I've run
Note that you need to provide enough power to the C030 board for peak 2G/3G cellular transmit power, so either a solid 3 Amp USB power supply or a LiPo rechargeable battery plugged into the board. |
There are some built-in buffers in the UART driver you can configure as well: https://github.com/ARMmbed/mbed-os/tree/master/targets/TARGET_NORDIC/TARGET_NRF5x#serial I would recommend increasing the buffers if you suspect UART corruption. |
I would also expect the UARTSerial buffer size of 256 to be sufficient for negotiation. Maybe it is worth adjusting the NRF52's internal buffer setup as mentioned above. I'm looking at the NRF52 serial implementation - it's more complex than most. There's no transmit FIFO, and transmits will spend time busy-waiting, which will hurt performance. I think that's what's causing dropped input characters - we're busy-waiting for "TX end" in a transmit empty interrupt, causing us to drop RX bytes. The busy-wait will cause problems for UARTSerial (or any other interrupt-based output pump), because it will basically jam in the interrupt context doing
It's expecting serial_writable to become false at some point when the FIFO/data register fills, but if the putc blocks until serial is writable again, as nRF52 is doing, we'll just spend all day in the interrupt handler spinning until we're out of data. The HAL doesn't have non-blocking putc/getc directly, but If DMA->FIFO interrupts are higher priority than normal, then extending the nRF52 FIFO size would work around this, but it would still leave a general interrupt latency problem. I also think you'd have to go all the way and have the FIFO size be IP/PPP packet size, which is less than ideal. Really there should be no need for big low-level buffers because we're supposed to be emptying them promptly in interrupt handlers. It's possible that "relaxing" the TX Irq handler to only attempt one byte per Tx IRQ might also help, but it would hurt throughput in most other systems. That would explain the input problems. Don't see how we can be messing up the output though. We're sending it with Now, IIRC the C027Interface used u-blox's own serial code, not |
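The effect described above can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the Mbed HAL or the UARTSerial source: `FakeUart` and `tx_irq` are hypothetical names, the UART is modelled as a single data register with no FIFO, and "time" is counted in byte times spent spinning inside `putc`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of a UART with a single data register and no FIFO.
// All names are hypothetical; this is not the Mbed HAL.
struct FakeUart {
    bool busy = false;            // true while a byte is being shifted out
    bool putc_waits_for_tx_end;   // models a putc that blocks until TX end
    unsigned byte_times_spun = 0; // byte times burnt busy-waiting in putc

    explicit FakeUart(bool blocking) : putc_waits_for_tx_end(blocking) {}

    bool writable() const { return !busy; }

    void putc(char)
    {
        assert(writable());       // caller is expected to check first
        busy = true;
        if (putc_waits_for_tx_end) {
            ++byte_times_spun;    // spin until the byte has left the wire
            busy = false;         // so the port looks writable on return
        }
    }

    void tx_complete() { busy = false; } // hardware finished the byte
};

// Shape of an interrupt-driven output pump: keep writing while the
// peripheral says it can take more. Returns bytes sent in this one call.
std::size_t tx_irq(FakeUart &uart, std::deque<char> &txbuf)
{
    std::size_t sent = 0;
    while (!txbuf.empty() && uart.writable()) {
        uart.putc(txbuf.front());
        txbuf.pop_front();
        ++sent;
    }
    return sent;
}
```

With `FakeUart(false)` a 64-byte queue sends one byte per call and returns, because `writable()` goes false once the register is loaded. With `FakeUart(true)` the same call drains all 64 bytes and spins for 64 byte times without leaving the handler, which is the interrupt-context stall described in the comment.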
Yes, it seems that UART drivers of NRF52840_DK are broken on the latest Mbed OS. I have NRF52840_DK evaluation board and it seems to lose RX characters in AT response to |
Yes, C027, having been around since early Mbed 2.0 days, had its own buffered serial handling, Pipe and SerialPipe. |
Thanks, Rob. That has the same basic loop in its The README for the nRF52 does state that you need to have the HAL FIFO big enough for the "biggest burst" incoming - ie IP packet-sized. Maybe this is a fundamental known platform issue. So, does it work with that set big enough? Conceptually, if interrupt latency is normal (ie not being hit by the TX spin), |
I'm going to see if I can get some physical probes on tx/rx to see what is really happening on the bus. From Kevin's description this should only be a problem where we have simultaneous bidirectional transfer. It's worth noting that with the old C027 implementation this never occurred in practice because we were either uploading data or downloading, so UART traffic was always effectively unidirectional. It sounds like the PPP negotiation is fully duplex? I can also try adding extra buffering and inter-byte delays on tx. |
That was my initial thought, but Ari's seen AT command loss without that (with 32-byte HAL FIFO). I'll have a look at the RX path to try to see if the RxIrq is being generated promptly enough. Not immediately obvious. But if it were the case, then yes, maybe it's not |
It seems that increasing DMA size fixes RX problem:
With that there are no characters dismissed but it's still not working properly, because |
@kjbracey-arm
I thought that was the expected behavior for
|
Isn't |
Ah. It only means wait for the peripheral to be available to write this character, if necessary. The expectation is that the byte could be going into a FIFO, and we block iff the FIFO is full on entry. The simplest platforms do
Interrupt-based transmitters (have to) assume that a pre-check for |
It actually is the TRNG API provider, using the CryptoCell-310. May be a separate issue - doesn't seem that the OP was getting stuck. |
@marcuschangarm - any thoughts on whether we should expect to see data loss without a large buffer when using IRQ-based reception? The nRF docs do discuss buffer sizing, but I'm not sure whether a large HAL buffer is being discussed to overcome a nRF-specific performance limitation, or as a kind of alternative to a buffer above the HAL layer. I can't see any obvious flaw in the RX path - if I understand correctly, the 8-byte DMA should complete, you get an interrupt, you copy into the FIFO, then start a 200us timer to send the RxIRQ. (Why not do it immediately?) If that's all working, should mean we get an RxIrq in plenty of time to get the data out of the FIFO before it can overflow, right? So increasing the size shouldn't help. But apparently that's not happening :( To be honest, I wouldn't have thought the FIFO is necessary at all - if you have 2 or more DMA buffers, just having serial_getc() read out of already-filled buffers, and re-queuing to hardware when drained by getc() would be adequate, I would have thought. Seems like the nRF52 HAL may be trying to solve a problem it isn't expected to - providing serial buffering. Quite a lot of our devices only have 1 byte of buffering... Two 8-byte DMA buffers would be loads compared to that. |
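The "two or more DMA buffers, no FIFO" idea can be sketched as a small model. This is only an illustration of the scheme described in the comment, not the nRF52 HAL: `RotatingRx`, `hw_receive` and `getc` are hypothetical names, the "hardware" side is a plain function call, and as a simplification the reader may also consume bytes from a buffer that is still being filled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of rotating RX DMA buffers with no FIFO on top.
// Hypothetical names throughout; this is not the nRF52 HAL.
template <std::size_t NumBufs, std::size_t BufSize>
class RotatingRx {
    static_assert(NumBufs > 0 && BufSize > 0, "need at least one buffer");

public:
    // "Hardware" side: one byte arrives. Returns false on overrun, i.e.
    // the buffer we rotated onto has not been drained by getc() yet.
    bool hw_receive(uint8_t b)
    {
        if (fill_[hw_] == BufSize) {
            return false;
        }
        buf_[hw_][fill_[hw_]++] = b;
        if (fill_[hw_] == BufSize) {
            hw_ = (hw_ + 1) % NumBufs;   // buffer complete: rotate
        }
        return true;
    }

    // Reader side: take the oldest unread byte, if any. A fully drained
    // buffer is reset, which hands it back to the "hardware".
    bool getc(uint8_t &out)
    {
        assert(read_[rd_] <= fill_[rd_]);
        if (read_[rd_] == fill_[rd_]) {
            return false;                // nothing unread
        }
        out = buf_[rd_][read_[rd_]++];
        if (read_[rd_] == BufSize) {
            fill_[rd_] = 0;
            read_[rd_] = 0;
            rd_ = (rd_ + 1) % NumBufs;   // follow the hardware round
        }
        return true;
    }

private:
    std::array<std::array<uint8_t, BufSize>, NumBufs> buf_{};
    std::array<std::size_t, NumBufs> fill_{}; // bytes written per buffer
    std::array<std::size_t, NumBufs> read_{}; // bytes consumed per buffer
    std::size_t hw_ = 0;                      // buffer being filled
    std::size_t rd_ = 0;                      // buffer being drained
};
```

In this model `RotatingRx<2, 8>` absorbs 16 bytes before `hw_receive` reports an overrun, and draining the first 8 bytes frees a buffer for reception again. Whether that headroom is enough on real hardware depends on the interrupt latency discussed in the thread.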
Ah, I misread Ari's comment. Thought he'd increased the FIFO size, but it was the DMA buffer size. That suggests there could be some interrupt latency problems. Is there a way SoftDevice could be impacting this? You've set UART priority to highest, which seems like it should punch through the TX spin, and that shouldn't be happening in Ari's case anyway. |
Oh, in that case these lines should be moved out of the way:
Unfortunately, the NRF52 hardware maps poorly to the Mbed HAL API. The UARTE requires buffers of at least 5 bytes and without double-buffering the DMA we lose data on switching. I've been testing firmware downloads over ESP8266 and the only way I can avoid data corruption is with a 2 KiB FIFO buffer. 😒
The TxIrq is being called directly from the UART handler and not from the lower priority timer (like RxIrq is). So that would explain why it is unable to punch through.
After rereading the code I'm not entirely sure. Since I'm already taking up one of the timers, I think I didn't want to take up one of the SWIs as well, so the delay is just an arbitrary number for the timer. Changing both RxIrq and TxIrq to use SWI would make it cleaner. |
Yes, I can see you're working against the grain here :( I'm convinced you need >1 DMA buffer to keep up, but not quite sure why you'd need a full-blown FIFO on top - particularly one that is "burst-sized". 2 or 3 rotating DMA buffers of the order of 8-16 bytes should be fine.
It may be worth rechecking with the unwanted tx spin removed. The RxIrq handler may now have time to empty the FIFO promptly. |
That should work. I was pressed for time and the atomic FIFO in the SDK made it easier than juggling a couple of DMA buffers.
I would be surprised. There isn't any Tx traffic once the download starts. |
In case you haven't seen it: #7069 |
Thanks @kjbracey-arm - I'll give these a try later today. |
These fixes definitely improve things. I've checked out master at commit 69d8c0b. In the example application they allow a connection to be made. I've modified the example very slightly to dynamically construct and then delete the OnboardCellularInterface instance and can loop the connection test. Unfortunately, there seems to be some interaction with the BLE code. If the BLE system is called via the Softdevice, then subsequent instances fail to receive any bytes from the Serial object and I see "AT timeout" errors. I've verified that nothing is being received as far down as UartSerial. So this code
succeeds on the first iteration but then fails on the second. Removing the call to BleGetId allows the connection to succeed on multiple iterations. The ble code is quite simple - here's a minimal example...
I also suspect the failure occurs if, instead of calling BLE, a new SerialBase is instantiated and then freed using the same pins as were assigned to the OnboardCellularInterface, but I'm still trying to verify that. |
I suspect this is relevant: #7415 (comment) |
ah yes - that definitely looks like it could be the same issue! |
Can you give this PR a go and see if it solves your problem, please? |
@marcuschangarm Yes - this definitely helps - I can now successfully connect and send/receive data! Unfortunately there seems to be something broken with SPI still but I think this issue can be closed. I'll post more details wrt SPI on your PR |
Great! Thank you for testing! |
Confirmed fixed - closing |
mbed 5.9 (tag mbed-os-5.9.0), and also present on master 799ba08.
The mbed-os-example-cellular application connects to the network but fails to establish a PPP data channel.
The target modem is a UBLOX SARA U201 (3G) which successfully connects and transfers data using the old C027Interface code and on-modem stack. Screenshot and attached log show some PPP activity but no connection.
pppfail.log