-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bees serial protocol #243
Comments
DSP callback: #244 |
Hmmmph - once I get more familiar with OSC for bfin simulator (see #245) I may start to regret rolling my own serial protocol rather than try and hook into an OSC-over-serial arduino lib or similar. What does the wire protocol for monome grids look like? Any thoughts on this? Performance vs ease-of-code vs fitness-for-purpose tradeoffs? |
monome stuff is serial via FTDI, converted to OSC with serialosc (our own i suspect direct-serial is a good path for aleph communication. apologies for the line silence on my end-- insane cramming before we head On Tue, Feb 2, 2016 at 3:09 PM, rick notifications@github.com wrote:
|
i don't love the idea of shoving OSC parsing into bees, seems kinda heavy and/or lots of work to strip it down to something appropriately tiny. easier just to make a host-side bridge program using liblo (or python, like serialosc.) liblo is very, very easy to use. |
i haven't looked at the serial protocol which has been crafted but one thought crossed my mind... if the protocol is simple enough it could be interesting to use some (or all) of the serial machinery over the ii bus (aka i2c). i do occasionally think about what having a teletype capable of sending values into a bees network might look like... |
since there seems to be a consensus that what we already have is probably the right thing, I will try to keep moving forward with that. First real-world test for the protocol will be to integrate the existing aleph communication into my prototype midilooper host application. Once a few melodies have been tapped out on quneo pads playing grains like a sampler time to address #244 & #251. Then it will be possible to replace op_param with the full serial functionality toggling serial messages on outs/ins/params/param-callbacks. |
ha - so just got a few neat little bits and bobs on the go using lisp code like randomising pitchshift arpeggiator & other such silliness. Nothing that can't be done from bees already but kind of exciting anyway. The bad news is the new serial coms does seem to overload bees very easily especially if you start zooming around the menu with encoders or buttons whilst sending as little as 20 messages per second. No idea why this might be at present. Going to try flashing non-debug bees maybe everything stops blowing up... EDIT: Just posted some total nonsense & deleted the post. Somewhat stuck on this & flailing around a bit sorry for the spam... |
Set up a debug variable to print number of serial events since last complete message, and the chars in that message were going through the event loop one at a time (though reading serial.c properly this doesn't seem to be the intention). Just tried adding a hack to serial.c to only trigger events after stop char received (just pushed that on my dev branch, though it may need to be reverted). It passes serial messages in one go but does not fix the problem. still unstable if you try to use encoders in menus whilst hitting aleph with even 10 serial commands a second. Blows up somewhere above 20 messages per second in play mode not touching the controls. This behaviour also highly reproducible. 20 messages per second will run for over 5 minutes, whereas 25 messages per second nearly instant death. So I guess all the interrupt logic, scheduling etc is sound or things would be less predictable. This behaviour seems to me consistent with running out of cycles. Maybe just inefficient serial protocol code that is to blame. test message is only 7 bytes - same behaviour whether triggering bees in or bfin param. I'm totally stumped - any suggestions dudes? EDIT: was confusing messages per sec with ms per message posting late last night (corrected). Also realised the obvious point after sleep that printing param/bees-in to screen in play mode is stress-testing the drawing code as well as the serial. |
can you link to the code where you're actually calling serial_send_ byte() or serial_send_long() or whatever? i can't quite find it right now. it's true that drawing is kind of heavy. drawing to a screen region (aka rect) doesn't trigger a screen refresh directly; pixels are set and a dirty flag is set for the region. in bees, dirty regions are updated on a 50ms timer. but, rendering with fonts is kind of heavy in itself; each of those calls to as for the serial update itself, looks like ( if we want to start making heavy realtime use of the UART, then maybe we should investigate asynchronous updates and an output FIFO... oy |
looking at datasheet, seems straightforward to set an interrupt for TXREADY on the USART... so, it could go something like this:
|
well, it looks like the intention is indeed to raise an event on each incoming byte: i agree that this doesn't seem maximally efficient... . i guess at the time it seemed like the best way to leave it up to the application to implement whatever serial protocol it wanted. and it doesn't assume that each event will necessarily be serviced before the next interrupt. (which is good...) is this what you mean, or something else? |
Apologies was sure I'd pushed that code last night (pushed now). This is the function that sends all serial bytes (copied from op_serial.c)
yes that's what I meant - was blaming event buffer overflow on serial rx generating too many events, also thinking the machinery of event loop itself could be burning up precious cycles. Interesting that 50ms is the magic number for updating dirty regions, whereas serial comms blow up instantly as soon as I reduce the sleep between messages below 50ms! Could be a total coincidence, but is it possible that setting dirty flag twice without redraw triggers a crashing bug? Will try to find where that 50ms is defined, try tweaking it, see if it affects things. The tests results I posted were a release build (make clean; make -k R=1). Release-build/debug-build didn't seem to make much difference how hard I could push the USART. Should repeat that experiment taking careful notes to make sure I didn't screw up somewhere... EDIT (wrong times/rates again & corrected - too much/not enough coffee/sleep probably) |
that is suggestive...
.. but i don't think it's that, exactly; we do many, many drawing calls between refreshes already. that's kinda the point. (for example, each separate string that we draw sets a dirty flag, and we have lots of strings per menu page.) here's where the screen timer is set, if you wanna try tweaking the period or disabling it entirely: |
(moved some additional comments about screen rendering to new issue #252) |
Another empirical observation made whilst hacking around last night. adding 100us sleep in this space: Is that to be expected? Was kind of surprised adding this delay should set off a chain reaction totally overloading the system... |
serial_store() is called from the UASRT rx interrupt, so yeah, its execution time is pretty critical. |
If rx interrupt execution time is super, super critical another quick test to try this evening will be to shift everything including the calls to usart_read_char into the event handler in apps/bees/ser.c . So kEventSerial becomes a simple flag "hey main loop, there is new serial data, go check it". Got that experiment to try, plus the test hitting usart hard in play mode without displaying that param/bees-in. Hopefully all this random thrashing will eventually result in enough understanding to get things moving forward again... |
hm, i don't quite follow...
... that's exactly what it is (or should be.) the rx interrupt handler should do nothing but:
this is the event handler, so, spending too much time in an event handler is not great, because more events will come in and eventually fill up the queue. and if an event handler uses a critical section (the somewhat-misnamed an interrupt handler effectively constitutes a critical section automatically. (sorry if all this is overly obvious or pedantic, just trying to be clear.) do you mean my proposed TXEMPTY interrupt handler above? but that's a different beast: |
nope haven't got my head round your tx proposal yet so can't comment on it, though I can remove all tx from this stress test. The change I'm proposing is move this stage:
Into the main loop when the event gets handled and if there are more bytes available, keep stuffing (like what's currently in serial.c). Also instead of stuffing them into serial RX ringbuffer, call my function recv_char directly: So the interrupt code becomes:
Can't explain why I have a weird hunch this might help - I'm probably totally wrong. Quick enough to try and see. |
ok, i see, sorry for the rant... let me think for a moment... |
... oh, of course. because we're not actually getting the data in the interrupt directly. yes that makes sense. we don't need the extra FIFO because the usart driver already has one. i forgot that we can keep reading bytes from the usart as long as they exist. in this regard its different from our USB drivers, upon which the serial driver was apparently patterned. in those, we have to be careful to store every byte we receive and process them in the correct order. |
OK so thanks very much for taking the time to explain all the issues surrounding interrupts. So the results of the first set of experiments (nothing drawing to the screen, no net running -just sitting in play mode) are:
Note: the serial message used for the test was a DSP param set command with random 16 bit data (7 bytes total) |
And the experiments combining UI events with serial stress:
Three tests:
|
experiment to test whether screen refresh rate set in init_app_timers correlates with maximum screen update rate we can trigger using UART. Selecting play screen, using same test serial message from earlier experiments (DSP param update random 16 bit value) & displaying the param on screen. Gradually increase the rate of serial messages until bees explodes.
Coincidence? I think not! |
hm, well it's unclear what the exact crash mechanism is, but clear enough that when redrawing the screen is eating up lots of time on the main loop, Something Bad happens... maybe an RX overrun, because screen redraw takes place in an especially big critical section. (maybe it doesn't need to; that would be nice...)
huh! that's unexpected... but maybe shouldn't have been, because now that i am actually properly looking at the driver and datasheet, i see that the usart peripheral doesn't have its own buffer, and though it can be explicitly connected to a DMA channel (PDCA in atmel-speak), the ASF driver isn't doing this. it is just checking the data byte of the Receive Holding Register (RHR). this register is written when a char arrives, raising RXRDY at which point we get an interrupt, since we've set it that way: https://github.com/tehn/aleph/blob/dev/avr32_lib/src/init.c#L143 if RHR is written with RXRDY already set, it gets an Overrrun Error (setting OVRE bit.) not clear (to me, yet) what other consequences that has. we can enable an interrupt for it if we want. (add AVR32_USART_IDR_OVRE_MASK to the IER register in the line linked above.) then in the IRQ we would want to check for overrun, and i guess clear the error bit by setting RSTSTA in the control register. hmm.. given all this, i'm not sure why there is a while() loop in for TX, we should definitely be writing to a fifo in the main loop and dealing with it from the TXREADY interrupt (not TXEMPTY like i said above.) |
hm, can't argue with that.. but for example, i can run a METRO at 10 ms triggering a render to the play page. so it's not simply a bug relating to multiple renders between screen refreshes. |
just performed a sanity check taking serial out of the equation. connecting encoder to metro period, metro tick to DSP param, displaying DSP param on screen. Reducing metro period to 5 ms is not an issue with screenTimer set to 100ms. So I believe it is the case that the crash is caused not by the draw code, not by the serial code, but by some conflict between the screen & serial code. Although I didn't yet fully absorb all the earlier information about interrupts, I noticed earlier that a fairly large chunk of the screen code here is wrapped in the app_pause() app_resume () - could that be relevant I wonder? https://github.com/rick-monster/aleph/blob/dev/apps/bees/src/render.c#L145 |
at this point i'm going to put my money on some consequence of RX overruns in the usart. because while screen refresh is happening, usart interrupts can't be serviced. gonna try taking the screen redraw out of the critical section, just for giggles... because i don't see anything particularly wrong with interrupting it. |
another thing to try might be promoting the usart irq priority from |
randomly tried ripping out the interrupt protection from that bit of render.c . Bees can now easily take 200 messages per second (with the 100ms redraw rate still in there). Totally stoked right now thanks for all the help & advice!!! just going to try the IRQ priority as another potential solution - maybe I did see a little screen glitch at one point though can't reproduce that now... EDIT: changing the IRQ priority doesn't give a drastic improvement. Probably too tired though and should repeat all experiments at some point over the long weekend to be safe.... |
i don't know if it seems like a good idea for usart interrupt to be permanently un-preemtable, just thought it was something to test. at this point i'm most curious about what actually happens on an RX overrun. if it's really bad, then yeah we should make sure those interrupts always make it through. but i, too, went ahead and removed the critical section from the screen update calls in bees |
Agreed then - remove redraw the critical section from screen update and ignore odd redraw glitch for now. I have no idea how to find out more about what's happening on RX overrun, though I did see a case when I was Denial-of-Service attacking bees with serial bursts and saw it mis-interpret a couple of the messages - some bytes must be getting dropped somewhere. Attempt to repeat this experiment did result in bringing down bees but that's running ~ 1k messages per second flat out - I'm satisfied serial is now robust enough. Aspects that certainly still need attention on this issue:
|
hm, yeah that's a quandary. most of the time, delimiting by CR makes sense, but not all the time... even within the same app. for example, if you want to use the serial port to load blackfin executables in bees... seems like we need to add a mode switch to turn off delimiting. and maybe make the delimiter value a variable, defaulting to CR. we should also keep in mind the limited size of the ringbuffer... it's 64 characters right now. could need to be bigger if there were something like a forth-interpreter app (which i definitely want! some day) considering rx overruns, i'm not seeing any reason they would cause a crash, should just produce dropped characters... will keep looking and try to isolate the crash condition. |
ok so revising the points to close out this feature:
|
W.r.t the crash I have one question: Does bees recover gracefully from event buffer overflow once the stimulus which triggered overflow is removed? I am going to try commenting this line: EDIT: kind of an interesting experiment but inconclusive and doesn't seem to solve the problem. I don't want to duplicate labour if you're already looking for the crash condition so moving back to those other points... |
other points are done now - patching over serial works!!!! - remaining niggles with serial comms:
|
Just increased some buffer sizes, did some more experiments. Gory details make for pretty tedious reading material but the upshot is I'm now sure event queue overflow is merely one of the symptoms of the crash. Also made a breakthrough in demonstrating that backed-up Tx is also not the problem. Totally disabling serial Tx makes absolutely no difference to the crash. The problem seems totally unique to flat-out serial param or input triggering, when those params/inputs are displayed in play mode. Since I cannot easily figure out how to hook up the callback on Tx-ready and there's some evidence a Tx buffer will not prevent crash on serial overload I think time for a pull-request. Quick dash to put together a cmd-line utility for hot-swapping modules over serial then I'll request the merge... |
Calling this 'done'. Might re-open to add a 'special' serial op for my host software... |
Over on my dev branch there is a bees serial protocol enabling the following operations from host.
remote control of network:
bees network discovery:
callback to host:
Planning to send pull request once all the untested items are known-working. Patching commands & presets will be dummy-only for now but think plumbing these in is straightforward. Host side is currently implemented as a common lisp serial test harness. Docs to follow once the protocol is proven in my host application (lisp midilooper).
callback to host functionality as it stands (op_serial) still kind of sucks... Eventually the protocol should be updated replacing bees serial op with these message types:
The text was updated successfully, but these errors were encountered: