common/swdptap: make SWD timing more consistent #1714

tlyu · 2024-01-04T21:24:25Z

Make SWD timing more consistent: output changes (including floating) take place after the falling edge, and reads take place before the rising edge.

This includes moving a gpio_get to occur after the clock-low delay, which would help reduce incorrect reads from lower-bandwidth target connections.

Eliminate redundant GPIO operations, typically ones that set the clock low, by making the bit string input/output primitives always end by outputting a falling clock edge.

Detailed description

This makes the bit banging code more consistent and easier to analyze. It also removes some redundant clearing of the clock, typically because most existing big banging functions ended on a falling clock edge, and also began with an explicit clearing of the clock. This is no longer necessary, if we adopt a convention that the "resting" phase of the clock is low.

It also moves a misplaced gpio_get from before a clock-low delay to after it, so it comes right before the rising clock edge. This can help avoid incorrect reads from lower-bandwidth target connections. It might be helpful for @stansotn or someone else with such target connections to test it out.

I'll mark it ready for review after testing on BMDA, though I suspect that BMDA might slow things down too enough to make the differences harder to see on the logic analyzer.

Note that adopting an inlined busy-loop function as in #1688 might be needed to gain more significant clock consistency.

Your checklist for this pull request

I've read the Code of Conduct
I've read the guidelines for contributing to this repository
It builds for hardware native (make PROBE_HOST=native)
It builds as BMDA (make PROBE_HOST=hosted)
I've tested it to the best of my ability
My commit messages provide a useful short description of what the commits do

Closing issues

N/A

src/platforms/common/swdptap.c

tlyu · 2024-01-08T21:02:28Z

I think the cycles saved by this change are small enough that my logic analyzer can't reliably see them, so I'm going to mark this as ready for review.

dragonmux

From a code perspective this all looks good to us. We'll try to get this tested on our local setup in the next couple of days to see what the before-and-after on the timing and waveforms looks like with a mind to merge this if it all looks fine.

We'll post our testing results as images and traces here to be able to reference back to if needed.

tlyu · 2024-01-13T15:05:38Z

I wonder if it's useful to hang some extra capacitance off of the probe wiring, to slow things down enough to A/B test the change to the bit read timing. (I don't have a MSO or similar at the moment, which would likely help a lot with this kind of test.)

dragonmux

We've just given this a test, screenshots of the resulting waveforms to follow, and while the delay routines on minimum delay now have a very slight duty cycle imbalance on SWCLK, the clarity improvements and other changes more than make up for that and it doesn't cause bad behaviour from parts tested. If all involved are happy, we're happy to merge this.

src/platforms/common/swdptap.c

dragonmux · 2024-01-16T10:38:45Z

Here are our results from testing this PR

first is a set of before captures from `main`:

no-delay SWD out:

no-delay SWD in:

minimum delay SWD out:

minimum delay SWD in:

Now follow the after captures taken without any extra nop's:

no-delay SWD out:

no-delay SWD in:

minimum delay SWD out:

minimum delay SWD in:

And finally, the no-delay routines with one extra nop:

no-delay SWD out:

no-delay SWD in:

In the case of the extra no-ops we may have inserted the extra in the wrong spot looking more closely at how it changed the waveforms - but this gives you a basis to work from at least.

tlyu · 2024-01-17T03:01:00Z

Thanks for the review!

I noticed similar changes as what you saw, but was also limited to a logic analyzer with 24MHz sample rate. Many of the asymmetries are on the order of one LA sample, and aren't that consistent. I can take a closer look to see if there are any asymmetries that are somewhat more consistent, and try to adjust the timing on those.

On BMP native clocked at 72MHz (default), it seems like a 1-cycle difference could easily result in an inconsistent 1-sample asymmetry at 24MSamp/sec. It would be nice to get some traces from a faster logic analyzer.

dragonmux · 2024-01-17T03:18:45Z

What we tend to do is assume that the asymmetry errs on the side that appears more commonly - so that final capture is showing an asymmetry ~2/3 of the way to the right between the two seen asymetries - they're both asymetrical on the side of the high time being low; just.. slightly closer to even than the captures showing one high period would otherwise suggest

Agreed, though, that getting some high sample density captures would be great - if we can get our scope set up for it'll we'll try to get some captures with that - but that'll be a few days out.

tlyu · 2024-01-17T03:20:48Z

Force-push is to resolve a merge conflict; I haven't dug into the asymmetries in detail yet.

tlyu · 2024-01-17T18:54:44Z

After poking at this a bit, staring at LA traces and disassembly, most of the calculations that occur inside the loop happen during the clock low phase for output. Input calculations can be moved around a bit. In any case, moving the clock clear to the top of the loop for output makes it easier to rebalance the duty cycle.

(gpio_set_val has enough branching that it takes a comparable number of cycles to the loop increment)

tlyu · 2024-01-18T00:19:50Z

Thanks for the review and detailed timing analysis! I've made some adjustments that seem to improve the duty cycle for me, at least as well as I can see at 24MSamp/sec.

dragonmux · 2024-03-09T01:13:59Z

Ah, sorry the ball got dropped a bit on this one - could you rebase this on main please and we'll look at getting this merged ASAP as we think it's probably ready to go otherwise.

dragonmux

Apologies this PR slipped away from us.. We've a few review notes from this round.

With the up-vs-down loops, we don't recall if we've already had a conversation about it and what the outcome was, so please forgive us if you've already explained to us before if there's a merit for the proposed change and we've just forgotten it. If there's a clearly demonstrated technical reason for using the count-down style here, please add comments in the code explaining the choice and otherwise ignore those 4 lints.

src/platforms/common/swdptap.c

dragonmux · 2024-07-05T10:36:10Z

We're working to tidy up and merge outstanding good quality PRs at the moment, and notice this one's not seen any movement since our review. If you're happy with us sorting the outstanding items and rebasing this on main, then we'll get cracking with that in the next week with the intention of landing this by mid month.

tlyu · 2024-07-07T20:50:28Z

Catching up… I'll try to look at it this week as well. I first have to see if I can even get the tree to build on macOS at the moment (I remember hearing about some Meson problems).

dragonmux · 2024-07-07T20:52:21Z

Sounds good! 🤞🏼 you can figure out the macOS build - there was a set of bug fixes pushed to main recently that solved some issues around BMDA on the fruity OS when doing a cross-build to build the firmware. Once rebased on main, that should be a reasonable process.

tlyu · 2024-07-08T22:56:27Z

Got Meson to work on macOS. Native doesn't fit into ROM with the default config on main.

Still puzzling through at least one merge conflict.

dragonmux · 2024-07-08T22:59:33Z

That sounds like some fairly reasonable progress. Which compiler version are you using for main to not be fitting (as far as we know, 12.2.Rel1 builds still fit)?

tlyu · 2024-07-08T23:10:26Z

arm-none-eabi-gcc (Arm GNU Toolchain 13.2.rel1 (Build arm-13.7)) 13.2.1 20231009

dragonmux · 2024-07-08T23:19:32Z

Ah, yes, that toolchain has worse codegen than 12.2.Rel1 so that makes sense of that - this is why we've held back on upgrading toolchains for a while now because the project's trying to squeeze every possible bit of space out at the moment..

Make SWD timing more consistent: output changes (including floating) take place after the falling edge, and reads take place before the rising edge. This includes moving a gpio_get to occur after the clock-low delay, which would help reduce incorrect reads from lower-bandwidth target connections. Eliminate redundant GPIO operations, typically ones that set the clock low, by making the bit string input/output primitives always end by outputting a falling clock edge.

Make clock cycle more symmetric, especially for `no_delay` variants. `swdptap_seq_out_no_delay` seems to need the clock clear to be moved back to the top of the loop, to ensure minimal delay between the falling edge and the SWDIO change. Also use a different loop counter that compiles to a `SUBS` `BCC`/`BCS` pair, instead of using an explicit `CMP`.

tlyu · 2024-07-12T15:14:34Z

Force push is to resolve merge conflicts. I'll check the other comments shortly.

dragonmux

All review items address, so LGTM - merging. Thank you for the contribution!

ALTracer reviewed Jan 4, 2024

View reviewed changes

src/platforms/common/swdptap.c Outdated Show resolved Hide resolved

tlyu marked this pull request as ready for review January 8, 2024 21:00

dragonmux previously approved these changes Jan 13, 2024

View reviewed changes

dragonmux added this to the v2.0 release milestone Jan 13, 2024

dragonmux added Enhancement General project improvement BMP Firmware Black Magic Probe Firmware (not PC hosted software) labels Jan 13, 2024

dragonmux reviewed Jan 16, 2024

View reviewed changes

src/platforms/common/swdptap.c Outdated Show resolved Hide resolved

tlyu dismissed dragonmux’s stale review via 6d00b0d January 17, 2024 03:19

tlyu force-pushed the fix/swd-timing branch from ea66e5e to 6d00b0d Compare January 17, 2024 03:19

dragonmux requested changes May 12, 2024

View reviewed changes

tlyu added 2 commits July 12, 2024 09:35

tlyu force-pushed the fix/swd-timing branch from 75aa822 to caf82d5 Compare July 12, 2024 15:13

tlyu added 2 commits July 12, 2024 10:55

common/swdptap: sign conversion, const fixes

7ea8176

common/swdptap: explain down-count

e742123

dragonmux approved these changes Jul 12, 2024

View reviewed changes

dragonmux merged commit 117028d into blackmagic-debug:main Jul 12, 2024
23 of 24 checks passed

dragonmux mentioned this pull request Oct 29, 2024

failed to scan S32k14 #1982

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

common/swdptap: make SWD timing more consistent #1714

common/swdptap: make SWD timing more consistent #1714

tlyu commented Jan 4, 2024 •

edited

Loading

tlyu commented Jan 8, 2024

dragonmux left a comment

tlyu commented Jan 13, 2024

dragonmux left a comment

dragonmux commented Jan 16, 2024 •

edited

Loading

tlyu commented Jan 17, 2024

dragonmux commented Jan 17, 2024

tlyu commented Jan 17, 2024

tlyu commented Jan 17, 2024

tlyu commented Jan 18, 2024

dragonmux commented Mar 9, 2024

dragonmux left a comment

dragonmux commented Jul 5, 2024

tlyu commented Jul 7, 2024

dragonmux commented Jul 7, 2024

tlyu commented Jul 8, 2024

dragonmux commented Jul 8, 2024

tlyu commented Jul 8, 2024

dragonmux commented Jul 8, 2024

tlyu commented Jul 12, 2024

dragonmux left a comment

common/swdptap: make SWD timing more consistent #1714

common/swdptap: make SWD timing more consistent #1714

Conversation

tlyu commented Jan 4, 2024 • edited Loading

Detailed description

Your checklist for this pull request

Closing issues

tlyu commented Jan 8, 2024

dragonmux left a comment

Choose a reason for hiding this comment

tlyu commented Jan 13, 2024

dragonmux left a comment

Choose a reason for hiding this comment

dragonmux commented Jan 16, 2024 • edited Loading

first is a set of before captures from main:

Now follow the after captures taken without any extra nop's:

And finally, the no-delay routines with one extra nop:

tlyu commented Jan 17, 2024

dragonmux commented Jan 17, 2024

tlyu commented Jan 17, 2024

tlyu commented Jan 17, 2024

tlyu commented Jan 18, 2024

dragonmux commented Mar 9, 2024

dragonmux left a comment

Choose a reason for hiding this comment

dragonmux commented Jul 5, 2024

tlyu commented Jul 7, 2024

dragonmux commented Jul 7, 2024

tlyu commented Jul 8, 2024

dragonmux commented Jul 8, 2024

tlyu commented Jul 8, 2024

dragonmux commented Jul 8, 2024

tlyu commented Jul 12, 2024

dragonmux left a comment

Choose a reason for hiding this comment

tlyu commented Jan 4, 2024 •

edited

Loading

dragonmux commented Jan 16, 2024 •

edited

Loading

first is a set of before captures from `main`: